GPU-accelerated homology search with MMseqs2

Journal: Nature Methods, Volume: 22, Issue: 10
DOI: https://doi.org/10.1038/s41592-025-02819-8
PMID: https://pubmed.ncbi.nlm.nih.gov/40968302
Publication Date: 2025-09-18
Author(s): Felix Kallenborn et al.
Primary Topic: Genomics and Phylogenetic Studies

Overview

This section of the paper evaluates the speed and efficiency of protein homology search tools. It compares MMseqs2-GPU with other search methods such as JackHMMER, and it compares structure-prediction pipelines built on these tools (ColabFold versus AlphaFold2). The study highlights ColabFold pipelines, which use MMseqs2 and are particularly effective for large-scale metagenomic searches. With MMseqs2-GPU, ColabFold runs 1.65 times faster than with MMseqs2-CPU and 31.8 times faster than AlphaFold2, mainly because multiple sequence alignment (MSA) generation is accelerated. Notably, the MSA step consumes 83% of AlphaFold2's runtime, whereas with MMseqs2-GPU it accounts for only 14.7%, which makes efficient execution on a single GPU possible.
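The runtime shares above come down to simple arithmetic. The following is a minimal sketch using Amdahl's law. It assumes that only the MSA stage gets faster while the rest of the pipeline stays the same, which is a simplification and not how the paper derives its figures. The function names are hypothetical.

```python
def overall_speedup(msa_fraction: float, msa_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the MSA stage is accelerated."""
    if not 0.0 <= msa_fraction <= 1.0:
        raise ValueError("msa_fraction must be in [0, 1]")
    if msa_speedup <= 0:
        raise ValueError("msa_speedup must be positive")
    # The old runtime is normalised to 1.0: the non-MSA part is unchanged,
    # the MSA part shrinks by a factor of msa_speedup.
    new_runtime = (1.0 - msa_fraction) + msa_fraction / msa_speedup
    return float("inf") if new_runtime == 0 else 1.0 / new_runtime


def new_msa_share(msa_fraction: float, msa_speedup: float) -> float:
    """Fraction of the new (accelerated) runtime still spent in the MSA stage."""
    msa_time = msa_fraction / msa_speedup
    total = (1.0 - msa_fraction) + msa_time
    if total == 0:
        raise ValueError("runtime vanishes; the MSA share is undefined")
    return msa_time / total


# With MSA at 83% of runtime, even an infinitely fast MSA step caps the
# end-to-end gain at 1 / 0.17, i.e. about 5.9x (under this sketch's assumption).
print(round(overall_speedup(0.83, float("inf")), 2))  # 5.88
```

Under this assumption, a stage that dominates the runtime limits how much any other optimization can help. Once MSA generation is fast, the non-MSA share of the runtime becomes the bottleneck.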

The benchmarking results indicate that all methods achieve comparable accuracy, with an average template modeling score (TM-score) of 0.70 ± 0.05. MMseqs2-GPU is identified as the fastest and most cost-effective evolutionary search tool. It outperforms Foldseek-GPU in speed and slightly improves sensitivity in structure recognition tasks. The paper emphasizes that MMseqs2-GPU's architecture has lower memory requirements and computes efficiently, which makes it accessible to researchers with limited resources. Overall, MMseqs2-GPU increases the throughput of protein homology searches and structure predictions without compromising accuracy. This broadens access to fast, economical bioinformatics tools.
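The TM-score reported above measures how closely a predicted model matches a reference structure. For one fixed superposition of the model onto the reference, the standard formula is TM = (1/L) Σ 1/(1 + (d_i/d0)²). Here L is the reference length, d_i is the distance between the i-th aligned residue pair, and d0 = 1.24 (L − 15)^(1/3) − 1.8.

The following is a minimal sketch that assumes the per-residue distances are already known. The real TM-score also searches for the superposition that maximizes this sum. The 0.5 Å floor on d0 is an assumption based on the convention commonly used by the TM-score program, and the function names are hypothetical.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Distance scale d0 (in Å) for a reference of the given length."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Assumed convention: never let d0 drop below 0.5 Å for short chains.
    return max(d0, 0.5)


def tm_score_for_superposition(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE given superposition.

    distances: distance in Å for each aligned residue pair; unaligned residues
    are simply absent and contribute nothing to the sum.
    target_length: number of residues in the reference structure (L).
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalising by the reference length penalises incomplete models.
    return total / target_length


print(round(tm_score_for_superposition([0.0] * 100, 100), 3))  # 1.0
```

A perfect model scores 1.0. Scores above roughly 0.5 are commonly read as indicating the same fold, so the reported average of 0.70 sits comfortably in that range.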

Introduction

The introduction outlines why filtering matters in homology search. A filter quickly ranks the reference database sequences, so the expensive alignment stage only has to run on promising candidates. Traditional methods such as BLAST use index structures to find initial seeds. MMseqs2 instead filters on double consecutive k-mer matches (two similar k-mer matches on the same diagonal), using an index held in RAM for fast lookup. However, this approach has poor cache locality because the index lookups are random memory accesses. Diamond, in contrast, improves cache behavior by building sorted k-mer lists for the queries and the references and scanning them together, at the cost of extra indexing and sorting.
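The two seeding strategies above can be contrasted with a toy sketch. Several simplifications are assumptions:

- It uses exact k-mer matches, whereas MMseqs2 matches similar k-mers scored with a substitution matrix.
- The sort-merge function stands in for Diamond's approach only in spirit.
- "Double consecutive" is read as two successive hits, in query-position order, that fall on the same diagonal.

All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Hit = Tuple[int, int]  # (query position, target position) of a shared k-mer


def kmers(seq: str, k: int) -> List[Tuple[str, int]]:
    """All (k-mer, start position) pairs of a sequence."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]


def hits_via_index(query: str, target: str, k: int) -> Set[Hit]:
    """Index-based seeding: a k-mer -> positions table for the target,
    probed once per query k-mer (scattered, cache-unfriendly lookups)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for word, pos in kmers(target, k):
        index[word].append(pos)
    # dict.get does not insert empty lists into the defaultdict on a miss
    return {(qpos, tpos) for word, qpos in kmers(query, k) for tpos in index.get(word, [])}


def hits_via_sorted_merge(query: str, target: str, k: int) -> Set[Hit]:
    """Sort-merge seeding: sort both k-mer lists, then walk them in lockstep
    (sequential, cache-friendly accesses, paid for by the sorting step)."""
    q, t = sorted(kmers(query, k)), sorted(kmers(target, k))
    hits: Set[Hit] = set()
    i = j = 0
    while i < len(q) and j < len(t):
        if q[i][0] < t[j][0]:
            i += 1
        elif q[i][0] > t[j][0]:
            j += 1
        else:
            word = q[i][0]
            i_end, j_end = i, j
            # find the runs of this k-mer on both sides and pair them up
            while i_end < len(q) and q[i_end][0] == word:
                i_end += 1
            while j_end < len(t) and t[j_end][0] == word:
                j_end += 1
            hits.update((qp, tp) for _, qp in q[i:i_end] for _, tp in t[j:j_end])
            i, j = i_end, j_end
    return hits


def passes_double_diagonal(hits: Set[Hit]) -> bool:
    """Keep the target if two successive hits (ordered by query position)
    lie on the same diagonal, i.e. have equal target_pos - query_pos."""
    last_diag = None
    for qpos, tpos in sorted(hits):
        diag = tpos - qpos
        if diag == last_diag:
            return True
        last_diag = diag
    return False


print(passes_double_diagonal(hits_via_index("ACDEFGH", "XXACDEFYY", 3)))  # True
```

Both seeding functions return the same hit set. They differ only in memory access pattern, which is the trade-off the introduction describes.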

HMMER and HHblits, in turn, use a more sensitive ranking technique. They simplify the Smith-Waterman-Gotoh algorithm to gapless alignment: residue substitutions are scored along each diagonal to find the highest-scoring ungapped local alignment. This method is slower than word-based approaches but more sensitive. The introduction also highlights the potential of modern hardware accelerators such as GPUs. Their high core count, combined with the simple, uniform instructions that gapless filtering needs, lets them run this filtering efficiently and speed up homology search.
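Gapless filtering, as described above, is Smith-Waterman with gaps forbidden: along each diagonal the running score is H = max(0, H_prev + s(q_i, t_j)). The following is a minimal sketch comparing two plain sequences. It assumes a hypothetical toy scoring of +2 for identical residues and −1 otherwise. Real tools use a substitution matrix such as BLOSUM62, and MMseqs2-GPU scores a query profile. The function names are hypothetical.

```python
from typing import Callable

Scorer = Callable[[str, str], int]


def toy_score(a: str, b: str) -> int:
    """Hypothetical scoring: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


def best_gapless_score(query: str, target: str, score: Scorer = toy_score) -> int:
    """Highest-scoring ungapped local alignment over all diagonals."""
    best = 0
    m, n = len(query), len(target)
    # Diagonal d pairs query[i] with target[i + d]. Each diagonal is
    # scored independently, which is what makes this easy to parallelise.
    for d in range(-(m - 1), n):
        h = 0
        i = max(0, -d)
        while i < m and i + d < n:
            # Local alignment: restart from 0 instead of going negative.
            h = max(0, h + score(query[i], target[i + d]))
            best = max(best, h)
            i += 1
    return best


print(best_gapless_score("ACDEF", "ACDEF"))  # 10
print(best_gapless_score("ACDEF", "ACEF"))   # 4: the missing D splits the match across two diagonals
```

The second call shows the trade-off. A single deletion breaks the match into two shorter diagonal segments, which a gapped alignment would join. In exchange, the inner loop has no traceback and no gap states, so it maps well onto many simple GPU cores.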

Discussion

In this section, the authors discuss the performance and efficiency of MMseqs2-GPU compared with traditional sequence alignment and homology search methods. MMseqs2-GPU substantially outperforms JackHMMER and BLAST in both single-query and large-batch scenarios, with speed-ups of up to 199× and 6.4×, respectively. The GPU-based approach is also more energy-efficient, with an 80.7-fold improvement over JackHMMER in single-batch processing. In addition, MMseqs2-GPU needs about 1 byte of memory per residue instead of about 7 bytes, which makes processing large databases more efficient.
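The per-residue memory figures above translate directly into database-scale requirements. The following is a minimal sketch of that arithmetic. It assumes a hypothetical database of 10 billion residues, chosen only for illustration, and it ignores fixed overheads. The function name is hypothetical.

```python
def database_memory_gb(num_residues: int, bytes_per_residue: float) -> float:
    """Approximate memory in GB for a database, ignoring fixed overheads."""
    return num_residues * bytes_per_residue / 1e9


# Hypothetical database size, for illustration only.
residues = 10_000_000_000  # 10 billion residues
cpu_gb = database_memory_gb(residues, 7)  # ~7 bytes/residue, per the figure above
gpu_gb = database_memory_gb(residues, 1)  # ~1 byte/residue, per the figure above
print(f"~7 B/residue: {cpu_gb:.0f} GB, ~1 B/residue: {gpu_gb:.0f} GB")
# ~7 B/residue: 70 GB, ~1 B/residue: 10 GB
```

Because memory scales linearly with residue count, the seven-fold reduction lets a database about seven times larger fit into the same memory budget.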

The authors highlight that the speed-up does not come at the expense of accuracy. Structure predictions remain accurate, and in homology detection MMseqs2-GPU achieves ROC scores that surpass PSI-BLAST and approach those of JackHMMER. Optimizations in the MMseqs2-GPU workflow contribute to its high throughput and reduced latency, particularly on large datasets. These include the use of shared memory and efficient data access patterns. Overall, the findings underscore MMseqs2-GPU's potential as a powerful tool for fast, accurate sequence analysis in bioinformatics.
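The summary does not say which ROC variant the paper reports. One common choice in homology-search benchmarks is ROC1, which measures, per query, the fraction of true positives ranked above the first false positive. The following is a minimal sketch of that metric. Treating it as the paper's measure is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def roc1(ranked_is_true_positive: Sequence[bool], total_true_positives: int) -> float:
    """ROC1 for one query: share of its true positives ranked before the first false positive.

    ranked_is_true_positive: hits in ranked order, True for a true homolog.
    total_true_positives: all true homologs of the query in the database,
    including any the search failed to report.
    """
    if total_true_positives <= 0:
        raise ValueError("query needs at least one true positive")
    tp_before_first_fp = 0
    for is_tp in ranked_is_true_positive:
        if not is_tp:
            break  # stop counting at the first false positive
        tp_before_first_fp += 1
    return tp_before_first_fp / total_true_positives


def mean_roc1(per_query: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Average ROC1 over queries, each given as (ranked labels, total true positives)."""
    return sum(roc1(labels, total) for labels, total in per_query) / len(per_query)


queries = [([True, True, False, True], 4), ([False, True], 1)]
print(mean_roc1(queries))  # 0.25
```

Because counting stops at the first false positive, a tool scores well only if it ranks true homologs above unrelated hits. This is why a sensitive but poorly calibrated search can still score low.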