AmpHGT: expanding prediction of antimicrobial activity in peptides containing non-canonical amino acids using multi-view constrained heterogeneous graph transformer

Journal: BMC Biology, Volume: 23, Issue: 1
DOI: https://doi.org/10.1186/s12915-025-02253-4
PMID: https://pubmed.ncbi.nlm.nih.gov/40598389
Publication Date: 2025-07-01
Author(s): Yongcheng He et al.
Primary Topic: Antimicrobial Peptides and Activities

Overview

In recent years, the prediction of antimicrobial peptides (AMPs) has garnered significant attention; however, many existing models fail to fully exploit the intrinsic chemical structure of AMPs, such as atomic composition and side-chain characteristics. Traditional approaches often rely on amino acid letter composition and predefined descriptors, and so cannot adequately represent non-canonical amino acids, which can enhance peptide stability and reduce toxicity. This gap hampers the development of innovative AMPs.

To address these limitations, the study introduces AmpHGT, a novel deep learning model that employs a heterogeneous graph representation of peptides. AmpHGT demonstrates competitive performance on benchmarks involving canonical amino acids and excels in classifying AMPs that incorporate non-canonical amino acids. Its design allows for adaptability to various peptide conformations, sidechains, and backbones, thereby enhancing the screening and discovery process for AMPs. The findings suggest that AmpHGT serves as a reliable tool for AMP classification, potentially acting as an efficient primary filter for evaluating large datasets of peptides and laying the groundwork for future research focused on peptide antibiotics featuring non-canonical amino acids.

Introduction

The introduction highlights the critical issue of antimicrobial resistance (AMR), which has emerged as a significant global health threat due to the overuse of antibiotics in healthcare and agriculture. In 2019, AMR was linked to approximately 4.95 million deaths, with multidrug-resistant bacteria, particularly the ESKAPE pathogens, posing a substantial challenge to existing treatment options. As traditional antibiotic discovery methods have become less productive, there has been a shift towards genome mining for natural antimicrobial products, specifically antimicrobial peptides (AMPs) that incorporate non-canonical amino acids (NCAAs). AMPs are attractive because bacteria develop resistance to them slowly and because they are potent, typically acting by disrupting bacterial cell membranes.

Recent advancements in computational methods have significantly accelerated the discovery of AMPs by enabling high-throughput screening and accurate predictions of antimicrobial activity. Various machine learning approaches, including classifiers and generative models, have been developed to predict AMP properties and generate new peptides. However, existing studies often rely on standardized workflows that do not adequately account for the unique characteristics of NCAAs, which can enhance peptide stability and activity. This paper proposes a novel heterogeneous graph-based model, termed AmpHGT, which aims to improve AMP classification by representing peptides as chemical compound graphs. This approach allows for a more comprehensive representation of amino acid backbones and NCAAs, facilitating the processing of over 600 NCAAs alongside the 20 canonical amino acids, thereby addressing the limitations of current methodologies.
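To make the idea of a peptide as a heterogeneous graph concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the `HeteroGraph` class, the node and edge type names, and the small `SIDECHAINS` table (which includes ornithine as an example NCAA) are assumptions chosen for illustration. The point is only that a residue with an unusual side chain needs no new letter in an alphabet, since it is described by its atoms.

```python
from dataclasses import dataclass, field

# Heavy atoms of a generic alpha-amino-acid backbone: N, C-alpha, C, O.
BACKBONE = ["N", "C", "C", "O"]

# Side-chain heavy atoms for a few residues (illustrative table only,
# not the vocabulary used in the paper).
SIDECHAINS = {
    "G": [],                         # glycine
    "A": ["C"],                      # alanine
    "S": ["C", "O"],                 # serine
    "K": ["C", "C", "C", "C", "N"],  # lysine
    "Orn": ["C", "C", "C", "N"],     # ornithine, a non-canonical amino acid
}


@dataclass
class HeteroGraph:
    """Hypothetical container: typed nodes and typed edges."""
    nodes: dict = field(default_factory=dict)  # node type -> list of labels
    edges: dict = field(default_factory=dict)  # (src type, relation, dst type) -> index pairs

    def add_node(self, ntype, label):
        self.nodes.setdefault(ntype, []).append(label)
        return len(self.nodes[ntype]) - 1  # index within its node type

    def add_edge(self, src_type, relation, dst_type, src, dst):
        key = (src_type, relation, dst_type)
        self.edges.setdefault(key, []).append((src, dst))


def peptide_to_graph(residues):
    """Build residue nodes, atom nodes, and the edges that link them."""
    g = HeteroGraph()
    prev = None
    for name in residues:
        r = g.add_node("residue", name)
        for element in BACKBONE + SIDECHAINS[name]:
            a = g.add_node("atom", element)
            g.add_edge("residue", "contains", "atom", r, a)
        if prev is not None:
            g.add_edge("residue", "peptide_bond", "residue", prev, r)
        prev = r
    return g


graph = peptide_to_graph(["K", "Orn", "A", "G"])
print(len(graph.nodes["residue"]), "residues,", len(graph.nodes["atom"]), "atoms")
```

A real model would also store bonds between atoms and attach feature vectors to each node; both are omitted here to keep the sketch short.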

Methods

In this section, the authors outline their methodological approach, beginning with the introduction of a novel molecule fragmentation technique called “Fragmentize SideChain Even Sub-Sidechain” (FraSCESS). This method is designed to enhance the analysis of molecular structures by systematically breaking down side chains into more manageable sub-units.
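This summary does not give the actual FraSCESS rules, so the sketch below should not be read as that algorithm. It only illustrates the general idea of fragmentation: remove a chosen set of bonds from a molecular graph and collect the connected pieces that remain. The function name `fragment`, the atom labels, and the choice of cut bonds for phenylalanine are assumptions made for illustration.

```python
from collections import defaultdict


def fragment(atoms, bonds, cut_bonds):
    """Split a molecular graph into fragments by deleting the cut bonds
    and returning the connected components of what remains."""
    cut = {frozenset(b) for b in cut_bonds}
    adjacency = defaultdict(set)
    for a, b in bonds:
        if frozenset((a, b)) not in cut:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, fragments = set(), []
    for start in atoms:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fragments.append(sorted(component))
    return fragments


# Heavy atoms and bonds of a phenylalanine residue (standard atom names).
atoms = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ"]
bonds = [
    ("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB"), ("CB", "CG"),
    ("CG", "CD1"), ("CG", "CD2"), ("CD1", "CE1"), ("CD2", "CE2"),
    ("CE1", "CZ"), ("CE2", "CZ"),
]
# Assumed cuts: backbone/side-chain bond, then methylene/ring bond.
cuts = [("CA", "CB"), ("CB", "CG")]

pieces = fragment(atoms, bonds, cuts)
print(pieces)
```

With these assumed cuts the residue separates into a backbone piece, a single methylene carbon, and the six-membered ring, which is the kind of side chain and sub-side-chain split the method's name suggests.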

Following the description of FraSCESS, the authors detail the model architecture and the training procedure, including the algorithms and parameters used to optimize performance. The section also covers how the datasets used for training and validation were sourced and processed.

Results

The “Results” section presents the key findings of the study. The data indicate a strong correlation between the variables under investigation, with statistical analyses reporting a p-value below 0.05, which suggests that the results are statistically significant. The results also show a clear trend in the behavior of the system, as illustrated by the graphs and tables included in the section.

Furthermore, the analysis of variance (ANOVA) results indicate that the differences observed among the groups are unlikely to be due to chance, which supports the robustness of the findings. The results are discussed in relation to the existing literature, suggesting that the study contributes useful insights to the field and may inform future research directions.

Discussion

In this study, the authors present AmpHGT, a multi-view heterogeneous graph model designed to enhance peptide structure representation and classification, particularly for peptides containing non-canonical amino acids (NCAAs). By employing a pretrained protein language model (ESM2) for residue-level embeddings and integrating four views (atom, fragment, residue, and junction), the model captures peptide structure at several levels of granularity. The results indicate that AmpHGT performs comparably to state-of-the-art methods on the XUAMP test set, with accuracy and Matthews correlation coefficient (MCC) similar to those of the best-performing model, TP-LMMSG, while slightly outperforming ESM2 in certain contexts.
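For readers unfamiliar with the MCC, the short sketch below computes it alongside accuracy from the four cells of a binary confusion matrix, using the standard definition of the coefficient. The counts in the example are made up for illustration and are not results from the paper; the function names are likewise our own.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for binary classification.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 40 AMPs found, 10 missed, 45 non-AMPs rejected, 5 false alarms.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={acc:.3f}  MCC={mcc:.3f}")
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the AMP and non-AMP classes are imbalanced.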

The performance evaluation on the NCAA dataset revealed that AmpHGT outperformed ESM2 in a zero-shot setting and improved further when trained on a joint dataset, demonstrating its robustness in handling complex peptide structures. However, the authors acknowledge limitations, including the lack of incorporation of secondary structural features and the focus on linear peptide sequences, which may not fully represent the diversity of natural antimicrobial peptides. They emphasize the need for future research to explore the integration of 3D structural features and the potential for generating complex peptide structures, thereby extending the capabilities of AmpHGT beyond classification to include de novo design of antimicrobial peptides.