Navigating protein landscapes with a machine-learned transferable coarse-grained model

Journal: Nature Chemistry, Volume: 17, Issue: 8
DOI: https://doi.org/10.1038/s41557-025-01874-0
PMID: https://pubmed.ncbi.nlm.nih.gov/40681718
Publication Date: 2025-07-18
Author(s): Nicholas E. Charron et al.
Primary Topic: Machine Learning in Materials Science

Methods

The authors built a training dataset of all-atom, explicit-solvent simulations of 50 CATH domains (small proteins covering a variety of folded structures) together with approximately 1,200 dimers of mono- and dipeptides. They recorded the instantaneous forces acting on the protein atoms and mapped them onto a coarse-grained (CG) backbone representation of the proteins using basic force aggregation. They then trained a CG force field, termed CGSchNet, which combines a deep graph neural network (GNN) with physically motivated prior terms.
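As a rough illustration of the force-mapping step described above, the following sketch projects all-atom forces onto CG beads with a linear map and scores a model with a mean-squared force error. It assumes a force-matching style objective and a simple "one atom per bead" map. The weights, the toy forces, and all function names are hypothetical, and the prior and learned terms are plain stand-ins rather than CGSchNet itself.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def map_forces(weights: Sequence[Sequence[float]],
               atom_forces: Sequence[Vec3]) -> List[Vec3]:
    """Linear force map: bead force b = sum over atoms a of weights[b][a] * force a."""
    mapped = []
    for row in weights:
        fx = sum(w * f[0] for w, f in zip(row, atom_forces))
        fy = sum(w * f[1] for w, f in zip(row, atom_forces))
        fz = sum(w * f[2] for w, f in zip(row, atom_forces))
        mapped.append((fx, fy, fz))
    return mapped

def combined_force(prior: Sequence[Vec3], learned: Sequence[Vec3]) -> List[Vec3]:
    """CG force as a prior term plus a learned correction (both are stand-ins here)."""
    return [tuple(a + b for a, b in zip(p, l)) for p, l in zip(prior, learned)]

def force_matching_loss(predicted: Sequence[Vec3],
                        reference: Sequence[Vec3]) -> float:
    """Mean squared difference per force component."""
    total = 0.0
    count = 0
    for p, r in zip(predicted, reference):
        for pc, rc in zip(p, r):
            total += (pc - rc) ** 2
            count += 1
    return total / count

# Toy frame (hypothetical numbers): 4 atoms, 2 beads that keep atoms 0 and 2.
atom_forces = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (4.0, 0.0, 0.0)]
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]
reference = map_forces(weights, atom_forces)

prior = [(0.5, 0.0, 0.0), (0.0, 0.0, 1.0)]    # stand-in for the prior terms
learned = [(0.5, 0.0, 0.0), (0.0, 0.0, 1.0)]  # stand-in for the network output
loss = force_matching_loss(combined_force(prior, learned), reference)
```

In a real training loop the learned term would come from differentiating a network energy with respect to bead positions, and the loss would be averaged over many frames; neither is shown here.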

The researchers then ran extensive Langevin and parallel tempering (PT) simulations with the trained CG model on new, unseen proteins of differing sizes and structures to assess its performance and limitations. Where feasible, they also ran all-atom molecular dynamics (MD) simulations of the test systems and analyzed them with Markov state models to provide a reference for comparison.
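To make the two sampling schemes concrete, here is a minimal sketch of Langevin dynamics combined with parallel tempering on a one-dimensional double-well potential. Everything in it is an assumption chosen for illustration: the toy potential, the BAOAB splitting of the Langevin equation, the time step, the friction, the temperature ladder, and the neighbor-swap schedule. It is not the simulation setup used in the paper.

```python
import math
import random

def potential(x):
    """Toy double-well energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2

def force(x):
    """Negative gradient of the toy potential."""
    return -4.0 * x * (x * x - 1.0)

def baoab_step(x, v, dt, gamma, kT, mass, rng):
    """One step of underdamped Langevin dynamics using the BAOAB splitting."""
    v += 0.5 * dt * force(x) / mass                      # B: half kick
    x += 0.5 * dt * v                                    # A: half drift
    c = math.exp(-gamma * dt)                            # O: friction + noise
    v = c * v + math.sqrt((1.0 - c * c) * kT / mass) * rng.gauss(0.0, 1.0)
    x += 0.5 * dt * v                                    # A: half drift
    v += 0.5 * dt * force(x) / mass                      # B: half kick
    return x, v

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance for exchanging configurations between two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def run_parallel_tempering(kTs, n_cycles, steps_per_cycle, seed=0):
    """Propagate one replica per temperature and attempt neighbor swaps."""
    rng = random.Random(seed)
    xs = [-1.0 for _ in kTs]
    vs = [0.0 for _ in kTs]
    accepted = 0
    for cycle in range(n_cycles):
        for r, kT in enumerate(kTs):
            for _ in range(steps_per_cycle):
                xs[r], vs[r] = baoab_step(xs[r], vs[r], 0.01, 1.0, kT, 1.0, rng)
        i = cycle % (len(kTs) - 1)   # alternate over neighboring pairs
        j = i + 1
        p = swap_probability(1.0 / kTs[i], 1.0 / kTs[j],
                             potential(xs[i]), potential(xs[j]))
        if rng.random() < p:
            # Swap positions only; each velocity stays with its temperature slot.
            xs[i], xs[j] = xs[j], xs[i]
            accepted += 1
    return xs, accepted / n_cycles

final_positions, acceptance = run_parallel_tempering(
    [0.2, 0.5, 1.0], n_cycles=200, steps_per_cycle=50)
```

The swap test uses potential energies only, which is valid here because positions and velocities are independent at equilibrium and each velocity remains attached to its own thermostat temperature.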

Results

The training dataset comprises all-atom, explicit-solvent simulations of small proteins with a variety of folded structures, together with numerous dimers of mono- and dipeptides. The CG force field trained on this data, CGSchNet, was then used to run extensive simulations of new, previously unseen proteins of differing sizes and structures. The methodology and results are illustrated in Figure 1 and elaborated in the Methods and Supplementary Section 1.

Discussion

The paper describes the development and validation of a transferable coarse-grained (CG) force field for simulating the conformational dynamics of peptides and small proteins. The model was tested on a variety of unseen 8-residue peptides and small fast-folding proteins, and it reproduced the folding/unfolding free energy landscapes obtained from all-atom molecular dynamics (MD) simulations. Notably, the CG model predicted metastable states and folding transitions, with good agreement in native contact fraction and root-mean-square deviation (r.m.s.d.) values, particularly for proteins such as chignolin and Trp-cage. Compared with existing CG force fields (AWSEM, UNRES and Martini), the model was better able to explore and stabilize multiple metastable states.
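The native contact fraction and the free energy landscapes mentioned above can be sketched as follows. This assumes a simple hard-cutoff definition of a contact and a histogram estimate F = -kT ln p. The cutoff, tolerance, sequence separation, bin count, and the toy six-bead coordinates are all illustrative choices, and the paper may use a different (for example smooth) contact definition.

```python
import math
from collections import Counter

def native_contacts(native, cutoff=0.6, min_separation=3):
    """List (i, j, native distance) for bead pairs close together in the native state."""
    contacts = []
    n = len(native)
    for i in range(n):
        for j in range(i + min_separation, n):
            d = math.dist(native[i], native[j])
            if d < cutoff:
                contacts.append((i, j, d))
    return contacts

def fraction_native(frame, contacts, tolerance=1.2):
    """Fraction of native contacts still formed in a frame (hard-cutoff variant)."""
    if not contacts:
        return 0.0
    formed = sum(1 for i, j, d0 in contacts
                 if math.dist(frame[i], frame[j]) <= tolerance * d0)
    return formed / len(contacts)

def free_energy_profile(values, n_bins, kT=1.0):
    """Histogram values in [0, 1] and return F = -kT ln p per occupied bin, min set to 0."""
    counts = Counter(min(int(v * n_bins), n_bins - 1) for v in values)
    total = len(values)
    raw = {b: -kT * math.log(c / total) for b, c in counts.items()}
    f_min = min(raw.values())
    return {b: f - f_min for b, f in sorted(raw.items())}

# Toy coordinates in nm (hypothetical): a U-shaped "native" hairpin and an extended chain.
native = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.76, 0.0, 0.0),
          (0.76, 0.38, 0.0), (0.38, 0.38, 0.0), (0.0, 0.38, 0.0)]
extended = [(0.38 * k, 0.0, 0.0) for k in range(6)]

contacts = native_contacts(native)
q_native = fraction_native(native, contacts)
q_extended = fraction_native(extended, contacts)
profile = free_energy_profile([0.1, 0.1, 0.1, 0.9], n_bins=2)
```

An r.m.s.d. calculation would additionally need an optimal superposition of each frame onto the native structure, which is omitted here to keep the sketch dependency-free.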

Furthermore, the CG model extrapolated to larger proteins, such as the engrailed homeodomain and the de novo designed protein alpha3D, for which conventional atomistic simulations are computationally prohibitive. The model captured the correct native structures and also allowed exploration of folding mechanisms and of the effects of mutations on stability, which correlated well with experimental data. The findings suggest that a machine-learned CG model can provide a robust framework for simulating protein dynamics at significantly reduced computational cost while maintaining predictive accuracy. Future work will focus on improving the model's reliability in predicting thermodynamic properties and on extending it to more complex protein systems.
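One common way to quantify the effect of a mutation on stability from simulation data is a two-state estimate of the folding free energy. The sketch below assumes that each frame can be labeled folded or unfolded by thresholding the native contact fraction Q. The threshold, the temperature, and the toy Q series are hypothetical, and this is not necessarily the estimator used in the paper.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)

def folding_free_energy(q_values, q_folded=0.7, temperature=300.0):
    """Two-state estimate: dG_fold = -RT ln(N_folded / N_unfolded), in kcal/mol."""
    n_folded = sum(1 for q in q_values if q >= q_folded)
    n_unfolded = len(q_values) - n_folded
    if n_folded == 0 or n_unfolded == 0:
        raise ValueError("both folded and unfolded states must be sampled")
    return -GAS_CONSTANT_KCAL * temperature * math.log(n_folded / n_unfolded)

def stability_change(q_wild_type, q_mutant, q_folded=0.7, temperature=300.0):
    """ddG = dG_fold(mutant) - dG_fold(wild type); positive values mean destabilization."""
    return (folding_free_energy(q_mutant, q_folded, temperature)
            - folding_free_energy(q_wild_type, q_folded, temperature))

# Toy Q time series (hypothetical): wild type mostly folded, mutant mostly unfolded.
q_wild_type = [0.9, 0.9, 0.9, 0.1]
q_mutant = [0.9, 0.1, 0.1, 0.1]
ddg = stability_change(q_wild_type, q_mutant)
```

The estimate is only meaningful when both states are visited many times in each trajectory, which is one reason enhanced sampling such as parallel tempering is useful for this kind of comparison.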

Keywords: Metastability, Thermodynamics, Artificial intelligence, Folding (DSP implementation), Statistical physics, Chemistry, Proteins, Protein structure, Machine learning, Protein conformation, Protein structure prediction, Molecular dynamics, Atom (system on chip), Protein folding, Computer science, Process (computing), Physics, Transferability, Computational chemistry, Domain (mathematics), Force field (fiction), Set (abstract data type), Molecular dynamics simulation, Biological system