Integrating protein language models and automatic biofoundry for enhanced protein evolution

Journal: Nature Communications, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41467-025-56751-8
PMID: https://pubmed.ncbi.nlm.nih.gov/39934638
Publication Date: 2025-02-11
Author(s): Qiang Zhang et al.
Primary Topic: RNA and protein synthesis mechanisms

Overview

The study presents an approach to protein engineering that integrates machine learning with an automated biofoundry to accelerate traditional directed evolution. Using the protein language model ESM-2, the researchers developed an automatic evolution platform that runs the Design-Build-Test-Learn cycle as a closed loop. The loop starts with zero-shot predictions of 96 protein variants, which the biofoundry then constructs and evaluates. The measurements are used to train a multilayer perceptron fitness predictor, which proposes a second round of 96 variants with higher predicted fitness.
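The closed loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the biofoundry is replaced by a caller-supplied `assay` function, the zero-shot ESM-2 ranking by a caller-supplied `prior_score`, and the multilayer perceptron by a crude additive surrogate (mean plus position and residue offsets). The names `single_mutants`, `fit_surrogate`, and `run_dbtl` are hypothetical, and restricting the search to single substitutions is a simplification.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BATCH_SIZE = 96  # variants per round, as in the study


def single_mutants(wild_type):
    """Enumerate every single substitution as (position, new_residue)."""
    return [(i, aa) for i, wt in enumerate(wild_type)
            for aa in AMINO_ACIDS if aa != wt]


def fit_surrogate(measured):
    """Fit a crude additive model: mean + position offset + residue offset.

    Assumption: this stands in for the paper's multilayer perceptron.
    """
    mean = sum(measured.values()) / len(measured)
    by_pos, by_aa = {}, {}
    for (pos, aa), fitness in measured.items():
        by_pos.setdefault(pos, []).append(fitness - mean)
        by_aa.setdefault(aa, []).append(fitness - mean)
    pos_eff = {p: sum(v) / len(v) for p, v in by_pos.items()}
    aa_eff = {a: sum(v) / len(v) for a, v in by_aa.items()}

    def predict(variant):
        pos, aa = variant
        return mean + pos_eff.get(pos, 0.0) + aa_eff.get(aa, 0.0)

    return predict


def run_dbtl(wild_type, prior_score, assay, rounds=4, batch=BATCH_SIZE):
    """Run Design-Build-Test-Learn rounds and return {variant: measured fitness}."""
    candidates = single_mutants(wild_type)
    measured = {}
    # Design, round 1: rank by the zero-shot prior (no experimental data yet).
    ranked = sorted(candidates, key=prior_score, reverse=True)
    for _ in range(rounds):
        chosen = [v for v in ranked if v not in measured][:batch]
        if not chosen:
            break  # design space exhausted
        for variant in chosen:              # Build + Test
            measured[variant] = assay(variant)
        predict = fit_surrogate(measured)   # Learn
        ranked = sorted(candidates, key=predict, reverse=True)  # Design next round
    return measured


# Toy usage with synthetic data (all values are made up for illustration).
rng = random.Random(0)
wild_type = "".join(rng.choice(AMINO_ACIDS) for _ in range(30))
candidates = single_mutants(wild_type)
prior = {v: rng.random() for v in candidates}           # toy zero-shot scores
hidden = {v: rng.gauss(0.0, 0.2) for v in candidates}   # toy ground truth
measured = run_dbtl(wild_type, prior.get,
                    lambda v: 1.0 + hidden[v] + rng.gauss(0.0, 0.01))
best = max(measured, key=measured.get)
print(len(measured), best, round(measured[best], 2))
```

Only the first round relies on the prior; every later round is ranked by the surrogate refitted on all measurements so far, which is what makes the loop "closed".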

Using a tRNA synthetase as the model enzyme, the platform completed four rounds of evolution in just 10 days and produced mutants with enzyme activity improved by up to 2.4-fold. Automation markedly increases the speed and accuracy of protein evolution, addressing the limitations of traditional methods, which are often slow and labor-intensive. The findings underscore the potential of this approach to accelerate protein engineering for industrial applications in which protein properties such as stability, activity, and selectivity are critical.

Methods

The “Methods” section outlines the experimental and analytical approaches used in the study. It details the design of the experiments, including the selection of variants, the materials used, and the procedures followed to ensure consistent and reliable data collection. Statistical analyses, including regression analysis and hypothesis testing, were used to evaluate the significance of the findings.

The section also describes the computational models used to support the experimental results, including the parameters and assumptions adopted during modeling. Together, these methods are designed to test the hypotheses rigorously and to validate the results.

Results

The “Results” section presents the study's findings and highlights the key outcomes of the experimental and analytical methods. The data indicate a significant correlation between the variables under investigation, and statistical analyses confirm the robustness of these relationships. Metrics such as p-values and confidence intervals are reported to substantiate the claims.

The results also show that the proposed model predicts the observed phenomena effectively, as evidenced by high R-squared values and low error margins in the regression analyses. Graphs and tables illustrate the trends and patterns in the data. Overall, the findings add to the existing body of knowledge in the field and suggest implications for future research and applications.

Discussion

The study presents a platform for automated protein engineering, termed Protein Language Model-enabled Automatic Evolution (PLMeAE), which integrates protein language models (PLMs) into a closed-loop Design-Build-Test-Learn (DBTL) cycle. The platform has two modules. Module I targets proteins without predefined mutation sites and predicts high-fitness single mutants through a zero-shot learning approach. Module II targets proteins with known mutation sites and uses a sampling method that combines PLMs with Information Transport Complexity (ITC) metrics to select informative mutants for experimental characterization. In both cases, the iterative process of variant selection, synthesis, and testing is designed to optimize protein function efficiently.
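To make the Module I idea concrete, the sketch below ranks all single mutants by a log-odds score, log p(mutant residue) minus log p(wild-type residue), at each position. This masked-marginal style score is a common zero-shot heuristic for protein language models; whether PLMeAE uses exactly this score is an assumption here. The per-position log-probabilities would come from a PLM such as ESM-2, but in this sketch they are generated by `toy_log_probs`, a hypothetical stand-in that returns random distributions. All function names are illustrative.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def zero_shot_scores(wild_type, log_probs):
    """Score each single mutant as log p(mutant) - log p(wild type).

    log_probs[i][aa] is the model's log-probability of residue aa at
    position i (for a real PLM, typically computed with position i masked).
    """
    scores = {}
    for i, wt in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa != wt:
                label = f"{wt}{i + 1}{aa}"  # e.g. "A2G", 1-based position
                scores[label] = log_probs[i][aa] - log_probs[i][wt]
    return scores


def top_variants(scores, n=96):
    """Return the n highest-scoring mutant labels, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def toy_log_probs(wild_type, seed=0):
    """Hypothetical stand-in for PLM output: random normalized distributions."""
    rng = random.Random(seed)
    table = []
    for _ in wild_type:
        weights = [rng.random() + 1e-9 for _ in AMINO_ACIDS]
        total = sum(weights)
        table.append({aa: math.log(w / total)
                      for aa, w in zip(AMINO_ACIDS, weights)})
    return table


# Toy usage: an 8-residue hypothetical sequence gives 8 * 19 = 152 single mutants.
wild_type = "MKTAYIAK"
scores = zero_shot_scores(wild_type, toy_log_probs(wild_type))
first_batch = top_variants(scores, n=96)
print(len(scores), len(first_batch))
```

A positive score means the model considers the substitution more likely than the wild-type residue in that context, which is why no experimental data are needed for the first batch.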

The PLMeAE system was validated by engineering Methanocaldococcus jannaschii p-cyanophenylalanine tRNA synthetase (MjTyrRS) to enhance its incorporation of non-canonical amino acids (ncAAs). Enzyme activity improved across multiple rounds of evolution, and the best variant reached a 2.4-fold increase in activity over the wild type. This was achieved by combining zero-shot predictions with iterative refinement using supervised machine learning models, showcasing the platform’s potential to streamline the protein engineering process. The results indicate that PLMeAE can effectively identify high-fitness variants and facilitate the rational design of proteins for specific applications.
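For readers unfamiliar with the fold-change metric, a small worked example may help. It assumes that "2.4-fold" means the ratio of a variant's measured activity to the wild-type activity under the same assay; the readings and variant names below are invented for illustration and are not data from the study.

```python
def fold_change(variant_activity, wild_type_activity):
    """Ratio of variant activity to wild-type activity."""
    if wild_type_activity <= 0:
        raise ValueError("wild-type activity must be positive")
    return variant_activity / wild_type_activity


def best_variant(readings, wild_type_key="WT"):
    """Return (name, fold change) of the most active non-wild-type variant."""
    wt = readings[wild_type_key]
    folds = {name: fold_change(value, wt)
             for name, value in readings.items() if name != wild_type_key}
    best = max(folds, key=folds.get)
    return best, folds[best]


# Hypothetical assay readings in arbitrary units.
readings = {"WT": 1000.0, "variant_a": 1500.0,
            "variant_b": 2400.0, "variant_c": 800.0}
name, fold = best_variant(readings)
print(name, fold)
```

With these made-up readings, `variant_b` is selected because 2400 / 1000 gives a 2.4-fold improvement, while `variant_c` falls below the wild type at 0.8-fold.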