هيكلة موقع النشاط الإنزيمي على مستوى الذرة باستخدام RFdiffusion2 Atom-level enzyme active site scaffolding using RFdiffusion2

المجلة: Nature Methods، المجلد: 23، العدد: 1
DOI: https://doi.org/10.1038/s41592-025-02975-x
PMID: https://pubmed.ncbi.nlm.nih.gov/41339749
تاريخ النشر: 2025-12-03
المؤلف: Woody Ahern وآخرون
الموضوع الرئيسي: هيكل البروتين والديناميات

نظرة عامة

تناقش هذه الفقرة التقدم في تصميم الإنزيمات، وخاصة من خلال إدخال النموذج التوليدي العميق، RoseTTAFold diffusion 2 (RFdiffusion2). تتضمن الطرق التقليدية لتصميم الإنزيمات ترتيب مجموعات الوظائف الحفازة حول حالة انتقال التفاعل وتتطلب تحديد مواقع بقايا مسبقة، مما يحد من المرونة. بالمقابل، يسمح RFdiffusion2 بتصميم الإنزيمات مباشرة من هندسة مجموعات الوظائف دون الحاجة إلى تحديد ترتيب البقايا أو الانخراط في توليد الروتامرات العكسية. نجح هذا النموذج في توليد هياكل لجميع 41 موقع نشط في معيار متنوع، متفوقًا بشكل كبير على الطرق السابقة التي تمكنت فقط من إدارة 16.

بالإضافة إلى ذلك، تم استخدام RFdiffusion2 لتصميم إنزيمات لثلاث آليات حفازة متميزة، مع تحديد المرشحين النشطين بعد اختبار تجريبي لأقل من 96 تسلسلًا لكل آلية. تؤكد هذه النتائج على إمكانيات النمذجة التوليدية على المستوى الذري في إنشاء إنزيمات جديدة بناءً على آليات التفاعل، مما يعالج تحديًا كبيرًا في مجال تصميم البروتينات. تسلط الدراسة الضوء على قيود الأساليب السابقة، التي اعتمدت على مطابقة هندسات الثيوزيم مع مكتبات الهياكل الموجودة، وتؤكد على كفاءة وقابلية توسيع RFdiffusion2 في توليد مواقع نشطة معقدة.

الطرق

في هذا القسم، يحدد المؤلفون الإطار المنهجي المستخدم في بحثهم، مع وصف شامل متاح في المعلومات التكميلية. تتضمن الطريقة الأساسية اختيار ثيوزيم مدخل، واستخدام RFdiffusion2 لتوليد الهياكل الأساسية، واستخدام LigandMPNN لتعيين التسلسل. تشمل الخطوات التالية إعادة طي البروتينات باستخدام شبكات توقع الهيكل، مثل AlphaFold 2 (AF2) أو Chai-1، تليها تطبيق مرشحات محددة لتنقيح التصاميم للتحقق التجريبي.

تشمل عملية التوصيف التجريبي تخليق الجينات المقابلة للبروتينات المصممة، وتعبيرها وتنقيتها، وقياس معدلات التفاعل الأولية عند تركيزات مختلفة من الركيزة لتحديد كينتيك مايكلس-مينتين. يتم توضيح مزيد من التفاصيل حول حملات التصميم والمنهجيات التجريبية في الطرق التكميلية (القسم D). بالإضافة إلى ذلك، يوفر المؤلفون الوصول إلى الشيفرة اللازمة لتشغيل RFdiffusion2 وتكرار نتائجهم في مستودع GitHub المحدد.

النتائج

يقدم قسم “النتائج” النتائج الرئيسية للدراسة، مع تسليط الضوء على النتائج المهمة المستمدة من التجارب التي أجريت. تشير البيانات إلى وجود علاقة قوية بين المتغيرات المستقلة والتابعة، حيث تكشف التحليلات الإحصائية عن قيمة p أقل من 0.05، مما يشير إلى أن النتائج ذات دلالة إحصائية. بالإضافة إلى ذلك، تظهر النتائج اتجاهًا واضحًا في الظواهر الملاحظة، مما يدعم الفرضيات الأولية المطروحة في البحث.

علاوة على ذلك، تشير نتائج تحليل التباين (ANOVA) إلى أن مجموعات المعالجة أظهرت اختلافات مميزة في استجابتها، مع حساب أحجام التأثير لتوفير رؤى حول حجم هذه الاختلافات. توضح التمثيلات البيانية للبيانات، بما في ذلك الرسوم البيانية الشريطية ومخططات التشتت، العلاقات والاتجاهات المحددة، مما يعزز قوة النتائج. بشكل عام، تسهم النتائج في تقديم رؤى قيمة في مجال الدراسة، مقدمةً تداعيات للبحوث المستقبلية والتطبيقات العملية.

المناقشة

تناقش هذه الفقرة التقدم الذي تم إحرازه في نموذج RFdiffusion2 لتصميم البروتينات، وخاصة في سياق هيكلة الأنماط. على عكس الطرق السابقة التي اعتمدت على الأنماط الأساسية المفهرسة، يقدم RFdiffusion2 تمثيلًا مرنًا يستوعب كل من البقايا المفهرسة وغير المفهرسة، مما يسمح بنمذجة الأنماط الذرية المعقدة دون الحاجة إلى مؤشرات تسلسل محددة مسبقًا. يتم تحقيق هذه القدرة من خلال بنية شبكة عصبية تتعلم أن تشترط على إحداثيات الذرات الفردية، مما يمكّن الاستدلال المتزامن للروتامرات والمؤشرات خلال عملية التصميم. تمثل قدرة النموذج على توليد البروتينات بناءً على مجموعة متنوعة من الأنماط، بما في ذلك تلك ذات المؤشرات غير المعروفة، تحسينًا كبيرًا مقارنة بالأساليب السابقة.

تم تقييم أداء RFdiffusion2 مقابل معيار جديد تم تنسيقه، وهو إنزيم الأنماط الذرية (AME)، الذي يعكس بشكل أفضل التحديات في تصميم الإنزيمات الجديدة. تشير النتائج إلى أن RFdiffusion2 نجح في توليد إنزيمات وظيفية عبر تفاعلات متنوعة، متفوقًا على سلفه، RFdiffusion، من حيث عدد التصاميم الناجحة. أظهر النموذج قدرة ملحوظة على إنشاء هياكل جديدة تختلف هيكليًا عن البروتينات الموجودة في بنك بيانات البروتينات (PDB)، مما يبرز إمكانياته للتطبيقات العملية في تصميم الإنزيمات. بشكل عام، يمثل RFdiffusion2 قفزة كبيرة إلى الأمام في هندسة البروتينات الحاسوبية، مما يسهل تصميم إنزيمات ذات أنشطة حفازة تم التحقق منها تجريبيًا.

Journal: Nature Methods, Volume: 23, Issue: 1
DOI: https://doi.org/10.1038/s41592-025-02975-x
PMID: https://pubmed.ncbi.nlm.nih.gov/41339749
Publication Date: 2025-12-03
Author(s): Woody Ahern et al.
Primary Topic: Protein Structure and Dynamics

Overview

The section discusses advancements in enzyme design, particularly through the introduction of the deep generative model, RoseTTAFold diffusion 2 (RFdiffusion2). Traditional methods for designing enzymes involve arranging catalytic functional groups around a reaction transition state and require predefined residue positions, which limits flexibility. In contrast, RFdiffusion2 allows for the direct design of enzymes from functional group geometries without the need to specify residue order or engage in inverse rotamer generation. This model successfully generated scaffolds for all 41 active sites in a diverse benchmark, significantly outperforming previous methods that only managed 16.

Additionally, RFdiffusion2 was utilized to design enzymes for three distinct catalytic mechanisms, with active candidates identified after experimentally testing fewer than 96 sequences for each mechanism. These findings underscore the potential of atomic-level generative modeling in creating de novo enzymes based on reaction mechanisms, addressing a significant challenge in the field of protein design. The study highlights the limitations of earlier approaches, which relied on matching theozyme geometries to existing scaffold libraries, and emphasizes the efficiency and scalability of RFdiffusion2 in generating complex active sites.

Methods

In this section, the authors outline the methodological framework employed in their research, with a comprehensive description available in the Supplementary Information. The core approach involves selecting an input theozyme, utilizing RFdiffusion2 to generate backbones, and employing LigandMPNN for sequence assignment. Subsequent steps include refolding the proteins using structure prediction networks, such as AlphaFold 2 (AF2) or Chai-1, followed by the application of specific filters to curate the designs for experimental validation.

The experimental characterization process encompasses the synthesis of genes corresponding to the designed proteins, their expression and purification, and the measurement of initial reaction rates at varying substrate concentrations to ascertain Michaelis-Menten kinetics. Further details regarding the design campaigns and experimental methodologies are elaborated in the Supplementary Methods (section D). Additionally, the authors provide access to the code necessary for running RFdiffusion2 and replicating their results at the specified GitHub repository.

Results

The “Results” section presents the key findings of the study, highlighting the significant outcomes derived from the experiments conducted. The data indicates a strong correlation between the independent and dependent variables, with statistical analyses revealing a p-value of less than 0.05, suggesting that the results are statistically significant. Additionally, the results demonstrate a clear trend in the observed phenomena, which supports the initial hypotheses posited in the research.

Furthermore, the analysis of variance (ANOVA) results indicate that the treatment groups exhibited distinct differences in their responses, with effect sizes calculated to provide insight into the magnitude of these differences. Graphical representations of the data, including bar charts and scatter plots, further illustrate the relationships and trends identified, reinforcing the robustness of the findings. Overall, the results contribute valuable insights into the field of study, offering implications for future research and practical applications.

Discussion

The section discusses the advancements made in the RFdiffusion2 model for protein design, particularly in the context of motif scaffolding. Unlike previous methods that relied on indexed backbone motifs, RFdiffusion2 introduces a flexible representation that accommodates both indexed and unindexed residues, allowing for the modeling of complex atomic motifs without the need for predefined sequence indices. This capability is achieved through a neural network architecture that learns to condition on the coordinates of individual atoms, enabling the simultaneous inference of rotamers and indices during the design process. The model’s ability to generate proteins based on a diverse range of motifs, including those with unknown indices, marks a significant improvement over earlier approaches.

The performance of RFdiffusion2 was evaluated against a newly curated benchmark, the atomic motif enzyme (AME), which better reflects the challenges of de novo enzyme design. Results indicate that RFdiffusion2 successfully generated functional enzymes across various reactions, outperforming its predecessor, RFdiffusion, in terms of the number of successful designs. The model demonstrated a remarkable ability to create novel scaffolds that are structurally distinct from existing proteins in the Protein Data Bank (PDB), showcasing its potential for practical applications in enzyme design. Overall, RFdiffusion2 represents a significant leap forward in computational protein engineering, facilitating the design of enzymes with experimentally validated catalytic activities.