DeepDTAGen: a multitask deep learning framework for drug-target affinity prediction and target-aware drugs generation

Journal: Nature Communications, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41467-025-59917-6
PMID: https://pubmed.ncbi.nlm.nih.gov/40447614
Publication Date: 2025-05-30
Author(s): Pir Masoom Shah et al.
Primary Topic: Computational Drug Discovery Methods

Overview

The research paper presents a novel multitask learning framework aimed at enhancing drug discovery by simultaneously predicting drug-target binding affinities and generating new target-aware drug variants. Traditional machine learning approaches in this domain have largely focused on single tasks, either predicting drug-target interactions (DTI) or drug generation, which overlooks the intrinsic connections between these tasks. The proposed framework leverages shared features to learn the structural properties of drug molecules, the conformational dynamics of proteins, and their bioactivity.

To tackle optimization challenges inherent in multitask learning, particularly gradient conflicts between tasks, the authors introduce the FetterGrad algorithm. Extensive experiments conducted on three real-world datasets validate the effectiveness of this approach, demonstrating its capability to improve both the prediction of drug-target binding affinities and the generation of novel drugs, thereby significantly advancing the drug discovery process.
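This summary does not give FetterGrad's update rule. To make "gradient conflict" concrete, the sketch below measures the cosine similarity between the gradients that two tasks (affinity prediction and generation) produce for the same shared parameters. When the gradients point in opposing directions, it combines them with the projection used by PCGrad ("gradient surgery"), a different, previously published conflict-mitigation method. The code illustrates the problem FetterGrad targets and is not an implementation of FetterGrad. The function names and toy gradient values are hypothetical.

```python
import numpy as np


def cosine_similarity(g1: np.ndarray, g2: np.ndarray) -> float:
    """Cosine of the angle between two flattened task gradients."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))


def pcgrad_combine(g_affinity: np.ndarray, g_generation: np.ndarray) -> np.ndarray:
    """Combine two task gradients PCGrad-style (an illustration, NOT FetterGrad)."""
    g1 = np.asarray(g_affinity, dtype=float)
    g2 = np.asarray(g_generation, dtype=float)
    dot = g1 @ g2
    if dot >= 0:
        # No conflict: plain summation, as in standard multitask training.
        return g1 + g2
    # Conflict: remove from each gradient its component along the other one,
    # so the combined step does not undo either task to first order.
    g1_proj = g1 - dot / (g2 @ g2) * g2
    g2_proj = g2 - dot / (g1 @ g1) * g1
    return g1_proj + g2_proj


# Toy gradients of a shared parameter vector (hypothetical values).
g_aff = np.array([1.0, 0.0])
g_gen = np.array([-1.0, 1.0])
print(round(cosine_similarity(g_aff, g_gen), 4))  # -0.7071
print(pcgrad_combine(g_aff, g_gen))               # [0.5 1.5]
```

A negative cosine similarity is the signature of a conflict. Naively summing the two toy gradients gives [0, 1], which increases the affinity loss to first order. The projected combination has a non-negative inner product with both original gradients.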

Methods

The "Methods" section describes the computational framework and the evaluation protocol. DeepDTAGen is a multitask deep learning model in which drug-target affinity (DTA) prediction and target-aware drug generation share a common feature space. Drug molecules are encoded with DTI-centric node features, protein sequences are encoded with a Gated-Convolution neural network, and a transformer decoder generates new SMILES strings conditioned on the learned representations. The two tasks are trained jointly, and the FetterGrad algorithm is applied during optimization to mitigate conflicts between their gradients.

The model is evaluated on three benchmark datasets: KIBA, Davis, and BindingDB. Affinity prediction is scored with mean squared error (MSE), concordance index (CI), $r^2_m$, and the area under the precision-recall curve (AUPR). Generation is scored by the validity, novelty, and uniqueness of the generated molecules and by chemical property analyses (solubility, drug-likeness, and synthesizability). Robustness is further assessed with drug selection, randomization, and cold-start affinity tests, and the detailed experimental settings are reported in the supplementary materials.
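The cold-start affinity tests reported in the paper evaluate the model on drugs or targets that are absent from training. This summary does not describe the exact split procedure. The sketch below shows one common way to build a drug cold-start split from (drug, target, affinity) triples. The function name, the toy IDs, and the affinity values are hypothetical. A target cold-start split would group by the second field instead.

```python
import random


def drug_cold_start_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug_id, target_id, affinity) triples so that every drug in the
    test set is absent from the training set (a 'cold-drug' split)."""
    drugs = sorted({drug for drug, _, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_test = max(1, round(test_fraction * len(drugs)))
    test_drugs = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


# Toy triples with hypothetical IDs and affinity values.
pairs = [
    ("D1", "T1", 11.2), ("D1", "T2", 12.5), ("D2", "T1", 10.9),
    ("D3", "T3", 13.1), ("D4", "T2", 12.0), ("D5", "T3", 11.7),
]
train, test = drug_cold_start_split(pairs, test_fraction=0.4)
train_drugs = {d for d, _, _ in train}
test_drugs = {d for d, _, _ in test}
print(len(test_drugs), train_drugs.isdisjoint(test_drugs))  # 2 True
```

The split is made over unique drugs rather than over pairs. Splitting over pairs would let the same drug appear on both sides, and that setting is what the cold-start test is meant to exclude.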

Results

In the Results section, DeepDTAGen is evaluated on the KIBA, Davis, and BindingDB datasets for both affinity prediction and generation. The affinity-prediction metrics are mean squared error (MSE), concordance index (CI), the modified squared correlation coefficient ($r^2_m$), and the area under the precision-recall curve (AUPR). Detailed discussions and experimental setups are provided in the supplementary materials.
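MSE, CI, and the $r^2_m$ score reported in the Discussion can be computed from paired arrays of measured and predicted affinities. The sketch below follows definitions commonly used in the drug-target affinity literature, and it is an assumption that DeepDTAGen's evaluation code uses exactly these variants. CI counts correctly ordered pairs among all pairs with different measured affinities, with prediction ties scored 0.5. The $r^2_m$ score is $r^2_m = r^2\,(1 - \sqrt{|r^2 - r_0^2|})$, where $r_0^2$ is the squared correlation for a regression through the origin. AUPR is omitted because it requires binarizing affinities at a dataset-specific threshold.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted affinities."""
    y, f = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y - f) ** 2))


def concordance_index(y_true, y_pred):
    """Share of comparable pairs (different measured affinity) whose predicted
    order matches the measured order; tied predictions count as 0.5."""
    y, f = np.asarray(y_true, float), np.asarray(y_pred, float)
    concordant, comparable = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                comparable += 1
                if f[i] > f[j]:
                    concordant += 1.0
                elif f[i] == f[j]:
                    concordant += 0.5
    return concordant / comparable


def rm2(y_true, y_pred):
    """r_m^2 = r^2 * (1 - sqrt(|r^2 - r0^2|)) under the assumed definition."""
    y, f = np.asarray(y_true, float), np.asarray(y_pred, float)
    r2 = np.corrcoef(y, f)[0, 1] ** 2          # squared Pearson correlation
    k = np.sum(y * f) / np.sum(f * f)          # slope of the fit through the origin
    r02 = 1.0 - np.sum((y - k * f) ** 2) / np.sum((y - y.mean()) ** 2)
    return float(r2 * (1.0 - np.sqrt(abs(r2 - r02))))


y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.2, 5.9, 7.4, 7.8]
print(round(mse(y_true, y_pred), 4))       # 0.0625
print(concordance_index(y_true, y_pred))   # 1.0
print(round(rm2(y_true, y_pred), 3))
```

The double loop in `concordance_index` is O(n²). This is fine for illustration, but benchmark-sized test sets would typically use a vectorized or sorting-based variant.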

To assess generative performance, the study measures the validity, novelty, and uniqueness of the generated molecules and analyzes chemical properties such as solubility, drug-likeness, and synthesizability. Two strategies are used to generate SMILES. The first feeds the conditions together with the original SMILES into a transformer decoder. The second, stochastic strategy introduces random elements to diversify the generated molecules. The results are organized into subsections: the first seven cover binding-affinity prediction, and the remaining ones cover generative capabilities.
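Validity, uniqueness, and novelty are usually computed as ratios over the list of generated SMILES. The sketch below uses the common definitions: validity over all samples, uniqueness over valid samples, and novelty over unique valid samples that do not appear in the training set. The exact denominators used in the paper are an assumption. Parsing SMILES requires a cheminformatics toolkit, so the validity check is passed in as a function. With RDKit, one could use `Chem.MolFromSmiles(s) is not None` and canonicalize with `Chem.MolToSmiles` before comparing strings. The `toy_is_valid` checker below is a hypothetical stand-in that only checks parentheses.

```python
from typing import Callable, Dict, Iterable, List


def generation_metrics(generated: List[str], training: Iterable[str],
                       is_valid: Callable[[str], bool]) -> Dict[str, float]:
    """Validity, uniqueness, and novelty of a batch of generated SMILES."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in for a real parser: non-empty, balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


generated = ["CCO", "CCO", "c1ccccc1", "CC(C", "CC(=O)O"]
training = ["CCO"]
print(generation_metrics(generated, training, toy_is_valid))
# {'validity': 0.8, 'uniqueness': 0.75, 'novelty': 0.6666666666666666}
```

Because uniqueness and novelty are computed on string identity, real evaluations canonicalize SMILES first. Otherwise two spellings of the same molecule would count as distinct.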

Discussion

In this study, the authors introduced DeepDTAGen, a multitask learning framework designed for predicting drug-target affinities (DTA) and generating novel drugs. The model demonstrated superior predictive performance across three benchmark datasets (KIBA, Davis, and BindingDB), outperforming traditional machine learning models and competing deep learning models like GraphDTA. Specifically, on the KIBA dataset, DeepDTAGen achieved a mean squared error (MSE) of 0.146, a concordance index (CI) of 0.897, and an $r^2_m$ of 0.765, showcasing significant improvements in CI and $r^2_m$ compared to existing models. The enhancements in predictive accuracy were attributed to the incorporation of DTI-centric node features and the use of a Gated-Convolution neural network for feature extraction from protein sequences, which allowed for a more nuanced representation of molecular structures.
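This summary does not specify the layer configuration of the Gated-Convolution network. As a generic illustration, the sketch below applies one gated-linear-unit (GLU) style 1D convolution, $h = (X * W + b) \odot \sigma(X * V + c)$, to a one-hot encoded protein sequence. It then max-pools over positions to obtain a fixed-size vector. The kernel width, channel count, random weights, pooling step, and the toy sequence are all assumptions chosen for illustration, not DeepDTAGen's architecture.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> np.ndarray:
    """Encode a protein sequence as a (length, 20) one-hot matrix."""
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, idx[aa]] = 1.0
    return x


def conv1d_valid(x, w, b):
    """'Valid' 1D convolution: x (L, C_in), w (K, C_in, C_out) -> (L-K+1, C_out)."""
    k = w.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])  # (L-K+1, K, C_in)
    return np.einsum("lkc,kco->lo", windows, w) + b


def gated_conv_encode(seq, w, b, v, c):
    """One gated convolution layer followed by global max-pooling."""
    x = one_hot(seq)
    linear = conv1d_valid(x, w, b)
    gate = 1.0 / (1.0 + np.exp(-conv1d_valid(x, v, c)))  # sigmoid gate in (0, 1)
    h = linear * gate                                     # element-wise gating
    return h.max(axis=0)                                  # pool over sequence positions


rng = np.random.default_rng(0)
K, C_IN, C_OUT = 3, len(AMINO_ACIDS), 8
w, v = rng.normal(size=(K, C_IN, C_OUT)), rng.normal(size=(K, C_IN, C_OUT))
b, c = np.zeros(C_OUT), np.zeros(C_OUT)
embedding = gated_conv_encode("MKTAYIAKQR", w, b, v, c)
print(embedding.shape)  # (8,)
```

The sigmoid branch acts as a learned, position-specific filter on the linear branch. This is the general motivation for gating in convolutional sequence encoders.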

The study also validated the model through various tests, including drug selection tests, randomization tests, and cold-start affinity tests, confirming its robustness and ability to learn genuine drug-target relationships. The randomization tests indicated that the model’s predictions were not based on spurious correlations, as performance significantly declined when the data was permuted. Furthermore, the model successfully generated target-aware drugs, maintaining favorable chemical properties and polypharmacological effects, which are crucial for practical applications in drug discovery. Overall, the findings underscore the effectiveness of DeepDTAGen in both predicting drug-target interactions and generating novel compounds, thereby advancing the field of computational drug discovery.
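The randomization test can be illustrated with a y-scrambling experiment. A model is fit once on correctly paired data and again on data whose training labels have been shuffled, and the held-out errors are compared. The sketch below uses ordinary least squares on synthetic data as a stand-in for DeepDTAGen. The data generator, the 300/100 split, and the 20 shuffles are assumptions, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 400, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)  # synthetic "affinities"

train, test = slice(0, 300), slice(300, n)


def fit_and_score(X_tr, y_tr, X_te, y_te):
    """Fit least squares with an intercept and return held-out MSE."""
    A_tr = np.column_stack([X_tr, np.ones(len(X_tr))])
    A_te = np.column_stack([X_te, np.ones(len(X_te))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_te @ coef - y_te) ** 2))


real_mse = fit_and_score(X[train], y[train], X[test], y[test])
# Shuffle only the training labels, which breaks any genuine X -> y relationship.
scrambled = [fit_and_score(X[train], rng.permutation(y[train]), X[test], y[test])
             for _ in range(20)]
print(f"real MSE: {real_mse:.3f}, scrambled MSE (mean of 20): {np.mean(scrambled):.3f}")
```

If a model only exploited artifacts that survive shuffling, the two errors would be similar. A large gap indicates that performance depends on the actual pairing of inputs and labels.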