Utilization of machine learning approach for production of optimized PLGA nanoparticles for drug delivery applications

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-92725-y
PMID: https://pubmed.ncbi.nlm.nih.gov/40087385
Publication Date: 2025-03-14
Author(s): Khaled Almansour et al.
Primary Topic: Advanced Drug Delivery Systems

Overview

This research investigates the use of machine learning techniques to predict the particle size and Zeta Potential of Poly(lactic-co-glycolic acid) (PLGA) nanoparticles, which are widely used as carriers in drug delivery systems. The study uses both categorical and numerical input features to optimize the conditions for synthesizing nanosized PLGA particles. The methodology combines Leave-One-Out encoding to transform categorical features, the Local Outlier Factor for outlier detection, and the Bat Optimization Algorithm for hyperparameter tuning. A comparative analysis of several models, including K-Nearest Neighbors (KNN), the ensemble methods Bagging and AdaBoost, and a novel Small-Size Bat-Optimized KNN Regression (SBNNR) model, shows that the AdaBoost-based KNN model (ADA-KNN) performs best for particle size, with a test $R^2$ of 0.94385, while SBNNR achieves a test $R^2$ of 0.97674 for Zeta Potential.

The findings highlight the effectiveness of combining advanced preprocessing, optimization, and ensemble techniques in regression modeling for nanoparticle characterization. The SBNNR model, which incorporates generative adversarial networks and deep feature extraction, performs well on sparse datasets, underscoring the potential of tailored machine learning approaches in materials science applications. This research contributes to the optimization of PLGA nanoparticle synthesis and demonstrates the value of machine learning for predictive modeling within the field. Future work may focus on refining the SBNNR model and extending it to broader datasets and domains.

Methods

The methodology follows a comprehensive workflow for data analysis and modeling, as illustrated in Fig. 2. It combines data preprocessing techniques, optimization algorithms, and ensemble regression models to improve predictive accuracy and robustness. In the initial preprocessing phase, Leave-One-Out (LOO) encoding transforms categorical variables into numerical values, and the Local Outlier Factor (LOF) method identifies and handles outliers.
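The two preprocessing steps can be sketched in Python with pandas and scikit-learn. This is a minimal illustration, not the authors' code: the column names (`polymer_type`, `size_nm`), the toy values, the fallback to the global mean for single-member categories, and LOF's `n_neighbors=20` (scikit-learn's default) are assumptions. Leave-One-Out encoding replaces each row's category with the mean target of the *other* rows in that category, so a row's own target never leaks into its encoding; LOF flags points whose local density is much lower than that of their neighbors.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor


def leave_one_out_encode(categories: pd.Series, target: pd.Series) -> pd.Series:
    """Encode each row's category as the mean target of the OTHER rows in that category."""
    df = pd.DataFrame({"cat": categories, "y": target.astype(float)})
    grp = df.groupby("cat")["y"]
    total = grp.transform("sum")
    count = grp.transform("count")
    # Subtract the row's own target so it does not leak into its own encoding.
    loo = (total - df["y"]) / (count - 1)
    # Singleton categories have no other rows (0/0 -> NaN); fall back to the global mean.
    return loo.where(count > 1, df["y"].mean())


def lof_inlier_mask(X: np.ndarray, n_neighbors: int = 20) -> np.ndarray:
    """Return a boolean mask that is True for rows LOF labels as inliers."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    return lof.fit_predict(X) == 1  # fit_predict returns 1 (inlier) or -1 (outlier)


# Hypothetical toy data: column names and values are illustrative only.
data = pd.DataFrame({
    "polymer_type": ["A", "A", "B", "B", "B", "C"],
    "size_nm": [100.0, 200.0, 150.0, 250.0, 350.0, 400.0],
})
data["polymer_type_loo"] = leave_one_out_encode(data["polymer_type"], data["size_nm"])
print(data)

# 50 clustered points plus one obvious outlier at (10, 10).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), [[10.0, 10.0]]])
mask = lof_inlier_mask(X)
print("flagged rows:", np.where(~mask)[0])
```

At prediction time there is no row to leave out, so new samples would typically be encoded with the plain per-category mean computed on the training set.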

For model optimization, the Bat Optimization Algorithm (BA) is used to tune hyperparameters, improving the performance of the various regression models. The training phase covers several machine learning techniques, including K-Nearest Neighbors (KNN) and the ensemble methods Bagging and Adaptive Boosting (AdaBoost). The methodology culminates in a novel Small-Size Bat-Optimized KNN Regression (SBNNR) model, which uses generative approaches and deep feature extraction to address data sparsity and improve overall performance. The workflow is designed to remain computationally efficient while adapting to the complexities of the dataset.
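The Bat Algorithm's update loop can be sketched as a box-constrained minimizer and then applied to KNN hyperparameter tuning. This is a basic variant of the commonly cited formulation (frequency-scaled velocity updates that pull each bat toward the current best, a local random walk around the best, and loudness and pulse-rate updates on accepted moves). The parameter defaults, the local-walk scale of 1% of the search range, the synthetic `make_regression` data, and the use of 5-fold cross-validated R² to tune `n_neighbors` are assumptions, not settings reported in the study.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor


def bat_algorithm(objective, lower, upper, n_bats=20, n_iter=100,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    """Minimize `objective` over the box [lower, upper] with a basic Bat Algorithm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    x = rng.uniform(lower, upper, size=(n_bats, dim))  # bat positions
    v = np.zeros((n_bats, dim))                         # bat velocities
    loudness = np.ones(n_bats)                          # A_i, shrinks after accepted moves
    r0 = rng.uniform(0.0, 1.0, size=n_bats)             # initial pulse emission rates
    pulse = r0.copy()                                   # r_i, controls how often local search runs
    fitness = np.array([objective(xi) for xi in x])
    best_idx = int(np.argmin(fitness))
    best, best_fit = x[best_idx].copy(), float(fitness[best_idx])

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            # Global move: a random frequency scales the pull toward the best bat.
            freq = f_min + (f_max - f_min) * rng.uniform()
            v[i] += (best - x[i]) * freq
            candidate = x[i] + v[i]
            # Local random walk around the best solution; step shrinks with mean loudness.
            if rng.uniform() > pulse[i]:
                step = 0.01 * (upper - lower) * loudness.mean()
                candidate = best + step * rng.uniform(-1.0, 1.0, size=dim)
            candidate = np.clip(candidate, lower, upper)
            cand_fit = float(objective(candidate))
            # Accept an improving move with probability equal to the bat's loudness.
            if cand_fit <= fitness[i] and rng.uniform() < loudness[i]:
                x[i], fitness[i] = candidate, cand_fit
                loudness[i] *= alpha
                pulse[i] = r0[i] * (1.0 - np.exp(-gamma * t))
            # Track the global best independently of acceptance.
            if cand_fit <= best_fit:
                best, best_fit = candidate.copy(), cand_fit
    return best, best_fit


# Hypothetical tuning task: choose n_neighbors for KNN on synthetic data.
X_demo, y_demo = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)


def knn_cv_loss(position):
    """Negative mean 5-fold CV R^2 of a KNN regressor; lower is better."""
    k = int(round(position[0]))  # continuous bat position -> integer n_neighbors
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X_demo, y_demo, cv=5, scoring="r2")
    return -scores.mean()


best_pos, best_loss = bat_algorithm(knn_cv_loss, lower=[1], upper=[15], n_bats=10, n_iter=10)
print("tuned n_neighbors:", int(round(best_pos[0])), "CV R^2:", round(-best_loss, 3))
```

Because `n_neighbors` is an integer, the continuous bat position is rounded before each evaluation; further hyperparameters could be added as extra dimensions of `lower` and `upper`.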

Results

In this section, the results of the predictive modeling analysis are presented; the analysis was implemented with Python libraries such as scikit-learn, NumPy, and Matplotlib. The dataset was split into training and test sets in an 80:20 ratio so that each model could be validated on held-out data. The models were evaluated with the coefficient of determination (R²), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). For particle size, the AdaBoost K-Nearest Neighbors (ADA-KNN) model achieved the highest accuracy, with a test R² of 0.94385, outperforming the Bagging K-Nearest Neighbors (BAG-KNN) and SBNNR models, which recorded test R² scores of 0.82311 and 0.91976, respectively.

The analysis indicates that ADA-KNN generalizes well: the gap between its training R² (0.99252) and its test R² is comparatively small, suggesting limited overfitting. In contrast, BAG-KNN showed a much larger drop from its training R² (0.99035) to its test R², indicating likely overfitting. Although SBNNR achieved a high validation R² (0.96426 ± 0.04764), its test R² for particle size was lower than that of ADA-KNN. Error metrics further support ADA-KNN's superiority: it recorded the lowest test RMSE (12.425) and test MAE (10.882), while BAG-KNN had the highest error metrics, consistent with its weaker predictive performance.
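The generalization argument rests on the size of the train-test R² gap. A short worked computation using only the reported particle-size scores makes the comparison explicit; the "gap" here is an illustrative summary statistic, not a metric the study defines.

```python
# Train/test R^2 for particle size, as reported in the Results section.
reported = {
    "ADA-KNN": {"train": 0.99252, "test": 0.94385},
    "BAG-KNN": {"train": 0.99035, "test": 0.82311},
}

# Generalization gap: R^2 lost when moving from training data to unseen test data.
gaps = {name: round(s["train"] - s["test"], 5) for name, s in reported.items()}
print(gaps)  # {'ADA-KNN': 0.04867, 'BAG-KNN': 0.16724}

ratio = gaps["BAG-KNN"] / gaps["ADA-KNN"]
print(f"BAG-KNN gap is {ratio:.1f}x larger than ADA-KNN's")
```

The two models reach almost the same training R², so the difference in their gaps comes almost entirely from test performance.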

Discussion

In this section, the research discusses the methodologies and findings related to the optimization of PLGA nanoparticles, focusing on predicting Particle Size and Zeta Potential with machine learning techniques. The dataset used for optimization includes input variables such as polymer type, anti-solvent type, PLGA content, and anti-solvent percentage, with Particle Size (in nm) and Zeta Potential (in mV) as outputs. The study applies several machine learning models, including ADA-KNN and SBNNR, to identify optimal conditions for nanoparticle synthesis, achieving high accuracy and good generalization. ADA-KNN performs best for Particle Size, while SBNNR performs best for Zeta Potential, indicating that the most suitable model depends on the target property and the characteristics of its data.
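The summary does not state how the optimal synthesis conditions were located once the models were trained. One common approach, sketched here under explicit assumptions, is an exhaustive grid search over a fitted surrogate model. The category labels (`PLGA_50_50`, `water`, and so on), the numeric ranges and scaling constants, the synthetic response surface, and the plain `KNeighborsRegressor` surrogate are all hypothetical stand-ins, not the study's data or its ADA-KNN/SBNNR models.

```python
import itertools

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical levels and ranges; the real categories are not listed in this summary.
POLYMERS = ["PLGA_50_50", "PLGA_75_25"]
ANTISOLVENTS = ["water", "ethanol"]


def encode(df: pd.DataFrame) -> np.ndarray:
    """One-hot encode categoricals and scale numerics to roughly [0, 1] (assumed ranges)."""
    out = pd.DataFrame({
        "plga_content": df["plga_content"] / 10.0,       # assumed range 1-10
        "antisolvent_pct": df["antisolvent_pct"] / 100.0,
    })
    for p in POLYMERS:
        out[f"poly_{p}"] = (df["polymer_type"] == p).astype(float)
    for a in ANTISOLVENTS:
        out[f"anti_{a}"] = (df["antisolvent_type"] == a).astype(float)
    return out.to_numpy(dtype=float)


# Synthetic training table standing in for measured formulations.
rng = np.random.default_rng(1)
n = 120
train = pd.DataFrame({
    "polymer_type": rng.choice(POLYMERS, n),
    "antisolvent_type": rng.choice(ANTISOLVENTS, n),
    "plga_content": rng.uniform(1, 10, n),
    "antisolvent_pct": rng.uniform(10, 90, n),
})
# Made-up response surface, purely for the demo.
size_nm = 80 + 12 * train["plga_content"] + 0.5 * train["antisolvent_pct"] + rng.normal(0, 5, n)

model = KNeighborsRegressor(n_neighbors=5).fit(encode(train), size_nm)

# Enumerate every candidate condition, predict its size, keep the smallest.
grid = pd.DataFrame(
    list(itertools.product(POLYMERS, ANTISOLVENTS,
                           np.linspace(1, 10, 10), np.linspace(10, 90, 9))),
    columns=["polymer_type", "antisolvent_type", "plga_content", "antisolvent_pct"],
)
grid["pred_size_nm"] = model.predict(encode(grid))
best = grid.loc[grid["pred_size_nm"].idxmin()]
print(best)
```

The same loop could rank conditions by a Zeta Potential model instead, or combine both predictions into a single score.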

The research also incorporates preprocessing techniques, namely Leave-One-Out encoding and the Local Outlier Factor for outlier detection, together with the Bat Optimization Algorithm for hyperparameter optimization. These approaches help address the challenges posed by sparse datasets. The findings indicate that SBNNR, which uses Generative Adversarial Networks to generate synthetic data, substantially improves predictive accuracy, particularly when data are limited. The results underscore the potential of machine learning in advancing nanoparticle optimization for drug delivery applications and provide a framework that can be adapted to similar tasks in materials science. Future research may involve refining the SBNNR model and exploring its applicability to broader datasets and domains.