Enhanced plasmonic biosensors with machine learning for ultra-sensitive detection

Journal: Discover Nano, Volume: 21, Issue: 1
DOI: https://doi.org/10.1186/s11671-025-04422-4
PMID: https://pubmed.ncbi.nlm.nih.gov/41484520
Publication Date: 2026-01-04
Author(s): M. Sahaya Sheela et al.
Primary Topic: Gold and Silver Nanoparticles Synthesis and Applications

Overview

Plasmonic biosensors have emerged as a pivotal technology for ultra-sensitive, label-free biomolecule detection, with the global biosensor market projected to exceed USD 1 billion. This paper introduces a machine learning-driven optimization technique, termed SERA, aimed at enhancing the performance of plasmonic biosensors, particularly in surface plasmon resonance (SPR) and surface-enhanced Raman scattering (SERS) applications. The SERA model uses spectra from the SERS-DB database, which are preprocessed with baseline correction and noise reduction before features are extracted using Principal Component Analysis (PCA) and a local-maxima search. A Random Forest (RF) classifier was identified as the best-performing machine learning model, achieving an accuracy of 92%, precision and recall of 90%, and an F1-score of 92% across six molecular classes comprising pesticides and biological molecules.
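The preprocessing and feature-extraction steps named above can be illustrated with a minimal NumPy sketch. The summary does not say which baseline or denoising algorithms the authors used, so the polynomial baseline fit, the moving-average filter, and every function name and parameter value here are assumptions chosen for illustration, and the spectra are synthetic rather than taken from SERS-DB.

```python
import numpy as np


def preprocess(spectrum, baseline_degree=3, window=5):
    """Subtract a polynomial baseline, then smooth with a moving average."""
    x = np.arange(spectrum.size)
    # Assumed baseline model: a low-order polynomial fitted to the whole spectrum.
    coeffs = np.polyfit(x, spectrum, baseline_degree)
    corrected = spectrum - np.polyval(coeffs, x)
    # Assumed noise reduction: moving-average filter (output length unchanged).
    kernel = np.ones(window) / window
    return np.convolve(corrected, kernel, mode="same")


def local_maxima(signal, min_height):
    """Indices of interior points above both neighbours and above min_height."""
    mid = signal[1:-1]
    is_peak = (mid > signal[:-2]) & (mid > signal[2:]) & (mid >= min_height)
    return np.flatnonzero(is_peak) + 1


def pca_scores(spectra, n_components):
    """Project mean-centred spectra onto their leading principal components."""
    centred = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


# Synthetic stand-in data: one Gaussian band on a sloping baseline plus noise.
rng = np.random.default_rng(0)
axis = np.arange(200)
band = np.exp(-0.5 * ((axis - 80) / 4.0) ** 2)
raw = np.stack([band + 0.002 * axis + rng.normal(0.0, 0.01, axis.size)
                for _ in range(20)])

clean = np.stack([preprocess(s) for s in raw])
peaks = local_maxima(clean[0], min_height=0.5)
scores = pca_scores(clean, n_components=3)
print(peaks, scores.shape)
```

In a pipeline of this kind, the peak positions and the PCA scores would typically be concatenated into one feature vector per spectrum before classification.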

The SERA model performed robustly, with a low error tolerance of approximately 5% and an area under the ROC curve (AUC) of about 0.90. Future research directions include expanding the dataset to cover a broader range of biomolecular classes and validating the model experimentally in real time. The authors also highlight the potential of integrating deep learning architectures, such as Convolutional Neural Networks (CNNs), and of coupling the SERA model with microfluidic systems and wearable biosensors, which could significantly enhance point-of-care diagnostics. Finally, they propose combining Bayesian optimization with reinforcement learning to refine sensor parameter tuning in dynamic sensing environments.

Methods

In this section, the authors detail the methodology of the proposed SERA (Sensor Evaluation and Response Analysis) method, focusing on its experimental validation and setup. The performance of the SERA model is assessed by comparing its predictions with actual laboratory results for various test molecules. Key metrics such as accuracy, precision, recall, and the Mean Absolute Error (MAE) are used to evaluate the model’s effectiveness. The MAE is calculated using the formula:

\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\]

where \(y_i\) represents the real-world measurements, \(\hat{y}_i\) denotes the model predictions, and \(n\) is the number of samples. The authors also give formal definitions of accuracy, precision, recall, and F1-score, so that the model’s predictive performance is assessed on several complementary metrics.
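As a small illustration of the MAE formula, the following sketch computes it in plain Python. The function name and the sample values are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all n paired samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical laboratory measurements and model predictions.
measured = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
mae = mean_absolute_error(measured, predicted)
print(mae)
```

Here the absolute errors are 0.5, 0 and 1, so the mean is 0.5.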

For the experimental setup, the authors describe a basic hardware and software configuration for the initial training and evaluation of the SERA model on the SERS-DB dataset. The hardware is a desktop with an Intel Core i7 processor, 32 GB of RAM, and a 256 GB SSD, running Ubuntu 20.04 with Python 3.8. Libraries such as scikit-learn, NumPy, pandas, and matplotlib are used for optimization and visualization, while the model relies on computationally efficient methods, namely Random Forest (RF), Bayesian Optimization (BO), and Principal Component Analysis (PCA).

Results

The results section presents a comprehensive evaluation of the SERA model, focusing on its performance in classifying six molecular types relevant to biosensor applications: Thiacloprid, Imidacloprid, Thiamethoxam, Nitenpyram, Tetrahydrofolate, and Dihydrofolate. The dataset comprises 100 to 500 spectra per molecule, and a 70:30 training/testing split yields 420 training samples and 180 testing samples (600 in total). The model generalizes effectively, as indicated by the steady decrease in both training and testing loss, with minimal overfitting and smooth convergence. The training accuracy reaches approximately 92% and closely matches the testing accuracy, which reflects the model’s robustness to variability in the spectral data.
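The 70:30 split and the PCA plus Random Forest classification described above can be sketched with scikit-learn, which the paper lists among its libraries. This is only an illustration under stated assumptions: the spectra are synthetic placeholders (one Gaussian band per class), 100 spectra per class is assumed so that the totals match the reported 420/180 split, and `n_components=10`, `n_estimators=100` and the helper `synthetic_spectra` are hypothetical choices, not the authors' settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Class names as listed in the paper summary.
CLASSES = ["Thiacloprid", "Imidacloprid", "Thiamethoxam",
           "Nitenpyram", "Tetrahydrofolate", "Dihydrofolate"]


def synthetic_spectra(n_per_class=100, n_points=300, seed=0):
    """Placeholder data: one Gaussian band per class plus noise (not SERS-DB)."""
    rng = np.random.default_rng(seed)
    axis = np.arange(n_points)
    X, y = [], []
    for label, centre in enumerate(np.linspace(40, 260, len(CLASSES))):
        band = np.exp(-0.5 * ((axis - centre) / 6.0) ** 2)
        X.append(band + rng.normal(0.0, 0.05, size=(n_per_class, n_points)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)


X, y = synthetic_spectra()

# Stratified 70:30 split: 600 spectra become 420 training and 180 testing samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# PCA for dimensionality reduction followed by a Random Forest classifier.
model = make_pipeline(
    PCA(n_components=10),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(X_train.shape[0], X_test.shape[0], round(test_accuracy, 3))
```

Stratifying on the labels keeps the class proportions identical in both subsets. Because the synthetic classes are far better separated than real SERS spectra, the accuracy printed here says nothing about the 92% reported in the paper.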

A confusion matrix reveals high true positive counts for each class, indicating effective classification; the few misclassifications that occur are attributed mainly to overlapping spectral features. The model’s performance is quantified using accuracy, precision, recall, and F1-score: accuracy and F1-score both peak at 92%, while precision and recall are around 90%. Additionally, the analysis of spectral peak shifts shows a positive correlation with binding efficiency, underscoring the model’s sensitivity to molecular interactions. The Bayesian optimization algorithm enhances model performance, reaching stable accuracy near 92% after approximately 20 iterations, which demonstrates the effectiveness of machine learning-guided optimization in biosensor configuration.
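To show how such metrics follow from a confusion matrix, here is a minimal plain-Python sketch using the standard definitions (precision = TP / predicted positives, recall = TP / actual positives, F1 = harmonic mean of the two). The 3-class matrix is hypothetical and does not reproduce the paper's results, and the summary does not state which averaging scheme the authors applied across the six classes.

```python
def per_class_metrics(confusion):
    """Return (precision, recall, f1) per class from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Hypothetical 3-class matrix for illustration only.
example = [
    [28, 2, 0],
    [3, 26, 1],
    [0, 1, 29],
]
metrics = per_class_metrics(example)
macro_f1 = sum(f1 for _, _, f1 in metrics) / len(metrics)
print(accuracy(example), macro_f1)
```

Off-diagonal entries such as `example[1][0]` are the misclassifications; in spectral data they tend to cluster between classes whose bands overlap.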

Discussion

The discussion describes the development of SERA, an AI-driven simulation framework aimed at optimizing the performance of plasmonic biosensors. Its primary objectives are to enhance sensitivity and detection accuracy through machine learning techniques, particularly feature extraction and spectral pattern recognition. The framework integrates data from the Surface-Enhanced Raman Spectroscopy Database (SERS-DB) and follows a systematic workflow: data preprocessing, feature extraction via Principal Component Analysis (PCA), and a Random Forest (RF) algorithm for performance prediction. Bayesian optimization is used to fine-tune sensor parameters, leading to significant improvements in detection sensitivity, with detection limits as low as 5 ng/mL.

The discussion highlights existing challenges in the field, such as reliance on empirical optimization and the variability of material properties, which hinder the development of high-performance biosensors. The SERA model addresses these issues by providing a robust, data-driven methodology that minimizes trial-and-error and enhances real-world applicability. Comparative analyses indicate that the SERA framework outperforms traditional methods and other optimization models, with an accuracy of 92% and a sensitivity of 1000 nm/RIU. Proposed future research directions include expanding the dataset to broader biomolecular classes and integrating deep learning architectures to further refine feature extraction and sensor performance in dynamic environments.
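
To put the sensitivity figure in context: the bulk refractive-index sensitivity of a plasmonic sensor is conventionally defined as the shift in resonance wavelength per unit change in refractive index. Assuming the reported 1000 nm/RIU follows this convention (the summary does not define it), a hypothetical index change of \(10^{-3}\) RIU would correspond to a resonance shift of about 1 nm:

\[
S = \frac{\Delta\lambda_{\mathrm{res}}}{\Delta n}, \qquad \Delta\lambda_{\mathrm{res}} = S \, \Delta n = 1000\ \mathrm{nm/RIU} \times 10^{-3}\ \mathrm{RIU} = 1\ \mathrm{nm}
\]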