Explainable artificial intelligence to diagnose early Parkinson’s disease via voice analysis

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-96575-6
PMID: https://pubmed.ncbi.nlm.nih.gov/40188263
Publication Date: 2025-04-05
Author(s): Matthew Shen et al.
Primary Topic: Voice and Speech Disorders

Overview

This research paper investigates the application of Artificial Intelligence (AI) and Machine Learning (ML) techniques for the early detection of Parkinson’s disease (PD) through voice analysis. Traditional diagnostic methods for PD are often inefficient and costly, highlighting the need for innovative solutions. The study employs a hybrid model that integrates Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Multiple Kernel Learning (MKL), and Multilayer Perceptron (MLP) to analyze a dataset comprising 81 voice recordings. Key acoustic features, including Mel-Frequency Cepstral Coefficients (MFCCs), jitter, and shimmer, were extracted and analyzed.
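Jitter and shimmer can be illustrated with a short sketch. It uses one common "local" definition (the mean absolute difference between consecutive glottal cycles, divided by the mean value); the summary does not say which variant or extraction tool the authors used, and the cycle measurements below are hypothetical rather than data from the study.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical per-cycle measurements (illustration only, not study data).
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]             # length of each glottal cycle
peak_amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]   # peak amplitude of each cycle

jitter = local_perturbation(periods_ms)          # cycle-to-cycle period variation
shimmer = local_perturbation(peak_amplitudes)    # cycle-to-cycle amplitude variation

print(f"jitter (local):  {jitter:.2%}")
print(f"shimmer (local): {shimmer:.2%}")
```

Both measures are ratios, so they are unitless and comparable across speakers with different average pitch or loudness.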

The proposed model achieved an accuracy of 91.11%, a precision of 89.84%, a recall of 92.50%, an F1 score of 91.13%, and an area under the curve (AUC) of 0.9125. Additionally, the use of SHapley Additive exPlanations (SHAP) made the AI model more interpretable by identifying the features that most influence a PD diagnosis. A probability-based scoring system was also developed to assist patients and clinicians in monitoring disease progression. Overall, this AI-driven approach presents a non-invasive, cost-effective, and rapid method for early PD detection, potentially enabling personalized treatment through the analysis of vocal biomarkers.
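SHAP builds on Shapley values, which split a prediction among the input features. The sketch below computes exact Shapley values for a three-feature toy scorer by averaging each feature's marginal contribution over every feature ordering. The scorer, its weights, the feature names, and the all-zero baseline are hypothetical stand-ins; this is not the authors' model or the SHAP library's approximation algorithm.

```python
from itertools import permutations
from math import exp


def toy_model(x):
    """Hypothetical logistic scorer standing in for the hybrid model."""
    z = 0.8 * x["mfcc_1"] + 1.5 * x["jitter"] + 1.0 * x["shimmer"] - 1.0
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, sample, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    totals = {name: 0.0 for name in sample}
    orderings = list(permutations(sample))
    for order in orderings:
        current = dict(baseline)            # start from the reference input
        previous = model(current)
        for name in order:
            current[name] = sample[name]    # reveal one feature at a time
            score = model(current)
            totals[name] += score - previous
            previous = score
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical, already-normalised feature values.
baseline = {"mfcc_1": 0.0, "jitter": 0.0, "shimmer": 0.0}
sample = {"mfcc_1": 0.5, "jitter": 1.2, "shimmer": 0.3}

phi = shapley_values(toy_model, sample, baseline)
for name, value in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(f"{name:8s} {value:+.3f}")
```

The contributions always sum to the difference between the prediction for the sample and the prediction for the baseline, which is what makes them usable as a per-feature explanation. Exact enumeration grows factorially with the number of features, so it is practical only for toy cases like this one.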

Introduction

In the introduction, the authors describe converting the 81 speech recordings from time-domain signals to the frequency domain using the Fourier Transform (FT). This transformation represents each audio signal in terms of its frequency components and their corresponding amplitudes, which makes it easier to isolate and identify individual vocal characteristics. Analyzing the frequency domain helps researchers understand the underlying speech features that shape a speaker’s voice.
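As a rough illustration of the time-to-frequency conversion, the sketch below applies a naive discrete Fourier transform to a synthetic two-tone signal and reads off the dominant frequencies and their amplitudes. The sample rate, tone frequencies, and signal length are hypothetical, and a real pipeline would use an FFT routine on recorded audio rather than this O(n^2) loop.

```python
import cmath
import math


def dft_amplitudes(samples):
    """Amplitude of each frequency bin up to the Nyquist limit (naive O(n^2) DFT)."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        # Bins other than DC and Nyquist have a mirrored negative-frequency twin.
        scale = 1 if k in (0, n / 2) else 2
        amplitudes.append(scale * abs(coeff) / n)
    return amplitudes


SAMPLE_RATE = 800   # Hz, hypothetical
N = 400             # 0.5 s of signal, so each bin spans 2 Hz

# Synthetic "voice": a 120 Hz tone plus a weaker 240 Hz harmonic.
signal = [1.0 * math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
          + 0.4 * math.sin(2 * math.pi * 240 * t / SAMPLE_RATE)
          for t in range(N)]

amplitudes = dft_amplitudes(signal)
bin_hz = SAMPLE_RATE / N
peaks = sorted(range(len(amplitudes)), key=lambda k: amplitudes[k], reverse=True)[:2]

for k in peaks:
    print(f"{k * bin_hz:.0f} Hz -> amplitude {amplitudes[k]:.2f}")
```

The two tones that were mixed together in the time domain appear as two separate peaks in the frequency domain, which is the property that makes spectral features such as MFCCs possible.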

Methods

The “Methods” section outlines the experimental and analytical approaches employed in the study. The researchers utilized a combination of quantitative and qualitative techniques to gather data, ensuring a comprehensive understanding of the phenomena under investigation. Specific methodologies included controlled experiments, statistical analyses, and modeling techniques, which were applied to test the hypotheses formulated in the study.

Data collection involved systematic sampling and the use of validated instruments to measure key variables. The analysis was conducted using appropriate statistical software, allowing for rigorous examination of the results. The section emphasizes the importance of replicability and transparency in the methods, detailing the steps taken to minimize bias and ensure the reliability of the findings. Overall, the methodologies employed were designed to robustly address the research questions and contribute to the field’s existing body of knowledge.

Results

This section presents the evaluation results of the model, with accuracy and cross-entropy loss as the primary metrics. Accuracy reflects the model’s ability to assign the correct label, so high accuracy indicates effective differentiation between healthy control (HC) and Parkinson’s disease (PD) recordings, while low accuracy indicates a large number of misclassifications. Cross-entropy loss quantifies the divergence between the predicted class probabilities and the actual labels during training, with loss values above 0.40 (stated as 40%) indicating poor alignment and values below 0.20 (20%) suggesting strong predictive performance.
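The two metrics can be sketched as follows. The labels and predicted probabilities are hypothetical, and the 0.5 decision threshold is an assumption; the summary does not state the threshold used in the study.

```python
import math


def accuracy(labels, probs, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in probs]
    return sum(int(pred == y) for pred, y in zip(predictions, labels)) / len(labels)


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Hypothetical predictions: 0 = HC, 1 = PD.
labels = [0, 0, 1, 1, 1]
probs = [0.10, 0.40, 0.80, 0.90, 0.45]

acc = accuracy(labels, probs)
loss = binary_cross_entropy(labels, probs)
print(f"accuracy = {acc:.2f}, cross-entropy = {loss:.4f}")
```

Accuracy only counts whether the thresholded answer is right, whereas cross-entropy also penalises low confidence in a correct answer, so the two can move independently.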

To ensure robust evaluation, a five-fold cross-validation (CV) approach was employed, in which the dataset was divided into five stratified subsets. The model was trained on four subsets and validated on the fifth, and the process was repeated until each subset had served as the validation set exactly once. Every data point is therefore used for both training and validation, and averaging the metrics across folds gives a more stable estimate and reduces the risk that a single favorable split overstates performance.
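A minimal sketch of the stratified five-fold split is shown below. The 41/40 class split is a hypothetical assumption (the summary gives the total of 81 recordings but not the per-class counts), and `stratified_folds` is an illustrative helper, not the authors' code.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)   # deal out round-robin
        offset += len(indices)                             # keeps fold sizes balanced
    return folds


# Hypothetical class split for the 81 recordings.
labels = ["HC"] * 41 + ["PD"] * 40
folds = stratified_folds(labels)

for held_out, validation in enumerate(folds):
    training = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    # A real run would train on `training`, score on `validation`,
    # and average the five scores afterwards.
    print(f"fold {held_out}: train={len(training)} validate={len(validation)}")
```

Dealing each class out separately is what makes the split stratified: every fold ends up with roughly the same HC-to-PD ratio as the full dataset.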

Discussion

The discussion evaluates the performance of the hybrid machine learning model (MLP + CNN + RNN + MKL) developed for diagnosing Parkinson’s disease (PD) through voice analysis. The model achieved an average accuracy of 91.11%, above the pooled diagnostic accuracy of 80.6% reported in a 2016 meta-analysis of pathological examinations. Key metrics included a precision of 89.84%, recall of 92.50%, F1 score of 91.13%, and an AUC of 0.9125, indicating strong discriminative power between healthy controls (HC) and PD patients. The model’s consistent performance across folds and its low loss values further support its reliability in predicting unseen data.
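The relationship between these metrics can be checked with a few lines. The sketch derives precision and recall from confusion-matrix counts (the counts are hypothetical, since the summary does not report a confusion matrix) and recomputes F1 as the harmonic mean of the reported precision and recall.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one validation run (not taken from the paper).
example_precision, example_recall = precision_recall(tp=37, fp=4, fn=3)
print(f"example precision = {example_precision:.4f}, recall = {example_recall:.4f}")

# Figures quoted in the text.
reported_precision = 0.8984
reported_recall = 0.9250
f1_from_means = f1_score(reported_precision, reported_recall)
print(f"F1 from the reported precision and recall = {f1_from_means:.4f}")
```

The recomputed value lands close to, but not exactly on, the reported 91.13%. A gap of that size is what one would expect if F1 was computed per fold and then averaged, although the summary does not say how the figure was aggregated.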

The scoring system applied to the audio samples demonstrated clear differentiation between HC and PD cases, with most HC files scoring between 0 and 0.30 and PD files scoring between 0.70 and 0.90. Notably, the model’s misclassifications were attributed to overlapping acoustic features between HC and early-stage PD patients, emphasizing the challenge of distinguishing borderline cases. The use of SHAP for model interpretability revealed that Mel-frequency cepstral coefficients (MFCCs), jitter, and shimmer were critical features influencing predictions, aligning with established clinical characteristics of PD-related speech disorders. The findings underscore the potential of this AI-driven approach for early PD diagnosis, offering a non-invasive, accessible screening method that could significantly enhance patient outcomes through timely intervention and personalized treatment plans.
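The scoring ranges can be turned into a simple banding rule. In this sketch the cut points 0.30 and 0.70 come from the ranges quoted above, while the band names and the treatment of the middle range as "borderline" are assumptions made for illustration.

```python
def score_band(probability):
    """Map a model probability of PD onto a coarse band."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability <= 0.30:
        return "HC-like"
    if probability >= 0.70:
        return "PD-like"
    return "borderline"    # assumed label for scores between the two reported ranges


# Hypothetical scores for four recordings.
for score in (0.12, 0.30, 0.55, 0.85):
    print(f"{score:.2f} -> {score_band(score)}")
```

A middle band of this kind gives the overlapping HC and early-stage PD cases somewhere to go other than a forced yes-or-no label.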

Limitations

The limitations of the study highlight several challenges in applying the AI model for early Parkinson’s disease (PD) detection in clinical settings. A primary concern is the practicality of obtaining high-quality voice recordings from patients, since recording quality may affect the model’s analytical capabilities. Additionally, the model’s efficacy in processing longitudinal data remains uncertain, as it is trained on static recordings and may not accurately track PD progression over time. The small dataset used in this study further constrains the findings, although measures such as five-fold cross-validation were implemented to enhance robustness. The variability in the extracted acoustic features, particularly in pitch (SD = 50.08 Hz) and Mel-frequency cepstral coefficients (MFCCs) (SD up to 43.82), is presented as evidence that the dataset captures vocal diversity, which partly addresses concerns about overfitting.

Future research directions include applying the Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic voice recordings, thereby expanding the training dataset. Additionally, semi-supervised learning could enhance model performance by incorporating unlabeled data. The complexity of the current model, which combines MLP, CNN, RNN, and MKL components, may also lead to overfitting, so methods are needed to balance model sophistication against the limited data. Exploring multimodal approaches, such as integrating voice analysis with physical movement tracking through wearable devices, could further improve diagnostic accuracy. Lastly, while the model shows promise for PD diagnosis, its applicability to other neurological disorders remains uncertain, and a multimodal framework could help improve classification accuracy across conditions.
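The core idea of SMOTE can be sketched in a few lines: a synthetic sample is placed at a random point on the line between a minority-class sample and one of its nearest minority-class neighbours. SMOTE interpolates in feature space, so this yields synthetic feature vectors rather than new audio. The two-dimensional feature values, the choice of k, and the helper name `smote_samples` are hypothetical, and how the authors would apply the technique is not specified.

```python
import math
import random


def smote_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()    # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature vectors (for example jitter % and shimmer %).
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_points = smote_samples(minority, n_new=5)

for point in new_points:
    print([round(value, 3) for value in point])
```

Because every synthetic point lies between two real samples, the method adds density to the existing minority region but cannot introduce voices unlike those already in the dataset.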