Ensemble learning with explainable AI for improved heart disease prediction based on multiple datasets

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-97547-6
PMID: https://pubmed.ncbi.nlm.nih.gov/40263348
Publication Date: 2025-04-22
Author(s): Shahid Mohammad Ganie et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The research paper investigates ensemble learning techniques, specifically stacking and voting, for improving the accuracy of heart disease prediction. After training fifteen base models on two distinct heart disease datasets, the study built ensemble models that combine six base models, either through a meta-model (stacking) or by majority vote (voting). Both ensemble approaches outperformed the individual models, with stacking achieving the higher accuracy. Statistical validation with the Friedman aligned ranks test and Holm post-hoc comparisons confirmed that these performance differences are statistically significant. The study also applied SHAP analysis to interpret the models' predictions, showing how individual features influence the assessed heart disease risk.
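The two combination schemes can be contrasted with a small, dependency-free Python sketch. It assumes each base model outputs a probability of heart disease, uses hypothetical toy models and data, and uses logistic regression as the meta-model, which is a common choice but not necessarily the one the authors used. A real pipeline would more likely rely on scikit-learn's `VotingClassifier` and `StackingClassifier` than on hand-written code.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical base-model type: maps a feature vector to P(heart disease).
Model = Callable[[Sequence[float]], float]


def hard_vote(models: List[Model], x: Sequence[float]) -> int:
    """Majority vote over 0/1 base predictions (0.5 threshold); ties go to class 0."""
    votes = sum(1 for m in models if m(x) >= 0.5)
    return 1 if 2 * votes > len(models) else 0


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_model(meta_X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-model with batch gradient descent."""
    n, d = len(meta_X), len(meta_X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(meta_X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stack_predict(models: List[Model], w: List[float], b: float,
                  x: Sequence[float]) -> int:
    """Stacking: the meta-model combines the base models' probabilities."""
    meta_row = [m(x) for m in models]
    z = sum(wj * xj for wj, xj in zip(w, meta_row)) + b
    return 1 if _sigmoid(z) >= 0.5 else 0


# Toy example: one informative base model and one uninformative one.
base_models: List[Model] = [lambda x: x[0], lambda x: 0.5]
X_holdout = [[0.9], [0.8], [0.2], [0.1]]
y_holdout = [1, 1, 0, 0]
# The meta-model is fitted on base-model outputs for held-out data (in practice,
# out-of-fold predictions) so it does not learn from the base models' training fit.
meta_X = [[m(x) for m in base_models] for x in X_holdout]
w, b = fit_meta_model(meta_X, y_holdout)
print([stack_predict(base_models, w, b, x) for x in X_holdout])  # → [1, 1, 0, 0]
```

Voting treats every base model equally, whereas stacking learns how much to trust each one; here the meta-model assigns a positive weight to the informative model and effectively ignores the constant one.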

The paper acknowledges limitations, including the reliance on publicly available datasets, which may not represent the diversity of real-world patient populations. It emphasizes the need for validation on larger datasets and addresses potential biases revealed through SHAP analysis. Future research directions include the development of multi-tier stacked ensembles (MTSE) to enhance model adaptability and explainability, integration of ensemble models into clinical workflows, and exploration of advanced AI techniques to improve predictive accuracy. The findings underscore the potential of ensemble learning in clinical decision-making, with implications for early diagnosis and personalized treatment strategies in heart disease management.
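SHAP, which the study uses to explain predictions, is built on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a hypothetical three-feature risk score by enumerating every feature coalition and replacing "absent" features with a fixed baseline, which is one simple choice of value function. The shap library instead uses efficient estimators (for example, `TreeExplainer` for tree ensembles), because brute-force enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

RiskModel = Callable[[Sequence[float]], float]


def shapley_values(f: RiskModel, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score over three scaled features
# (e.g. age, cholesterol, exercise-induced angina); not a model from the paper.
def toy_risk(z: Sequence[float]) -> float:
    return 0.4 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


patient = [1.0, 0.5, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, patient, reference)
print([round(p, 3) for p in phi])  # → [0.55, 0.1, 0.15]
```

The attributions sum to toy_risk(patient) − toy_risk(reference) = 0.8 (the efficiency property), and the interaction term 0.3·z[0]·z[2] is split equally between features 0 and 2.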

Methods

The methods section describes the study's experimental framework and the ensemble learning techniques it applies. The authors explain which algorithms were selected, how they were integrated into the research design, and why these methods were expected to improve predictive performance.

The section also covers data collection, including any preprocessing used to ensure data quality and relevance. The methodology is documented to support reproducibility and to show how the ensemble methods contribute to the study's overall findings.

Results

The results section reports the experimental findings for heart disease prediction with ensemble learning. The base models were evaluated with six metrics: accuracy, precision, recall, specificity, F1-score, and the area under the ROC curve (ROC-AUC). Extra Trees (ET) achieved the highest accuracy on both datasets (D1 and D2), while K-Nearest Neighbors (KNN) had the lowest accuracy on D1 and the Multi-Layer Perceptron (MLP) the lowest on D2. Overall, Random Forest (RF), ET, LightGBM (LGBM), CatBoost (CB), and XGBoost (XGB) were consistently among the strongest models on both datasets.
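The six evaluation metrics can all be derived from true labels, thresholded predictions, and predicted probabilities. This standard-library sketch uses hypothetical labels and scores, not data from the paper; it computes ROC-AUC as the fraction of positive/negative pairs ranked correctly, which equals the area under the ROC curve. In practice, `sklearn.metrics` provides equivalent functions such as `roc_auc_score`.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, specificity, and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

The pairwise loop is O(P·N) and only meant for illustration; rank-based formulas compute the same value efficiently on larger test sets.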

Comparing the two ensembles, the stacking model outperformed the voting model on D2, misclassifying only 2 of 308 instances, while on D1 the voting model correctly classified 330 of 357 instances. Both models achieved mean accuracies of 91% on D1 and 98% on D2, and the stacking model slightly surpassed the voting model on most performance metrics, recall being the exception. Both models had the same ROC-AUC on D2 (0.97), while stacking outperformed voting on D1 (0.92 vs. 0.91). In addition, the area under the precision-recall curve (AUPRC) was highest for stacking on D2 (0.98), and the misclassification rate (MCR) was lowest for stacking on D2 (1.67). The running-time analysis showed that stacking was marginally faster than voting, and both models required less time on D1.
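The instance counts quoted above translate into single-evaluation accuracies as shown below. This is plain arithmetic on the reported counts; the summary does not state how the mean accuracies (91% and 98%) or the reported MCR of 1.67 were aggregated, so they need not coincide with these per-split figures.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of correctly classified instances."""
    return correct / total


def misclassification_pct(correct: int, total: int) -> float:
    """Share of misclassified instances, in percent (equals 100 * (1 - accuracy))."""
    return 100.0 * (total - correct) / total


# Counts quoted in the results: stacking on D2, voting on D1.
stacking_d2 = (308 - 2, 308)
voting_d1 = (330, 357)
print(round(accuracy(*stacking_d2), 4), round(misclassification_pct(*stacking_d2), 2))  # 0.9935 0.65
print(round(accuracy(*voting_d1), 4), round(misclassification_pct(*voting_d1), 2))  # 0.9244 7.56
```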

Discussion

The discussion section reviews significant advances in heart disease prediction achieved through ensemble learning. Several studies have shown the effectiveness of methods such as stacking and voting, which combine multiple machine learning algorithms to improve predictive accuracy. For instance, Chandrasekhar and Peddakrishna achieved an accuracy of 93.44% with a soft voting ensemble classifier, while Tiwari et al. proposed a stacked ensemble framework that surpassed earlier results with an accuracy of 92.34%. Other studies, including those by Raza and by Mienye et al., further corroborate the efficacy of ensemble methods, with accuracies of up to 96.72% in specific settings.
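Soft voting, used in the Chandrasekhar and Peddakrishna study cited in the discussion, differs from majority (hard) voting in that it averages the base models' predicted probabilities instead of counting their class votes. The sketch below uses hypothetical probabilities to show a case where the two rules disagree; the optional weights mirror the idea behind the `weights` parameter of scikit-learn's `VotingClassifier`, but these functions are illustrative only.

```python
from typing import Optional, Sequence


def hard_vote_probs(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Hard voting: each model casts a 0/1 vote; ties go to class 0."""
    votes = sum(1 for p in probs if p >= threshold)
    return 1 if 2 * votes > len(probs) else 0


def soft_vote(probs: Sequence[float], weights: Optional[Sequence[float]] = None,
              threshold: float = 0.5) -> int:
    """Soft voting: threshold the (optionally weighted) mean probability."""
    if weights is None:
        weights = [1.0] * len(probs)
    mean_p = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return 1 if mean_p >= threshold else 0


# Hypothetical probabilities from three base models for one patient.
probs = [0.45, 0.45, 0.95]
print(hard_vote_probs(probs), soft_vote(probs))  # → 0 1
```

Two models lean slightly negative while the third is confident, so hard voting predicts 0, but the mean probability (about 0.62) crosses the threshold and soft voting predicts 1.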

The authors emphasize the unique contributions of their study, which include a comprehensive exploration of diverse base models, the development of robust stacking and voting frameworks, and the integration of explainable artificial intelligence (XAI) to enhance model interpretability. By employing statistical significance tests and focusing on the underlying features influencing predictions, the study aims to address the common perception of ensemble models as “black boxes.” Overall, the findings underscore the potential of ensemble learning techniques in improving diagnostic processes for heart disease, offering a promising avenue for future research and clinical application.
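The overview states that the significance testing combined the Friedman aligned ranks test with Holm post-hoc comparisons. The sketch below covers only the Holm step-down adjustment; it assumes the unadjusted p-values of the pairwise comparisons are already available, does not implement the Friedman aligned ranks test itself, and uses invented p-values. The `multipletests` function in `statsmodels.stats.multitest` (with `method="holm"`) offers a library equivalent.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so a later hypothesis never looks more significant.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """True where the null hypothesis is rejected at family-wise level alpha."""
    return [p <= alpha for p in holm_adjust(p_values)]


# Hypothetical unadjusted p-values for comparing four models against a control.
raw = [0.01, 0.04, 0.03, 0.005]
print(holm_reject(raw))  # → [True, False, False, True]
```

The raw p-value 0.04 would pass a 0.05 threshold on its own, but Holm's procedure does not reject it, because the step-down stops at the first non-significant hypothesis (0.03 × 2 = 0.06).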