Optimizing heart disease diagnosis with advanced machine learning models: a comparison of predictive performance

Journal: BMC Cardiovascular Disorders, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s12872-025-04627-6
PMID: https://pubmed.ncbi.nlm.nih.gov/40121395
Publication Date: 2025-03-22
Author(s): Macarena Teja et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The paper evaluates machine learning models for predicting cardiac disease against the backdrop of a heavy global burden of cardiovascular conditions, which is aggravated by physical inactivity, tobacco use, and unhealthy diets. It notes that regions such as Cleveland, Hungary, and Switzerland report high mortality attributed to cardiovascular disease, underscoring the need for effective predictive tools.

Fifteen machine learning models, including XGBoost, Random Forest, and Bagged Trees, were assessed on a consolidated dataset using seven selected features. XGBoost and Bagged Trees achieved the highest accuracy, 93%. Random Forest was the most stable under k-fold cross-validation, reaching 94% accuracy at K = 10. ROC-AUC scores further supported the models’ reliability, with Random Forest and Bagged Trees reaching 95%. The findings indicate that ensemble models outperform traditional classifiers, and they highlight the value of cross-validation for judging how well a model generalizes. Overall, the study illustrates how modern machine learning techniques can help healthcare professionals make informed decisions about cardiac disease prediction.
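The k-fold procedure mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the summary does not say which tooling, shuffling, or seed the study used, so the function names `k_fold_indices` and `cross_val_accuracy`, the shuffle, and the seed are all assumptions. The record count of 1190 is taken from the Methods section.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffle once before folding
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Average held-out accuracy over k folds.

    `fit(X_train, y_train)` returns a model; `predict(model, row)` returns a label.
    Both are hypothetical callables supplied by the caller.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)

folds = list(k_fold_indices(1190, k=10))
print(len(folds), len(folds[0][1]))  # 10 119
```

With K = 10 and 1190 records, each fold holds out 119 records and trains on the remaining 1071, so every record is used for testing exactly once.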

Introduction

The introduction highlights the global impact of cardiovascular disorders (CVDs), which account for approximately 17.9 million deaths annually, a notable proportion of them premature deaths in people under 70 years of age. It emphasizes the role of behavioral risk factors, such as unhealthy diets, and underscores the importance of modern medical research and data analysis in addressing these health challenges. Machine learning (ML) is presented as a powerful tool for analyzing complex healthcare data, able to identify patterns and make predictions without being explicitly programmed.

The paper outlines a variety of ML models, such as Logistic Regression, Random Forest, and Neural Networks, that have been used to improve the accuracy and efficiency of cardiac disease identification. Accuracy, recall, precision, and ROC-AUC are the key metrics used to evaluate these models. The research aims to provide a comprehensive examination of cardiac disease prognosis through advanced ML techniques, with subsequent sections reviewing the existing literature, detailing the methodology, describing the datasets, and presenting the analysis and results. The discussion compares the findings with previous studies, highlights advances, and suggests directions for future research.

Methods

The study uses data from the UCI Repository, combining five heart disease datasets: Cleveland, Switzerland, Hungarian, Long Beach VA, and Statlog. The combined dataset consists of 1190 records and 12 characteristics, from which seven pertinent characteristics were selected for analysis. The dataset was then partitioned into training and testing sets in an 80:20 ratio.
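As a rough illustration of the 80:20 partition described above, the following sketch splits 1190 placeholder records into training and testing sets. The summary does not state whether the split was shuffled or stratified, so the shuffle, the seed, and the helper name `train_test_split` are assumptions made for this example.

```python
import random

def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of the records and hold out `test_ratio` of them for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # assumed: random, non-stratified split
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(1190))  # placeholder row ids standing in for the 1190 records
train, test = train_test_split(records)
print(len(train), len(test))  # 952 238
```

Fixing the seed keeps the split reproducible, which matters when fifteen models are compared on the same held-out records.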

Fifteen machine learning models were trained on the dataset, and their performance was evaluated using accuracy, precision, recall, F1 score, the confusion matrix, and ROC-AUC. Using the same set of metrics for every model allowed a like-for-like comparison of their effectiveness in predicting cardiac disease, as illustrated in Figure 1.
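The threshold-based metrics listed above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions on a small made-up label vector. The labels are illustrative only and are not data from the study, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means disease present."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative labels only: 4 true positives, 1 false positive,
# 1 false negative, and 4 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
print(confusion_matrix(y_true, y_pred))  # (4, 1, 1, 4)
print(classification_metrics(y_true, y_pred))
```

In a diagnostic setting, recall is often watched most closely, because a false negative corresponds to a missed case of disease.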

Results

The results section presents the findings of the analysis. It reports statistically significant associations between the variables under investigation, with p-values below the conventional 0.05 threshold.

The analysis also found that predictive performance improved when specific variables were included, as reflected in an increase in the R-squared value, which underscores the importance of those variables. The headline figures reported elsewhere in this summary are as follows: XGBoost and Bagged Trees reached the highest accuracy (93%), Random Forest and KNN followed at 91%, Random Forest reached 94% accuracy under 10-fold cross-validation, and Random Forest and Bagged Trees reached a ROC-AUC of 95%.

Discussion

The discussion section reviews advances in cardiac disease prediction using machine learning and data mining techniques. Many prior studies used the Cleveland dataset in particular, applying ensemble learning, neural networks, and logistic regression to improve prediction accuracy. Notably, Chandrasekhar et al. (2023) showed that a combined ensemble classifier achieved an accuracy of 93.44%, surpassing traditional methods such as logistic regression, which reached 90.16%. Other researchers, including Gao et al. (2021) and Hassan et al. (2023), also reported significant gains in prediction accuracy by combining ensemble methods with feature extraction techniques, with Gao et al. achieving 98.6% accuracy using bagging with decision trees.
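To make the idea of bagging with decision trees concrete, the sketch below trains several one-split trees (decision stumps) on bootstrap resamples and combines them by majority vote. It is a deliberately simplified illustration: the cited studies used full decision trees and their own tooling, and the function names, the use of stumps, the number of estimators, and the toy data are all assumptions.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-split tree: the (feature, threshold) pair with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= thr]
            right = [label for row, label in zip(X, y) if row[f] > thr]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    f, thr, left_label, right_label = stump
    return left_label if row[f] <= thr else right_label

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy one-feature data: values below 10 are class 0, the rest class 1.
X = [[v] for v in range(20)]
y = [0] * 10 + [1] * 10
models = fit_bagging(X, y)
print(predict_bagging(models, [2]), predict_bagging(models, [17]))
```

Because each tree sees a slightly different resample, their individual errors tend to differ, and averaging the votes reduces variance. This is the usual explanation for why bagged ensembles outperform a single tree.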

The authors emphasize the importance of selecting relevant features for model training, as reflected in their focus on seven key variables related to heart disease prognosis. In their results, XGBoost and Bagged Trees achieved the highest accuracy, 93%, while Random Forest and KNN followed closely at 91%. Reporting ROC-AUC alongside accuracy gives a fuller picture of model performance, and ensemble methods again show the strongest predictive capability. The findings underscore the potential of advanced machine learning techniques to improve diagnostic accuracy in cardiac disease prediction, and the authors suggest that future research explore hybrid models and real-time clinical applications to further improve patient outcomes.

Limitations

The limitations section highlights several shortcomings of previous research that this work aims to address. First, many earlier studies used a broad range of features without thoroughly evaluating their impact on model performance. In contrast, this study identifies seven significant variables based on domain expertise and assesses their influence using machine learning models. Second, whereas prior research often compared only a few models, this study extends the comparison to 15 machine learning models, including advanced ensemble methods such as XGBoost and Bagged Trees.

Third, some earlier studies omitted ROC-AUC analysis, even though this metric evaluates classification performance across all decision thresholds rather than at a single cut-off. This study closes that gap by reporting ROC-AUC values for all models. It also employs K-fold cross-validation to reduce the risk of overfitting, which supports the claim that the findings generalize beyond a single train/test split. Lastly, it addresses the practical implications of model interpretability and deployment, discussing the trade-offs between model complexity and performance to improve real-world applicability.
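ROC-AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition. The function name `roc_auc` and the example scores are assumptions for illustration, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores only: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Unlike accuracy, this value does not depend on where the decision threshold is placed, which is why it complements the threshold-based metrics when classifiers are compared.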