Machine Learning-Based Predictive Models for Detection of Cardiovascular Diseases

Journal: Diagnostics, Volume: 14, Issue: 2
DOI: https://doi.org/10.3390/diagnostics14020144
PMID: https://pubmed.ncbi.nlm.nih.gov/38248021
Publication Date: 2024-01-08
Author(s): Adedayo Ogunpola et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The paper addresses cardiovascular disease, a pressing global health problem, and focuses on early detection of myocardial infarction using machine learning. Existing predictive models have limitations, particularly imbalanced datasets that can skew results. The study therefore evaluates seven classifiers: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression, Convolutional Neural Network (CNN), Gradient Boosting, XGBoost, and Random Forest. XGBoost proved particularly effective, achieving an accuracy of 98.50%, precision of 99.14%, recall of 98.29%, and an F1 score of 98.71%, which indicates its potential to improve diagnostic accuracy in heart disease.
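
The four metrics quoted above are related: F1 is the harmonic mean of precision and recall, so the reported F1 can be checked against the other two figures. The sketch below is illustrative only. The function names and the confusion-matrix counts are hypothetical and are not taken from the paper; only the precision and recall values passed to `f1_from` come from the summary.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f1_from(precision, recall)


# Consistency check using the precision and recall reported for XGBoost.
reported_f1 = f1_from(0.9914, 0.9829)
print(f"F1 from reported precision/recall: {reported_f1:.4f}")

# Hypothetical counts, chosen only to show how the four metrics are derived.
accuracy, precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Rounded to two decimal places as a percentage, the first value matches the 98.71% F1 score quoted above, so the three reported figures are mutually consistent.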

The study concludes that KNN and XGBoost perform best across the datasets, with XGBoost standing out for heart disease prediction because of its high accuracy and precision. The authors consider these findings significant for clinical practice: more reliable diagnosis can pave the way for more targeted interventions. Future research directions include integrating larger medical imaging datasets and exploring ensemble models to improve predictive capability further. The work also argues for selecting models according to specific clinical needs and data characteristics.

Introduction

The introduction describes the heart's role in sustaining life through the circulation of oxygenated blood and the regulation of hormones, and emphasizes the prevalence and impact of cardiovascular diseases (CVD), a group of disorders affecting the heart and blood vessels. Coronary heart disease (CHD) is identified as a significant subtype that accounts for 64% of CVD cases and carries a substantial mortality risk. The World Health Organization reports approximately 17.9 million deaths annually from these diseases. Risk factors for CVD include hypertension, obesity, diabetes, smoking, and sedentary lifestyles, which underscores the need for research and medical advances in this area.

The paper further discusses the potential of machine learning (ML) in healthcare, particularly for improving the prediction and detection of CVD. Despite existing applications, challenges such as imbalanced datasets persist and call for improved predictive models. The authors investigate the efficacy of seven ML techniques (K-Nearest Neighbors, Support Vector Machine, Logistic Regression, Convolutional Neural Network, Gradient Boosting, XGBoost, and Random Forest) using two pre-processed datasets. The study aims to refine existing models and improve the accuracy and reliability of CVD predictions, ultimately contributing to better clinical interventions and patient care.
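
To illustrate why imbalanced datasets can skew results, the sketch below scores a trivial "classifier" that always predicts the majority class. The 95:5 class split, the function names, and the class-weighting formula are assumptions made for illustration; the summary does not state how the authors handled imbalance.

```python
from collections import Counter


def majority_baseline_report(labels):
    """Score a predictor that always outputs the most common class.

    Returns (majority_class, accuracy, recall_for_class_1).
    """
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    accuracy = majority_count / len(labels)
    # Class 1 is recalled only if it happens to be the class always predicted.
    recall_pos = 1.0 if (majority_class == 1 and counts.get(1, 0)) else 0.0
    return majority_class, accuracy, recall_pos


def balanced_class_weights(labels):
    """One common reweighting: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 95 healthy records (0) and 5 disease records (1).
labels = [0] * 95 + [1] * 5
majority_class, accuracy, recall_pos = majority_baseline_report(labels)
weights = balanced_class_weights(labels)

print(f"always predict {majority_class}: accuracy={accuracy:.2f}, "
      f"recall for class 1={recall_pos:.2f}")
print(f"balanced class weights: {weights}")
```

In this toy case the baseline reaches 95% accuracy while detecting no disease cases at all, which is why precision, recall, and F1 are reported alongside accuracy.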

Methods

The study follows a systematic framework for investigating heart-related conditions, with the aim of improving diagnostic and predictive modeling. The methods are designed to examine multiple aspects of these conditions. The overall methodology is represented in Figure 1 of the paper, which outlines the structured approach taken in the study.

Results

The results section evaluates a range of machine learning models for heart disease prediction on two datasets: the Cardiovascular Heart Disease Dataset and the Heart Disease Cleveland Dataset. The Hybrid Random Forest with Linear Model (HRFLM) achieved an accuracy of 88%, while the Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) models reached accuracies of 83% and 96.50%, respectively. The Classification-CART model had the highest accuracy at 99.14%, with perfect sensitivity and high specificity. The Long Short-Term Memory (LSTM) model also performed well, with an accuracy of 98.4%.

For preprocessing, the study emphasizes handling missing data, encoding categorical variables, and using exploratory data analysis (EDA) techniques, such as correlation heatmaps and histograms, to understand feature distributions and relationships. The KNN model, tuned to k = 7, achieved a precision of approximately 96.61% and a recall of 97.44%. The Random Forest model reached an accuracy of around 98.60% after hyperparameter tuning, and the Gradient Boosting model achieved an accuracy of 98.00% on the test dataset. Overall, the results indicate that these models predict heart disease accurately, with Gradient Boosting and XGBoost showing particularly strong performance, including high precision and recall.
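
The preprocessing steps and the k = 7 setting described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: median imputation, one-hot encoding, Euclidean distance, and all function names and toy values are hypothetical choices that the summary does not specify.

```python
import math
from collections import Counter
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def one_hot(column):
    """Encode a categorical column as one 0/1 list per row (sorted categories)."""
    categories = sorted(set(column))
    return [[1 if value == c else 0 for c in categories] for value in column]


def knn_predict(train_X, train_y, query, k=7):
    """Majority vote among the k training points nearest to `query`."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy preprocessing example (values are made up).
ages = impute_median([40, None, 60, 50])
chest_pain = one_hot(["typical", "atypical", "typical", "none"])

# Toy KNN example: two well-separated clusters of 8 points each.
train_X = [(i * 0.1, i * 0.1) for i in range(8)] + \
          [(10 + i * 0.1, 10 + i * 0.1) for i in range(8)]
train_y = [0] * 8 + [1] * 8
pred_low = knn_predict(train_X, train_y, (1.0, 1.0))
pred_high = knn_predict(train_X, train_y, (9.0, 9.0))
print(ages, chest_pain, pred_low, pred_high)
```

Because KNN relies on distances, features on different scales would normally be standardized first; that step is omitted here to keep the sketch short.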

Discussion

In the discussion, the authors review advances in machine learning (ML) and deep learning (DL) techniques for the early prediction of heart disease. They emphasize the effectiveness of various ML algorithms, both supervised and unsupervised, in analyzing large datasets to uncover hidden patterns associated with cardiovascular diseases (CVD). The paper highlights the role of deep learning, particularly architectures such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, in improving predictive accuracy and integrating diverse data sources such as electrocardiograms (ECG) and patient demographics.

The authors also discuss the results of their own experiments, particularly with the XGBoost and K-Nearest Neighbors (KNN) models, which achieved accuracies of 98.50% and 91.80%, respectively, in predicting heart disease across different datasets. They note that these models combine high classification accuracy with high precision and balanced F1 scores, which matter for minimizing false positives in clinical settings. The discussion closes by calling for further exploration of Gradient Boosting models in heart disease detection, suggesting that their capabilities could improve predictive performance and, ultimately, patient outcomes.