Coronary Heart Disease Prediction: A Comparative Study of Machine Learning Algorithms

Journal: Journal of Advances in Information Technology, Volume: 15, Issue: 1
DOI: https://doi.org/10.12720/jait.15.1.27-32
Publication Date: 2024-01-01
Author(s): Ahmad Hammoud et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

This study aims to improve heart disease detection by developing a machine learning model that raises diagnostic accuracy and reduces healthcare costs. Seven machine learning algorithms were evaluated: Logistic Regression, Support Vector Classifier, K-Nearest Neighbor (KNN), Random Forest, Decision Tree, Naïve Bayes, and Gradient Boosting Classifier. The dataset comprises 12 attributes and 1189 observations from three medical clinics (Cleveland, Statlog, Hungary). Hyperparameters were optimized with Grid Search, Random Search, and Bayes Search, and performance was measured with specificity, sensitivity, and F1-score. Feature selection reduced the dataset to seven key features. The Random Forest model proved the most effective, reaching an accuracy of 94.96% after tuning, well above the previously reported accuracy of 86.9%.
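To make the evaluation metrics named above concrete, the following is a minimal sketch of how sensitivity, specificity, F1-score, and accuracy can be derived from a binary confusion matrix. The label encoding (1 = CHD, 0 = no CHD), the function names, and the example labels are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # CHD patients correctly flagged
    fp: int  # healthy patients incorrectly flagged
    tn: int  # healthy patients correctly cleared
    fn: int  # CHD patients missed


def confusion_matrix(y_true, y_pred):
    """Count outcomes for binary labels (assumed encoding: 1 = CHD, 0 = no CHD)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionMatrix(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def sensitivity(cm):
    """True positive rate: share of CHD patients that were detected."""
    return _ratio(cm.tp, cm.tp + cm.fn)


def specificity(cm):
    """True negative rate: share of healthy patients that were cleared."""
    return _ratio(cm.tn, cm.tn + cm.fp)


def precision(cm):
    return _ratio(cm.tp, cm.tp + cm.fp)


def f1_score(cm):
    """Harmonic mean of precision and sensitivity (recall)."""
    p, r = precision(cm), sensitivity(cm)
    return _ratio(2 * p * r, p + r)


def accuracy(cm):
    return _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn)


# Made-up labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(sensitivity(cm), specificity(cm), f1_score(cm), accuracy(cm))
```

In this toy example one CHD patient is missed and one healthy patient is flagged, so sensitivity is 0.75 and accuracy is 0.8. Reporting sensitivity and specificity alongside accuracy matters in a diagnostic setting because the two error types carry different clinical costs.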

The findings indicate that the Random Forest model provides a reliable classification tool for coronary heart disease (CHD) patients and could serve as a decision support system for healthcare professionals. The study also notes a weak correlation between cholesterol levels and CHD. It suggests that future research focus on more sophisticated algorithms that account for a broader range of risk factors and improve model interpretability. The overall aim is to improve diagnostic accuracy and streamline the healthcare process, reducing the cost and time associated with heart disease diagnosis.

Introduction

Cardiovascular diseases (CVD) account for one-third of annual deaths worldwide, including 7.5 million deaths from coronary heart disease (CHD). About 1.8 million of these deaths are sudden and associated with acute coronary syndrome (ACS). Misdiagnosis is a critical issue, affecting at least 16.1% of heart failure patients. Current diagnostic methods, such as coronary calcium scans and CT Coronary Angiography (CTCA), show high sensitivity (89%) and specificity (96%) but still struggle to detect subtle cardiac irregularities.

To address these limitations, this study proposes an advanced machine learning (ML) model aimed at enhancing the accuracy and timeliness of CHD detection. By employing seven different ML algorithms and optimizing their performance, the research seeks to reduce misdiagnosis rates and improve the overall diagnostic process in clinical settings. The integration of ML into healthcare has the potential to significantly refine diagnostic capabilities, particularly in identifying complex and non-linear patterns associated with cardiac conditions.

Methods

The “Materials and Methods” section describes the experimental design and procedures of the study. The data are patient records combined from three clinics (Cleveland, Statlog, Hungary). Preprocessing includes normalization and SMOTE to address class imbalance, and feature selection reduces the inputs to seven key features. Seven classifiers are trained, tuned with Grid Search, Random Search, and Bayes Search, and compared on accuracy, sensitivity, specificity, and F1-score.

The methodology is structured for reproducibility: it specifies the statistical methods and software used for the analysis so that the study can be replicated and the processes behind the reported results understood.
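For readers who want to replicate the tuning step, the sketch below shows the general idea behind Grid Search and Random Search, two of the three strategies named in the Overview. It is library-free and purely illustrative. The parameter names (`n_estimators`, `max_depth`), the candidate values, and `toy_score` (a stand-in for a cross-validated accuracy) are all assumptions, since this summary does not list the search spaces or the tooling the authors used.

```python
import itertools
import random


def grid_search(score_fn, param_grid):
    """Score every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(score_fn, param_grid, n_iter, seed=0):
    """Score n_iter randomly drawn combinations and return (best_params, best_score)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for cross-validated accuracy.
    # It peaks at n_estimators=200 and max_depth=8 by construction.
    return (
        0.95
        - 0.0002 * abs(params["n_estimators"] - 200)
        - 0.01 * abs(params["max_depth"] - 8)
    )


# Hypothetical search space for a tree ensemble (values chosen for illustration).
PARAM_GRID = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [4, 8, 16],
}

print(grid_search(toy_score, PARAM_GRID))
print(random_search(toy_score, PARAM_GRID, n_iter=5, seed=1))
```

Grid search evaluates all 12 combinations here and is guaranteed to find the best one in the grid. Random search evaluates only the requested number of draws, which is cheaper on large spaces but may miss the optimum. Bayes Search, the third strategy, uses earlier scores to choose the next candidate and is not sketched here.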

Discussion

In this research, several machine learning (ML) algorithms were used to classify coronary heart disease (CHD) patients using a combined dataset from Hungary, Cleveland, and Statlog, comprising 1,190 patients with 14 predictive attributes. The study compared seven classifiers: Logistic Regression, Support Vector Classifier, K-Nearest Neighbor, Random Forest, Decision Tree, Naïve Bayes, and Gradient Boosting Classifier. The Random Forest model achieved the highest accuracy, 94.96% after hyperparameter tuning, up from 89.50% before tuning. The analysis also found only a weak correlation between cholesterol levels and CHD, so cholesterol was excluded from the feature set, which simplified the model without compromising performance.
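The cholesterol finding can be illustrated with a simple correlation screen. The sketch below computes the Pearson correlation between each feature and the CHD label and lists features whose absolute correlation falls below a cutoff. The choice of Pearson correlation, the 0.1 cutoff, the feature name `max_heart_rate`, and all the numbers are assumptions for illustration; the summary does not state which statistic or threshold the authors applied.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def weakly_correlated(features, target, threshold):
    """Return names of features whose |r| with the target is below threshold."""
    return [
        name
        for name, column in features.items()
        if abs(pearson_r(column, target)) < threshold
    ]


# Synthetic values for illustration only (not data from the study).
chd = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "cholesterol": [200, 240, 210, 250, 245, 205, 215, 240],
    "max_heart_rate": [170, 165, 160, 172, 130, 125, 140, 120],
}

for name, column in features.items():
    print(name, round(pearson_r(column, chd), 3))

# Hypothetical cutoff: drop features with |r| < 0.1.
print(weakly_correlated(features, chd, threshold=0.1))
```

With these synthetic values only `cholesterol` falls below the cutoff. A linear correlation screen of this kind can miss non-linear relationships, so it is best read as a first filter rather than proof that a feature is uninformative.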

The research addressed data imbalance and feature selection, using techniques such as SMOTE and several normalization methods to improve model reliability. The findings underscore the potential of ML-based decision support systems in healthcare to expedite CHD diagnosis, reduce healthcare costs, and improve patient outcomes. The authors suggest that future work develop more advanced algorithms that incorporate a broader range of risk factors and improve model interpretability, thereby supporting better clinical decision-making.
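As a rough illustration of the SMOTE technique mentioned above, the sketch below creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified, library-free variant written for this summary. The function name, the default `k`, the random choice of base samples, and the example points are assumptions and do not reproduce the authors' configuration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points between minority samples and their k nearest
    minority neighbours (simplified SMOTE-style oversampling)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up two-feature samples for illustration only.
majority = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
            (0.4, 0.2), (0.1, 0.5), (0.5, 0.1), (0.3, 0.2)]
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]

# Generate enough synthetic points to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2, seed=42)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data contains real patients exclusively.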