Advanced Hybrid Machine Learning Model for Accurate Detection of Cardiovascular Disease

Journal: International Journal of Computational Intelligence Systems, Volume: 18, Issue: 1
DOI: https://doi.org/10.1007/s44196-025-00771-1
Publication Date: 2025-03-06
Author(s): Navita Navita et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The study addresses cardiovascular disease (CVD), a leading cause of death worldwide, by proposing a hybrid detection model that draws on advanced machine learning (ML) and deep learning techniques. The model has four stages. First, it corrects class imbalance with the Synthetic Minority Oversampling Technique combined with the Edited Nearest Neighbors rule (SMOTE-ENN). Second, it applies the Chi-square method for feature selection to a dataset of 1190 records with 11 clinical features, compiled from five prominent datasets. Third, it builds a stacking ensemble of three base learners, Random Forest Tree (RFT), K-Nearest Neighbor (K-NN), and AdaBoost, with Logistic Regression (LR) as the meta-learner, optimized through Grid Search Cross-Validation (GSCV). Finally, it evaluates performance using accuracy, sensitivity, specificity, F1 score, and ROC-AUC score.
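The stacking stage described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification`), the hyperparameters are arbitrary, and the grid search is assumed to tune only the regularization strength `C` of the logistic-regression meta-learner, since the summary does not say which parameters were searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the clinical data (assumption): 5 features, binary label.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three base learners; their out-of-fold predictions feed the meta-learner.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("ada", AdaBoostClassifier(n_estimators=25, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Hypothetical grid: only the meta-learner's C is searched here.
param_grid = {"final_estimator__C": [0.1, 1.0]}
search = GridSearchCV(stack, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))
```

Because the meta-learner is trained on cross-validated predictions of the base learners, it learns how much to trust each one without seeing predictions made on their own training rows.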

The proposed hybrid model achieves an accuracy of 97.8%, a sensitivity of 96.15%, a specificity of 96.75%, and an ROC-AUC score of 98.6%, outperforming existing techniques. Accuracy rose from 94.74% to 97.8% once SMOTE-ENN balancing and Chi-square feature selection were applied. The study thus demonstrates the effectiveness of the hybrid model in identifying CVD and underscores the contribution of SMOTE-ENN to model performance. Future research may explore further advances in feature selection and sampling methods, as well as deep learning techniques for broader disease detection.
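For readers less familiar with the threshold-based metrics named above, the following sketch shows how accuracy, sensitivity, specificity, and F1 score follow from the four confusion-matrix counts. The labels and predictions are made-up illustrative values, not data from the paper, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and F1 as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative values only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, sensitivity measures how many true CVD cases are caught, while specificity measures how many healthy patients are correctly cleared.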

Introduction

The introduction frames cardiovascular disease (CVD) as a critical global health issue that accounts for approximately 17.9 million deaths annually, according to the World Health Organization (WHO). Atherosclerosis, the buildup of plaque in the arteries, is identified as a primary contributor to CVD because it restricts blood flow and leads to serious health complications. Common symptoms include angina, chest pain, and fatigue, but many individuals remain asymptomatic until a severe event such as a heart attack occurs. The paper emphasizes the importance of identifying and managing risk factors such as high blood pressure, diabetes, and lifestyle choices to prevent CVD.

To address the complexity of these risk factors and the inefficiency of traditional monitoring methods, the authors propose a machine learning (ML)-based model for the early detection of CVD. They note that smart wearable devices and the Internet of Things (IoT) enable remote health monitoring, which generates large volumes of data that must be analyzed effectively. The proposed model takes a hybrid approach to data preprocessing: SMOTE-ENN balances the classes, and the Chi-square method then selects features. The final model uses a stacking technique with multiple classifiers to improve predictive performance. The paper then covers related work, the proposed model, dataset analysis, and experimental results, with the overall aim of improving the accuracy and reliability of CVD detection.
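The idea behind SMOTE-ENN balancing can be illustrated in plain Python. This is a simplified sketch, not the authors' code or a library implementation: it assumes a binary problem, Euclidean distance, and a majority-vote cleaning rule, and the toy points and function names are hypothetical. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours; ENN then removes any sample whose label disagrees with most of its nearest neighbours.

```python
import math
import random
from collections import Counter


def nearest(point, pool, k):
    """Indices of the k points in pool closest to point (Euclidean distance)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(X, y, minority, k=3, seed=0):
    """Oversample the minority class until both classes have equal size."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    majority_count = len(X) - len(minority_pts)  # binary-label assumption
    synthetic = []
    while len(minority_pts) + len(synthetic) < majority_count:
        base = rng.choice(minority_pts)
        # Position 0 is the base point itself (distance 0), so skip it.
        neighbours = nearest(base, minority_pts, k + 1)[1:]
        mate = minority_pts[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return X + synthetic, y + [minority] * len(synthetic)


def edited_nearest_neighbours(X, y, k=3):
    """Keep only samples whose label matches the majority of their k neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        closest = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        votes = Counter(y[j] for j in closest)
        if votes.most_common(1)[0][0] == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


# Toy data (assumption): 12 majority points near the origin, 4 minority points apart.
majority = [[i * 0.5, j * 0.5] for i in range(4) for j in range(3)]
minority = [[3.0, 3.0], [3.5, 3.0], [3.0, 3.5], [3.5, 3.5]]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)

X_bal, y_bal = smote(X, y, minority=1)
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal)
print("before:", Counter(y), "after SMOTE:", Counter(y_bal), "after ENN:", Counter(y_clean))
```

Combining the two steps matters because oversampling alone can place synthetic points in regions where the classes overlap; the cleaning pass removes such ambiguous samples from either class.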

Results

The study developed a hybrid model for diagnosing cardiovascular disease (CVD) that combines SMOTE-ENN class balancing, Chi-square feature selection, and a stacking ensemble of three base learners, Random Forest Tree (RFT), K-Nearest Neighbor (K-NN), and AdaBoost, with Logistic Regression (LR) as the meta-learner. The model was tested on data drawn from five popular datasets. Applying SMOTE-ENN balanced the classes, and the Chi-square method ranked the features so that low-ranked ones could be eliminated. This reduced the dataset from 11 features to 5 and improved computational efficiency.
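The Chi-square ranking step can be sketched as follows. For each feature, the statistic sums (observed - expected)^2 / expected over the cells of the feature-by-label contingency table, and the features with the highest scores are kept. This is an illustrative sketch under stated assumptions: features are treated as categorical (continuous ones would need binning first), and the feature names and values are hypothetical rather than taken from the paper's dataset.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Chi-square statistic of independence between one categorical feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_total in feature_totals.items():
        for c_value, c_total in label_totals.items():
            expected = f_total * c_total / n  # count expected if feature and label were independent
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest Chi-square scores."""
    ranked = sorted(
        columns,
        key=lambda name: chi_square_score(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy columns; 1 = CVD present in the label.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "fasting_sugar": [1, 0, 1, 0, 1, 0, 1, 0],     # independent of the label
    "exercise_angina": [1, 1, 1, 0, 0, 0, 0, 1],   # partly associated
    "chest_pain_type": [1, 1, 1, 1, 0, 0, 0, 0],   # perfectly associated
}

scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
selected = select_top_k(columns, labels, k=2)
print(scores)
print(selected)
```

A feature that is independent of the label scores zero, which is why low-ranked features can be dropped with little loss of predictive signal.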

The model's performance was evaluated using accuracy, sensitivity, specificity, F1 score, and ROC-AUC score. The proposed hybrid model achieved an accuracy of 97.8% and an ROC-AUC score of 98.6%, outperforming the individual machine learning techniques. The results indicate that the hybrid model effectively addresses the challenges of imbalanced data and feature selection, which makes it a novel and efficient solution for CVD detection compared with existing methodologies that relied primarily on traditional machine learning techniques. The findings underscore the potential of combining SMOTE-ENN with feature selection to improve model performance in medical diagnostics.
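Unlike accuracy, the ROC-AUC score is computed from predicted scores rather than hard class labels. One way to see what it measures: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the scores are made-up illustrative values and the function name is hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Illustrative values only: predicted probability of CVD for six patients.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based formulations give the same value more efficiently on large datasets.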

Discussion

The discussion reviews the machine learning (ML) techniques used to detect cardiovascular disease (CVD) and highlights the limitations of existing models. The studies cited cover diverse approaches, including Gradient Boosting Decision Trees (GBDT) with SHapley Additive exPlanations (SHAP) for feature selection, ensemble learning models, and hybrid techniques that integrate multiple classifiers. Stacking classifiers stood out for their accuracy, with some models achieving ROC-AUC scores of up to 100%. However, many approaches suffered from data imbalance, which can lead to biased predictions.

The proposed work introduces a hybrid model that addresses these limitations through a four-stage process: resolving data imbalance with SMOTE-ENN, applying Chi-square for feature selection, building a stacking ensemble of Random Forest Tree (RFT), K-Nearest Neighbor (K-NN), and AdaBoost classifiers, and evaluating performance with several metrics. The model achieved an accuracy of 97.8%, up from the 94.74% obtained without SMOTE-ENN and Chi-square feature selection. The findings underscore the importance of effective data preprocessing and feature selection for model performance, and they suggest avenues for future research, including applying the model to larger datasets and to other diseases.