Enhanced feature selection and ensemble learning for cardiovascular disease prediction: hybrid GOL2-2 T and adaptive boosted decision fusion with babysitting refinement

Journal: Frontiers in Medicine, Volume: 11
DOI: https://doi.org/10.3389/fmed.2024.1407376
PMID: https://pubmed.ncbi.nlm.nih.gov/39071085
Publication Date: 2024-07-05
Author(s): S. Phani Praveen et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The research paper addresses the pressing need for enhanced diagnostic methods in the early detection and prediction of cardiovascular disease (CVD), a leading cause of mortality. The authors propose a novel machine learning framework that integrates multiple techniques: Multiple Imputation by Chained Equations (MICE) for addressing missing data, Interquartile Range (IQR) for outlier management, and Synthetic Minority Over-sampling Technique (SMOTE) to rectify class imbalance. Additionally, they introduce a Hybrid 2-Tier Grasshopper Optimization with L2 regularization (GOL2-2T) for optimal feature selection and employ an Adaptive Boosted Decision Fusion (ABDF) ensemble learning algorithm with hyperparameter tuning to enhance predictive accuracy.
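The summary does not describe the internals of ABDF, so the following is only a sketch of the general idea behind adaptive boosting, using the classic AdaBoost weight update rather than the authors' algorithm. The function name `adaboost_round` and the toy inputs are assumptions made for illustration.

```python
import math


def adaboost_round(weights, correct):
    """One classic AdaBoost round (illustrative, not the paper's ABDF).

    weights: current sample weights, assumed to sum to 1.
    correct: per-sample flags, True where the weak learner was right.
    Returns the learner's vote (alpha) and the re-normalised weights.
    """
    # Weighted error is the total weight of the misclassified samples.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correct samples, grow weights of mistakes.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Hypothetical toy case: five equally weighted samples, one misclassified.
weights = [0.2] * 5
correct = [True, True, True, True, False]
alpha, new_weights = adaboost_round(weights, correct)
```

After this round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right. That re-weighting is the "adaptive" part of adaptive boosting.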

The results indicate that the proposed heart disease prediction model achieves an accuracy of 83.0% and a balanced F1 score of 84.0%. According to the authors, combining the methods above improves the model’s robustness and predictive performance, which they present as evidence of the effectiveness of machine learning in medical diagnostics. The findings underscore the potential of advanced machine learning techniques to provide reliable tools for early diagnosis and treatment of CVD, with implications for future research aimed at refining prediction models and exploring additional factors to enhance diagnostic accuracy.

Introduction

The introduction presents heart disease as a critical global health issue with a heavy impact on communities, citing the statistic that a person dies from cardiovascular causes every 37 seconds in the United States (American Heart Association, 2022). It underscores the complexity of heart diseases, including coronary artery disease, heart failure, arrhythmias, and congenital malformations, and emphasizes their multifactorial etiology involving genetic, behavioral, and biochemical factors. The narrative extends beyond clinical implications, portraying heart diseases as personal stories of courage and hope that profoundly affect individuals and families.

The text also discusses major risk factors for cardiovascular disease, such as poor diet, physical inactivity, tobacco use, alcohol abuse, and obesity, and stresses the importance of prevention strategies like healthy eating, exercise, and smoking cessation. It introduces Adaptive Boosted Decision Fusion (ABDF) as a novel approach for disease prediction and management in cardiovascular health, intended to improve early detection and treatment options. Furthermore, data from India in 2020 reveal age-related disparities in cardiovascular disease prevalence, with older populations exhibiting higher rates. This underscores the need for targeted policies and educational initiatives to address the rising incidence of cardiovascular diseases, particularly among the aging demographic.

Methods

The proposed methodology for the hybrid two-tier feature selection method GOL2-2T follows a structured approach to data pre-processing, with 70% of the dataset allocated for training and 30% for testing. It uses Multiple Imputation by Chained Equations (MICE) to handle missing data, so that information is retained across variables rather than lost by discarding incomplete records. Data scaling and label encoding are applied alongside the Interquartile Range (IQR) for outlier detection, which makes the model more resilient to abnormal data points. The method also uses the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance by generating synthetic minority instances, thereby reducing the bias associated with over-representation of the dominant class.
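Two of the pre-processing steps above can be sketched with the Python standard library alone. The sketch assumes the common 1.5 × IQR fence rule for outliers and reduces SMOTE to its core idea of interpolating between a minority sample and a nearby minority neighbour. The function names, the multiplier `k=1.5`, and all sample values are assumptions for illustration and are not taken from the paper.

```python
import random
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical blood-pressure readings; 300 is an obvious outlier.
pressures = [110, 118, 120, 122, 125, 130, 135, 140, 300]
low, high = iqr_bounds(pressures)
cleaned = [v for v in pressures if low <= v <= high]

# Hypothetical minority-class points with two scaled features each.
minority = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.25)]
new_points = smote_like(minority, n_new=3)
```

Because every synthetic point lies between two real minority samples, the oversampled class stays inside the region the minority data already occupies. In practice, oversampling is usually applied to the training split only, so that the test split still reflects the original class balance.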

In evaluating the proposed method against other techniques on a heart disease dataset, the hybrid GOL2-2T model achieved an accuracy of 83.0%, surpassing the Classification Tree and Artificial Neural Network (ANN) methods in classification performance. The Naive Bayes (NB) technique came close with a slightly lower accuracy of 81.25%, but it exhibited inferior precision, recall, and F1-score, indicating a larger share of false positives among its positive predictions and more missed positive cases. The findings underscore the proposed method’s effective balance between accuracy, precision, and recall, making it particularly suitable for heart disease classification. The results also highlight the importance of selecting appropriate techniques to optimize performance indicators, and the authors present them as evidence that the proposed approach outperforms the compared methods.
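The metrics compared above all derive from the four cells of a binary confusion matrix. The following minimal sketch shows the standard definitions. The counts are hypothetical, chosen only to give values of a similar magnitude to those reported, and are not the paper's confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 test patients (not taken from the paper).
metrics = classification_metrics(tp=45, fp=10, fn=7, tn=38)
```

With these illustrative counts, accuracy is 0.83 while F1 is about 0.84. The two can differ because F1 ignores true negatives and weighs only precision and recall, which is why comparing classifiers on accuracy alone can hide differences in false positives and missed cases.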

Discussion

The discussion section of the research paper highlights the increasing importance of machine learning and data mining techniques in predicting heart disease, a leading cause of global mortality. Several studies, including those by Shah et al. (2020) and Katarya et al. (2020), emphasize the necessity for accurate and timely heart disease detection, advocating for the use of various algorithms such as K-nearest neighbors (KNN), support vector machines (SVM), and random forests. KNN emerged as the most accurate in Shah et al.’s study, while Katarya et al. underscored the role of supervised learning in healthcare decision-making. The findings collectively indicate that while machine learning models show promise in enhancing predictive accuracy, challenges such as dataset limitations and the need for advanced feature selection techniques persist.

Further, the paper discusses recent advancements, including Bhatt et al. (2023), who developed a cardiovascular disease prediction model using a large dataset and achieved high accuracy rates with multiple algorithms. Abood Kadhim et al. (2023) found that support vector machines provided the highest diagnostic accuracy. These studies reinforce the potential of artificial intelligence in improving early detection and intervention strategies in cardiac care. However, the authors acknowledge limitations such as reliance on specific datasets and the necessity for further research to refine algorithms and enhance interpretability. The motivation for this research stems from the urgent need to address the rising incidence of cardiovascular diseases through better predictive modeling and, in turn, better healthcare outcomes.