SGO enhanced random forest and extreme gradient boosting framework for heart disease prediction

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-02525-7
PMID: https://pubmed.ncbi.nlm.nih.gov/40414947
Publication Date: 2025-05-25
Author(s): Anima Naik et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The research paper addresses the pressing global health issue of cardiovascular disease (CVD), which accounts for approximately 31.5% of global deaths, with projections indicating an increase to 24.2 million deaths annually by 2030. The study introduces a heart disease prediction (HDP) model utilizing Random Forest (RF) and eXtreme Gradient Boosting (XGB) classifiers, optimized through hyperparameter tuning via the Social Group Optimization (SGO) algorithm. Validation on the Cleveland and Statlog datasets revealed significant improvements in model performance post-optimization, with RF achieving an accuracy of 95.08% and a ROC-AUC of 95.26% on Cleveland, and XGB reaching 97.62% accuracy with a ROC-AUC of 97.50% on Statlog.

The findings underscore the importance of data preprocessing and SGO-based tuning in enhancing model performance metrics such as accuracy, precision, recall, and F1-score. Notably, the SGO-tuned XGB Classifier demonstrated exceptional precision, achieving a perfect score on the Statlog dataset. However, the study acknowledges limitations, including reliance on only two datasets and the absence of external validation, which may affect the generalizability of the results. Future research is recommended to validate the model across diverse datasets and assess its real-world applicability and computational efficiency, ultimately aiming to improve early diagnosis and management of heart disease, thereby potentially reducing CVD mortality rates.

Introduction

The introduction outlines the significance of the study by Naik et al., which applies the Social Group Optimization (SGO) algorithm to heart disease prediction. The authors emphasize that their findings advance the understanding of how SGO behaves and where it can be applied. The key findings highlight SGO's effectiveness in addressing the challenges of this domain, demonstrating its utility and pointing to directions for future research. The introduction sets the stage for the methodology and results presented in the subsequent sections.

Methods

The methodology focuses on developing an effective machine learning (ML) model for heart disease detection. The study employs Random Forest (RF) and XGBoost (XGB) classifiers whose hyperparameters are tuned with the SGO metaheuristic to improve predictive accuracy. Training and testing used the Cleveland and Statlog datasets from Kaggle, chosen because they reflect the high-dimensional, imbalanced, and noisy nature of medical data. RF is known for its robustness and interpretability, while XGB achieves strong performance through gradient boosting. Hyperparameter tuning was carried out prior to model training, using an 80/20 train/test split of each dataset.
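As a rough illustration of the 80/20 split described above, the sketch below holds out 20% of each class for testing. This summary does not say whether the paper stratified its split, so stratification, the fixed seed, and the helper name `stratified_split` are assumptions; in practice scikit-learn's `train_test_split(X, y, test_size=0.2, stratify=y)` does the same job.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), holding out test_fraction of each class.

    Hypothetical helper; stratification is an assumption, not a detail
    confirmed for the paper's 80/20 split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])   # held-out rows for this class
        train_idx.extend(indices[n_test:])  # remaining rows go to training
    return sorted(train_idx), sorted(test_idx)


# Toy labels (1 = disease, 0 = no disease); not the Cleveland or Statlog data.
labels = [1] * 55 + [0] * 45
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))     # 80 20
print(sum(labels[i] for i in test_idx))  # 11
```

Stratifying keeps the class ratio of the test set close to that of the full dataset, which matters when the data are imbalanced.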

The experiments were run in Python with scikit-learn, alongside other scientific computing tools such as NumPy and matplotlib. A comprehensive evaluation of various ML methods, including traditional and ensemble-based approaches, was performed, and the SGO-tuned RF and XGB classifiers were run for 10 iterations to assess reliability. Performance was measured with accuracy (Acc.), precision (Prec.), recall (Rec.), F1-score (F1-S), and the area under the ROC curve (RAUC), giving a thorough assessment of the models' effectiveness. The study emphasizes these metrics for evaluating the robustness and stability of the proposed methods.
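The metrics listed above can all be computed from a binary confusion matrix plus the classifier's scores. The sketch below is a plain-Python illustration; the helper names are hypothetical, and scikit-learn provides equivalents such as `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score`. ROC-AUC is computed here through its pairwise-ranking interpretation, which equals the area under the ROC curve.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"acc": (tp + tn) / len(pairs), "prec": precision, "rec": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: one positive is missed, and there are no false positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
y_pred = [1 if s > 0.5 else 0 for s in scores]  # decision threshold assumed at 0.5
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))  # 0.9375
```

With 10 repeated runs, each metric would typically be reported as the mean (and spread) of the per-run values, for example via `statistics.mean` and `statistics.stdev`.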

Results

The results demonstrate the effectiveness of the SGO-tuned Random Forest (RF) and XGBoost (XGB) classifiers on the Cleveland and Statlog heart disease datasets. Preprocessing had a significant impact on performance: the SGO-tuned XGB classifier achieved the highest accuracy, 97.62%, followed closely by the SGO-tuned RF at 95.24%. Both models exhibited 100% precision across all configurations, meaning that every case they predicted as positive was a true positive (no false positives). Preprocessing also markedly improved recall, from 81% to 90% for the SGO-tuned RF and from 86% to 95% for the XGB classifier, reflecting increased sensitivity.
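Because precision is reported as exactly 100%, F1 depends only on recall. As a worked check derived from the rounded post-preprocessing recall figures above (these are implied values, not F1 scores reported by the paper), assuming P = 1 in both cases:

```latex
F_1 = \frac{2PR}{P + R}, \qquad P = 1 \;\Rightarrow\; F_1 = \frac{2R}{1 + R}

R = 0.90 \;\Rightarrow\; F_1 = \frac{1.80}{1.90} \approx 0.947, \qquad
R = 0.95 \;\Rightarrow\; F_1 = \frac{1.90}{1.95} \approx 0.974
```

With perfect precision, any remaining errors are false negatives, so gains in recall translate directly into gains in F1.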

Further analysis showed that the SGO-tuned models outperformed traditional and ensemble methods, as well as approaches such as the Flower Pollination Neural Network and the CNN & Deep Belief Network, which recorded accuracies of 89.60% and 90.00%, respectively. Across the comprehensive evaluation of machine learning methods, the SGO-tuned RF and XGB classifiers consistently achieved superior accuracy, precision, recall, F1-score, and RAUC. The findings underscore the potential of SGO for hyperparameter tuning and the importance of preprocessing in enhancing model performance, and they support the applicability of the proposed approach to real-world classification tasks.

Discussion

The discussion section emphasizes the critical need for early and accurate detection of heart disease (HD), which remains a leading cause of mortality globally, particularly in developing countries. Despite advances in cardiac healthcare technologies, traditional diagnostic methods often suffer from limited sensitivity and high variability. Machine learning (ML) algorithms, particularly ensemble methods such as Random Forest (RF) and XGBoost (XGB), have shown promise in improving diagnostic accuracy. However, the performance of these models depends heavily on hyperparameter tuning, which is difficult because the search space is complex and nonlinear.
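To make the tuning problem above concrete: a population-based optimizer searches a continuous space, while several RF and XGB hyperparameters are integers. One common way to bridge this, sketched here under assumptions, is to let each candidate be a vector in [0, 1]^d and decode it into a parameter dictionary. The parameter names (`n_estimators`, `max_depth`, `min_samples_split`, `learning_rate`, `subsample`) are real scikit-learn and XGBoost arguments, but the bounds, the `decode` helper, and the encoding itself are illustrative assumptions rather than the paper's configuration.

```python
# Hypothetical search spaces: (low, high, type) per hyperparameter.
# The names are real RandomForestClassifier / XGBClassifier arguments;
# the bounds are illustrative assumptions, not the ranges used in the paper.
RF_SPACE = {
    "n_estimators": (10, 500, int),
    "max_depth": (2, 30, int),
    "min_samples_split": (2, 20, int),
}
XGB_SPACE = {
    "n_estimators": (10, 500, int),
    "max_depth": (2, 15, int),
    "learning_rate": (0.01, 0.3, float),
    "subsample": (0.5, 1.0, float),
}


def decode(position, space):
    """Map a continuous vector in [0, 1]^d to a concrete hyperparameter dict."""
    params = {}
    for x, (name, (low, high, kind)) in zip(position, space.items()):
        x = min(max(x, 0.0), 1.0)  # clip so out-of-range positions stay valid
        value = low + x * (high - low)
        params[name] = int(round(value)) if kind is int else value
    return params


print(decode([0.5, 0.0, 1.0], RF_SPACE))
# {'n_estimators': 255, 'max_depth': 2, 'min_samples_split': 20}
```

A decoded dictionary could then be passed as keyword arguments, for example `RandomForestClassifier(**params)` or `XGBClassifier(**params)`, to build the model whose validation score serves as the optimizer's fitness.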

To address these challenges, the paper applies the Social Group Optimization (SGO) algorithm, a metaheuristic inspired by human social behavior that balances exploration and exploitation during hyperparameter tuning. SGO's dual learning mechanism lets it navigate complex optimization problems without derivative information, which makes it well suited to black-box scenarios. The study demonstrates SGO's effectiveness across various applications, including heart disease prediction, where it tunes the hyperparameters of RF and XGB classifiers on benchmark datasets such as Cleveland and Statlog. The findings suggest that SGO not only improves predictive accuracy but also enhances computational efficiency, contributing to the broader field of predictive analytics in healthcare.
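The dual learning mechanism mentioned above presumably corresponds to SGO's two phases: an improving phase, in which each member moves toward the best member (gbest) scaled by a self-introspection factor c, and an acquiring phase, in which each member learns from a random partner and from gbest. The sketch below follows that commonly cited formulation for minimization; the value c = 0.2, the bound clipping, greedy acceptance, and the name `sgo_minimize` are assumptions for illustration, not details confirmed by this summary.

```python
import random


def sgo_minimize(fitness, bounds, pop_size=20, iterations=100, c=0.2, seed=0):
    """Minimize `fitness` over box `bounds` with a basic SGO loop (sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every coordinate inside its [low, high] bound (assumed policy).
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def best_index():
        return min(range(pop_size), key=lambda k: fit[k])

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]

    for _ in range(iterations):
        # Improving phase: introspection (c * x) plus a pull toward gbest.
        gbest = pop[best_index()]
        for i in range(pop_size):
            new = clip([c * pop[i][d] + rng.random() * (gbest[d] - pop[i][d])
                        for d in range(dim)])
            f_new = fitness(new)
            if f_new < fit[i]:  # greedy acceptance: keep only improvements
                pop[i], fit[i] = new, f_new

        # Acquiring phase: learn from a random partner j and from gbest.
        gbest = pop[best_index()]
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            # If i is better than j, step away from j; otherwise step toward j.
            sign = 1.0 if fit[i] < fit[j] else -1.0
            new = clip([pop[i][d]
                        + sign * rng.random() * (pop[i][d] - pop[j][d])
                        + rng.random() * (gbest[d] - pop[i][d])
                        for d in range(dim)])
            f_new = fitness(new)
            if f_new < fit[i]:
                pop[i], fit[i] = new, f_new

    b = best_index()
    return pop[b], fit[b]


def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


best_x, best_f = sgo_minimize(sphere, bounds=[(-5.0, 5.0)] * 3)
print(best_f)
```

For hyperparameter tuning, `fitness` would instead decode the position into classifier settings and return something like 1 - validation accuracy (an assumption about how the objective is framed). The c·x term pulls positions toward the origin, so an objective whose optimum lies at 0, like the sphere function here, is an easy case; a shifted test function gives a fairer check.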

Limitations

The limitations section discusses how ME-SGO, a modified SGO variant applied to engineering and electric vehicle (EV) optimization problems, addresses inherent limitations of the standard Social Group Optimization (SGO) algorithm. Specifically, the authors of ME-SGO, Manic et al., note that standard SGO often struggles with convergence speed and with escaping local minima, which can hinder performance on complex engineering problems.

By implementing ME-SGO, the authors demonstrate improvements in both convergence rates and solution quality, thereby enhancing the optimization process for EV applications. However, despite these advancements, the authors acknowledge that there may still be residual challenges related to computational efficiency and scalability when applying ME-SGO to larger, more intricate systems. This recognition underscores the need for ongoing research to further refine these optimization techniques in practical engineering contexts.