Machine learning approach for predicting cardiovascular disease in Bangladesh: evidence from a cross-sectional study in 2023

Journal: BMC Cardiovascular Disorders, Volume: 24, Issue: 1
DOI: https://doi.org/10.1186/s12872-024-03883-2
PMID: https://pubmed.ncbi.nlm.nih.gov/38632519
Publication Date: 2024-04-18
Author(s): Sorif Hossain et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

This paper examines the rising burden of cardiovascular diseases (CVDs) in Bangladesh, where CVDs have overtaken infectious diseases as the leading cause of death. The study used a dataset of 391 CVD patients and 260 control subjects and applied several machine learning techniques (Logistic Regression, Naïve Bayes, Decision Tree, AdaBoost, Random Forest, and Bagging Tree classifiers) to identify the factors most strongly associated with CVD and to build predictive models.
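As a point of reference for the accuracy figures quoted in this summary, the class counts above fix a simple baseline: a classifier that always predicts the larger class. The short sketch below only does that arithmetic; the variable names are illustrative and nothing here comes from the authors' code.

```python
# Class counts as reported in this summary.
n_cvd = 391
n_control = 260

n_total = n_cvd + n_control
cvd_share = n_cvd / n_total

# A trivial classifier that assigns everyone to the larger class is
# correct exactly as often as that class occurs in the data.
majority_baseline_accuracy = max(n_cvd, n_control) / n_total

print(f"total participants: {n_total}")
print(f"CVD share: {cvd_share:.4f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.4f}")
```

Any reported accuracy is best read against this baseline rather than against 50%, because the two groups are not the same size.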

The Random Forest classifier outperformed the other methods, with a precision of 96.15%, an accuracy of 98.04%, a recall of 100%, and an area under the receiver operating characteristic curve (AU-ROC) of 0.989. The authors conclude that the Random Forest model is a reliable tool for predicting CVD risk, with direct relevance to clinical practice in Bangladesh. More broadly, the study points to the potential of machine learning to support early identification and management of CVD, and thereby to improve patient care in low- and middle-income countries.
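For readers less familiar with AU-ROC, it can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AU-ROC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = CVD, 0 = control; scores are made-up model outputs.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(f"toy AU-ROC: {roc_auc(toy_labels, toy_scores):.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a dataset of a few hundred rows; a rank-based formulation would be the usual choice for larger data.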

Introduction

The introduction outlines the global health burden of cardiovascular diseases (CVDs), a group of conditions affecting the heart and blood vessels that includes coronary artery disease and heart failure. According to the World Health Organization, CVDs accounted for approximately 17.9 million deaths worldwide in 2016, or 31% of all deaths, with heart failure a major contributor. The burden is particularly concerning in low- and middle-income countries such as Bangladesh, where CVD has surpassed infectious diseases as the leading cause of death. The paper highlights behavioral and physiological risk factors for CVD, including unhealthy diet, physical inactivity, and hypertension, and emphasizes the importance of early detection for effective management.

The authors argue that traditional risk assessment models may oversimplify the relationships between risk factors and CVD outcomes, often by assuming linearity. Machine learning algorithms, in contrast, can handle nonlinearity and interactions among variables. Previous studies, including one by Hossain et al. (2023), have reported that machine learning models, particularly the Random Forest algorithm, predict heart failure risk with high accuracy. The paper aims to explore the prevalence and risk factors of cardiac disease in Bangladesh, applying machine learning and ensemble learning approaches to improve predictive accuracy and to identify significant predictors of heart failure.

Methods

The study used a quantitative, cross-sectional design. Data on 651 participants (391 with CVD and 260 controls) were collected through questionnaires at three medical institutions in Bangladesh. Because the design is observational, no variables were experimentally manipulated.

Chi-square tests were used to identify variables significantly associated with CVD, including socio-economic status, lifestyle factors, and medical history. Logistic Regression, Naïve Bayes, Decision Tree, AdaBoost, Random Forest, and Bagging Tree classifiers were then trained and compared on accuracy, precision, recall, F1 score, and AU-ROC. The section also covers the sampling method, participant demographics, and ethical considerations.

Results

The “Results” section reports the variables found to be significantly associated with CVD and the predictive performance of each classifier, measured by accuracy, precision, recall, F1 score, and AU-ROC. Random Forest gave the best overall performance, including an AU-ROC of 0.989.

Tables and charts accompany these results to show how the models compare and how individual variables relate to CVD status.

Discussion

The discussion evaluates several machine learning (ML) models for predicting cardiovascular disease (CVD) among Bangladeshi individuals older than 15. The study compares Logistic Regression, Naïve Bayes, Decision Tree, AdaBoost, Random Forest, Bagging Tree, and ensemble learning classifiers, focusing on their effectiveness in diagnosing heart failure. The dataset of 651 samples (391 with CVD and 260 without) was collected through questionnaires from three medical institutions in Bangladesh. Statistical methods, including chi-square tests, were used to identify significant associations between CVD and independent variables such as socio-economic status, lifestyle factors, and medical history.
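To make the chi-square step concrete, the sketch below runs a Pearson chi-square test of independence on a single 2x2 table in plain Python. The function name `chi_square_2x2` and the example counts are hypothetical: only the row totals (391 and 260) match the group sizes reported here, the split within each row is invented, and no continuity correction is applied. The study's own software and settings are not described in this summary.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts for one binary risk factor (exposed / not exposed).
# Rows: CVD group (391 people) and control group (260 people).
stat, p = chi_square_2x2(250, 141, 90, 170)
print(f"chi-square = {stat:.2f}, p = {p:.3g}")
```

A small p-value indicates that the factor and CVD status are unlikely to be independent in the sample; it does not by itself say how useful the factor is as a predictor.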

All of the models performed well. Logistic Regression achieved an accuracy of 95.42% and an F1 score of 96.1%. The Naïve Bayes and Decision Tree classifiers reached accuracies of 96.18% and 97.37%, respectively, and AdaBoost reached an accuracy of 96.95% with a recall of 100%. The study highlights the potential of these ML techniques to help healthcare professionals identify high-risk patients and improve diagnostic accuracy for CVD, which in turn could support better preventive strategies and patient management.
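The four metrics quoted in this summary all derive from the same confusion-matrix counts. The sketch below shows the standard definitions in plain Python. The function name `classification_metrics` and the example counts are hypothetical and are not the study's confusion matrix; they were chosen only to show that a recall of 100% can coexist with some false positives.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical held-out set: no CVD case is missed (fn = 0),
# but two controls are flagged as CVD (fp = 2).
metrics = classification_metrics(tp=40, fp=2, fn=0, tn=28)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a screening setting, recall is often the metric to protect, since a missed case is usually costlier than a false alarm; precision then indicates how many follow-up examinations would turn out to be unnecessary.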

Limitations

The study has several limitations that affect how its findings should be interpreted. First, it is a cross-sectional analysis, which offers a snapshot in time rather than a dynamic view of patient data. A longitudinal design would allow a deeper understanding of cardiovascular disease (CVD) risk and could improve predictive accuracy. Second, the sample size of 651 participants may limit the precision of the machine learning models used for CVD risk prediction. Future research with larger cohorts is recommended to improve the robustness and generalizability of the results.
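One way to see how sample size limits precision is to put a confidence interval around an observed accuracy. The sketch below uses the Wilson score interval, a standard choice for proportions near 0 or 1. The test-set sizes and correct-prediction counts are hypothetical, since this summary does not state how the 651 records were split, and the function name `wilson_interval` is an assumption.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    )
    return centre - half_width, centre + half_width


# Same observed accuracy (about 96%) on two hypothetical test-set sizes.
small_low, small_high = wilson_interval(125, 130)
large_low, large_high = wilson_interval(1250, 1300)
print(f"n=130:  [{small_low:.3f}, {small_high:.3f}]")
print(f"n=1300: [{large_low:.3f}, {large_high:.3f}]")
```

The interval narrows as the number of evaluated cases grows, which is the sense in which a larger cohort would sharpen estimates of model performance.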