Interpretable machine learning method to predict the risk of pre-diabetes using a national-wide cross-sectional data: evidence from CHNS

Journal: BMC Public Health, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s12889-025-22419-7
PMID: https://pubmed.ncbi.nlm.nih.gov/40140819
Publication Date: 2025-03-26
Author(s): Xiaolong Li et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The paper addresses the rising incidence of Type 2 Diabetes Mellitus (T2DM) and the importance of predicting pre-diabetes risk early. Using an interpretable machine learning approach, the study aims to improve prediction accuracy while quantifying the contribution of individual risk factors. From a dataset of 8,277 individuals, LASSO regression identified nine significant predictors of pre-diabetes: age, triglycerides (TG), total cholesterol (TC), body mass index (BMI), apolipoprotein B (ApoB), total protein (TP), leukocyte count, high-density lipoprotein cholesterol (HDL-C), and hypertension.
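To illustrate how LASSO can drop weak predictors entirely, here is a minimal pure-Python sketch of coordinate descent with soft-thresholding. It is not the study's implementation: the toy data, the function names `soft_threshold` and `lasso_coordinate_descent`, and the penalty value `lam` are all assumptions made for illustration, and the sketch uses a squared-error loss for brevity even though pre-diabetes status is a binary outcome.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X is a list of rows. Columns and y are assumed to be centered, so no
    intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Toy design (hypothetical): three centered, mutually orthogonal columns.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# y = 2 * column0 + 0.05 * column2, so column1 is irrelevant and column2 is weak.
y = [2.05, 1.95, -2.05, -1.95]

weights = lasso_coordinate_descent(X, y, lam=0.1)
print([round(w, 3) for w in weights])  # [1.9, 0.0, 0.0]
```

The strong predictor survives with a shrunken coefficient, while the irrelevant and weak ones are set to exactly zero. That zeroing is what lets LASSO act as a feature selector.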

The study compared several machine learning algorithms. The Extreme Gradient Boosting (XGBoost) model achieved the highest area under the curve (AUC), 0.939, outperforming Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR). SHapley Additive exPlanations (SHAP) analysis then quantified the contribution of each identified risk factor and pointed to specific risk thresholds: age over 53 years, BMI over 25 kg/m², TC over 5.6 mmol/L, ApoB over 0.9 g/L, and TG over 1.4 mmol/L, among others. The findings may not generalize to other populations because of regional and genetic differences, but the study offers a framework for identifying high-risk individuals and supporting early intervention for pre-diabetes.
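As a reminder of what an AUC comparison measures, the sketch below computes AUC from scratch, assuming the reported figure is the conventional area under the ROC curve. Under that reading, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The labels, scores, and the names `roc_auc`, `model_a`, and `model_b` are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = pre-diabetes) and predicted scores from two made-up models.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
model_b = [0.6, 0.4, 0.5, 0.5, 0.3, 0.7, 0.2, 0.4]

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(auc_a, auc_b)  # 0.9375 0.59375
```

The pairwise loop is quadratic in the sample size, which is fine for a demonstration. A rank-based formulation would be preferable for a dataset of several thousand records.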

Introduction

The introduction highlights the global rise in diabetes incidence and mortality: the International Diabetes Federation reported 451 million cases in 2017, a figure projected to reach 693 million by 2045. In China, the diabetic population grew from 90 million to 140 million between 2011 and 2021, and a substantial share of cases remains undiagnosed. These trends make diabetes a major public health issue with substantial economic costs, particularly for managing complications. The authors stress early intervention through public health initiatives and lifestyle modification to prevent or delay T2DM, particularly among people with pre-diabetes, which is characterized by impaired fasting glucose (IFG) and impaired glucose tolerance (IGT).

The paper notes that existing pre-diabetes risk prediction models often rely on traditional statistical methods and known risk factors. The authors instead propose machine learning algorithms, including Random Forest (RF), Support Vector Machines (SVM), Decision Trees (DT), Naive Bayes, Artificial Neural Networks (ANNs), and XGBoost, to improve predictive accuracy, and they use the SHAP (SHapley Additive exPlanations) method to keep the resulting model interpretable. SHAP quantifies the impact of each predictor on pre-diabetes risk, which helps identify high-risk individuals and inform early intervention strategies. Ultimately, the research seeks to improve clinical decision-making and guide future investigation of the complex risk factors associated with pre-diabetes.
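SHAP builds on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a single record by enumerating every feature ordering, which is only feasible for a handful of features. The scoring function `toy_risk_score`, its coefficients, and the example records are hypothetical and are not the study's model; the cutoffs inside it merely echo the age and TG thresholds quoted in this summary.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance, averaged over all feature orderings.

    Features not yet revealed in an ordering keep their baseline value.
    The cost grows factorially with the number of features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in order:
            current[name] = x[name]
            output = model(current)
            totals[name] += output - previous_output  # marginal contribution
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk_score(features):
    """Hypothetical score with one interaction term; not the study's model."""
    score = 0.02 * features["age"] + 0.05 * features["bmi"]
    if features["age"] > 53 and features["tg"] > 1.4:
        score += 0.5
    return score


person = {"age": 60, "bmi": 27.0, "tg": 2.0}
reference = {"age": 40, "bmi": 22.0, "tg": 1.0}

contributions = shapley_values(toy_risk_score, person, reference)
for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the score for `person` and the score for `reference`, and the interaction bonus is split evenly between age and TG. That additivity is what makes per-feature attributions of this kind straightforward to read.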

Methods

The study draws on nationwide cross-sectional data from the China Health and Nutrition Survey (CHNS), with an analysis dataset of 8,277 individuals. LASSO regression was used to select predictors of pre-diabetes from the candidate variables.

Several machine learning algorithms, including XGBoost, RF, SVM, and LR, were then compared by area under the curve (AUC). SHAP values were computed to quantify the contribution of each selected predictor to the model's predictions.

Results

LASSO regression retained nine predictors of pre-diabetes: age, hypertension, TP, TG, BMI, TC, leukocyte count, ApoB, and HDL-C. Among the algorithms compared, XGBoost achieved the highest area under the curve (AUC), 0.939, ahead of RF, SVM, and LR.

SHAP analysis quantified how each predictor shifted the model's output and pointed to risk thresholds that include age over 53 years, BMI over 25 kg/m², TC over 5.6 mmol/L, ApoB over 0.9 g/L, and TG over 1.4 mmol/L.

Discussion

The discussion highlights the importance of identifying and predicting pre-diabetes, a precursor to Type 2 Diabetes Mellitus (T2DM), at an early stage. Using data from the China Health and Nutrition Survey (CHNS), the study applied LASSO regression for feature selection and XGBoost to build a predictive model on nine key features: age, hypertension, total protein (TP), triglycerides (TG), body mass index (BMI), total cholesterol (TC), white blood cell count (WBC), apolipoprotein B (ApoB), and high-density lipoprotein cholesterol (HDL-C). The model showed higher predictive accuracy than the other algorithms compared, including support vector machines (SVM), random forests (RF), and logistic regression (LR).

The findings single out age, particularly being over 53 years, as a critical risk factor for pre-diabetes, alongside several biochemical indicators. SHAP values quantified the influence of each feature on the model's predictions and showed that higher TC and TG levels are associated with a markedly higher predicted risk of pre-diabetes. Because the model relies on a small set of easily obtained predictors, it is practical for community and clinical screening programs. Overall, the research underscores the potential of machine learning to identify individuals at risk of pre-diabetes and so support timely intervention and prevention.
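As a rough illustration of how the thresholds quoted in the Overview could be surfaced during screening, the sketch below lists which cutoffs a record exceeds. It is not the study's XGBoost model and not a validated screening rule. The dictionary keys, the function `exceeded_thresholds`, and the example participant are hypothetical, and the BMI unit is assumed to be kg/m².

```python
# Cutoffs quoted in this summary; a value strictly above the cutoff is flagged.
REPORTED_THRESHOLDS = {
    "age_years": 53,
    "bmi_kg_m2": 25,
    "tc_mmol_l": 5.6,
    "apob_g_l": 0.9,
    "tg_mmol_l": 1.4,
}


def exceeded_thresholds(record):
    """Return the sorted names of measurements that exceed their quoted cutoff.

    Measurements missing from the record are skipped rather than guessed.
    """
    return sorted(
        name
        for name, cutoff in REPORTED_THRESHOLDS.items()
        if name in record and record[name] > cutoff
    )


# Hypothetical participant; field names are illustrative, not CHNS variable names.
participant = {"age_years": 58, "bmi_kg_m2": 24.1, "tc_mmol_l": 5.9, "tg_mmol_l": 1.4}
flags = exceeded_thresholds(participant)
print(flags)  # ['age_years', 'tc_mmol_l']
```

A list of exceeded cutoffs is a coarse summary. The fitted model combines all nine predictors, so flags like these would at most complement its predicted risk.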