Machine learning algorithms to predict depression in older adults in China: a cross-sectional study

Journal: Frontiers in Public Health, Volume: 12
DOI: https://doi.org/10.3389/fpubh.2024.1462387
PMID: https://pubmed.ncbi.nlm.nih.gov/39839428
Publication Date: 2025-01-07
Author(s): Zhenyun Du et al.
Primary Topic: Mental Health via Writing

Overview

This study investigates how well machine learning (ML) algorithms predict depression among older adults in China, using data from the China Health and Retirement Longitudinal Study (CHARLS). A sample of older adults was divided into training and testing sets, and six ML algorithms (logistic regression, k-nearest neighbors, support vector machine, decision tree, LightGBM, and random forest) were used to develop predictive models. Model performance was assessed with the area under the receiver operating characteristic (ROC) curve (AUC) and with decision curve analysis (DCA), and the DeLong test was applied to compare ROC curves. The AUC values differed significantly across models, and LightGBM showed the highest net benefit across a range of threshold probabilities. The key predictors identified were self-rated health, nighttime sleep, gender, age, and cognitive function. The study concludes that ML algorithms can effectively predict depression among older adults in China and identify the factors that contribute most to it. LightGBM outperformed the other models, which suggests it is useful for identifying high-risk groups for targeted early intervention. The authors suggest that future research should build on the identified factors in subsequent model development.
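The two evaluation measures named above can be illustrated with a minimal sketch. It assumes binary labels (1 = depressive symptoms) and predicted probabilities; the function names and the toy data are hypothetical and are not taken from the study. AUC is computed as the share of positive/negative pairs that the model ranks correctly, and net benefit follows the standard decision-curve definition, TP/n - FP/n × pt/(1 - pt), at a threshold probability pt.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Decision-curve net benefit at a single threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Cases flagged as high risk are those scored at or above the threshold.
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data, for illustration only.
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.7, 0.4, 0.35, 0.1, 0.6, 0.8]

print(roc_auc(labels, scores))           # 0.875
print(net_benefit(labels, scores, 0.5))  # 0.25
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies. The pairwise AUC loop is quadratic in the sample size, so a rank-based implementation would be preferable for a cohort of several thousand participants.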

Introduction

The paper's introduction highlights the high prevalence and substantial impact of depression, as defined by the DSM-5, among older adults. Affecting approximately 300 million people worldwide, depression is among the most common mental illnesses and is projected to become the leading cause of global disease burden by 2030. The condition is associated with severe consequences, including higher suicide rates, reduced quality of life, increased mortality, and rising healthcare costs. Despite its prevalence, treatment for depression in older adults remains inadequate: only 0.5% of affected individuals in China were reported to have received adequate care as of 2021.

The paper emphasizes the need to identify risk factors and characteristics of depression to facilitate early intervention. Machine learning (ML) has emerged as a valuable tool in mental health research because it can analyze complex datasets to improve the understanding and prediction of mental disorders. Previous studies using data from the China Health and Retirement Longitudinal Study (CHARLS) have applied various ML algorithms to predict depression outcomes. However, these studies largely predate the COVID-19 pandemic, which has been linked to an increase in depressive symptoms. The current study analyzes updated CHARLS data collected during the pandemic to predict depression and identify influencing factors among older adults in China, using six predictive models: k-nearest neighbors (KNN), logistic regression, decision tree (DT), random forest (RF), LightGBM, and support vector machine (SVM).

Methods

The study draws on the 2020 wave of CHARLS, which uses stratified probability-proportional-to-size sampling. The outcome variable, depressive symptoms, was assessed with the CESD-10. Candidate predictors covered sociodemographic, health status, behavioral, and mental health factors, including self-rated health, cognitive function, and lifestyle habits.

The sample was divided into training and testing sets, and SMOTE was applied to address class imbalance. Six ML algorithms were trained: logistic regression, KNN, SVM, DT, RF, and LightGBM. Performance was evaluated with the AUC, the Brier score, and DCA, the DeLong test was used to compare ROC curves, and SHAP was used to interpret the model's predictions.

Results

The study analyzed a sample of 7,880 older adults, comprising 4,125 women (52.35%), with a mean age of 69.05 years; 2,996 participants (38.02%) were classified as having depressive symptoms. Notable sample characteristics included high rates of illiteracy (56.0%), multimorbidity (75.09%), and non-participation in social activities (56.47%). Statistically significant differences were observed for these variables (P < 0.001).

Using the 2020 CHARLS data, the study predicted depression with six machine learning (ML) algorithms, of which LightGBM performed best as measured by the area under the curve (AUC). SHAP analysis highlighted five key predictors of depression: self-rated health, nighttime sleep duration, gender, age, and cognitive function.

Self-rated health emerged as the most important predictor, corroborating previous findings that link poor self-rated health to increased depression risk. The study also found that older women are more susceptible to depression than older men, in line with the existing literature. Age was identified as a potential protective factor against depression during the COVID-19 pandemic, suggesting that life experience and coping strategies may reduce depression risk in older adults. The relationship between cognitive function and depression remains contested, with mixed evidence on how cognitive function affects depressive symptoms in this population.
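To make the idea behind SHAP-style attribution concrete, the following sketch computes exact Shapley values for a single prediction of a small toy model. It is not the SHAP library and not the authors' pipeline: the model, its coefficients, the feature encoding, and the single reference ("baseline") participant are all hypothetical, and SHAP implementations typically average over a background dataset rather than one baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline instance.

    Features outside a coalition are set to their baseline values.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical linear risk score; coefficients are made up."""
    poor_health, short_sleep, age_decades = features
    return 0.1 + 0.3 * poor_health + 0.2 * short_sleep - 0.05 * age_decades


participant = [1, 1, 2]  # poor self-rated health, short sleep, older
reference = [0, 0, 1]    # hypothetical reference participant

contributions = shapley_values(toy_risk, participant, reference)
print(contributions)
```

The contributions sum to the difference between the participant's score and the reference score, which is the additivity property that lets SHAP rank predictors by their average absolute contribution.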

Discussion

The discussion section revisits the methodology and findings of the study, which aimed to predict depression among older adults in China using machine learning (ML) models. The study used data from the China Health and Retirement Longitudinal Study (CHARLS), which employs stratified probability-proportional-to-size sampling to obtain a sample that is representative across demographic groups. The outcome variable, depressive symptoms, was assessed using the 10-item Center for Epidemiologic Studies Depression Scale (CESD-10), a structured questionnaire covering several facets of depression. The predictors included sociodemographic, health status, behavioral, and mental health factors, with a focus on variables such as self-rated health, cognitive function, and lifestyle habits.
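As an illustration of how a CESD-10 outcome is typically derived, the sketch below scores one set of responses. The summary does not state the scoring rules or the cutoff the authors used, so the details here are assumptions based on the conventions commonly reported for the CESD-10: ten items scored 0 to 3, the two positively worded items (5 and 8) reverse-scored, a total between 0 and 30, and a cutoff of 10 or more for depressive symptoms.

```python
# Assumed conventions, not confirmed by the study summary.
REVERSE_SCORED_ITEMS = {5, 8}  # positively worded items (1-based numbering)
ASSUMED_CUTOFF = 10            # total score at or above this counts as symptomatic


def cesd10_total(responses):
    """Sum ten item responses (each 0-3), reverse-scoring the positive items."""
    if len(responses) != 10:
        raise ValueError("CESD-10 requires exactly ten responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if response not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: response must be 0, 1, 2, or 3")
        total += 3 - response if item in REVERSE_SCORED_ITEMS else response
    return total


def has_depressive_symptoms(responses, cutoff=ASSUMED_CUTOFF):
    """Binary outcome label derived from the total score."""
    return cesd10_total(responses) >= cutoff


# Hypothetical respondent: few negative symptoms, high positive affect.
low = [1, 1, 2, 0, 3, 1, 0, 3, 1, 0]
# Hypothetical respondent: frequent negative symptoms, no positive affect.
high = [2, 2, 2, 1, 0, 2, 1, 0, 2, 1]

print(cesd10_total(low), has_depressive_symptoms(low))    # 6 False
print(cesd10_total(high), has_depressive_symptoms(high))  # 19 True
```

Because the binary label depends on the chosen cutoff, the reported prevalence and any model trained on the label are sensitive to that choice.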

The analysis employed multiple ML algorithms, including LightGBM, which demonstrated the highest performance in predicting depression, as indicated by the area under the curve (AUC) value and the Brier score. The study also utilized the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance in the dataset, enhancing the model’s accuracy and stability. The SHapley Additive exPlanations (SHAP) method was applied to interpret the model’s predictions, revealing that key predictors of depression included self-rated health, nighttime sleep duration, gender, age, and cognitive function. The findings underscore the potential of ML techniques in identifying high-risk groups for depression, thereby informing early intervention strategies for older adults in China.
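The Brier score and the core interpolation step of SMOTE can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: the function names, the neighbour count `k`, the seed, and the toy points are hypothetical, and a study would normally use a tested library rather than hand-written code.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome
    (lower is better; 0 is a perfect score)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points, each lying on the line segment
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (squared Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical minority-class points in a two-feature space.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_like(minority_points, n_new=3))

print(brier_score([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]))
```

Oversampling of this kind is normally applied to the training set only, so that the test set keeps the original class distribution. The summary does not say how the authors handled this.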

Limitations

The study has several limitations that may affect the validity and reliability of its findings. First, it uses the CESD-10 to assess depressive symptoms in older adults; although the scale has been validated in the Chinese population, it is not the gold standard for diagnosing depression, and reliance on a single instrument may bias the assessment. Second, the cross-sectional nature of the data prevents causal inference, so longitudinal studies are needed to understand the observed associations more fully.

Moreover, data collection took place during the COVID-19 pandemic, a unique stressor that could have influenced both the predictors (such as sleep patterns) and the outcome (depression), potentially confounding the observed associations. Self-reported data may also be subject to reporting bias, as participants might describe their health status inaccurately because of social desirability or recall bias. Furthermore, variables such as gender, age, and cognitive function may correlate with unmeasured factors, such as socioeconomic status, which complicates the interpretation of the results. Applying SMOTE to balance the dataset may inadvertently create artificial correlations, and the model's performance still needs to be validated in an independent cohort to establish its generalizability. Despite these limitations, the study suggests that the developed model holds promise for predicting and addressing depression in older adults.