An in-depth exploration of machine learning methods for mental health state detection: a systematic review and analysis

Journal: Frontiers in Digital Health, Volume: 7
DOI: https://doi.org/10.3389/fdgth.2025.1724348
PMID: https://pubmed.ncbi.nlm.nih.gov/41550350
Publication Date: 2026-01-02
Author(s): Md Jawadul Hasan et al.
Primary Topic: Mental Health via Writing

Overview

The systematic review presented in this paper addresses the increasing prevalence of mental health issues and the potential of machine learning (ML) as a diagnostic tool. The authors conducted a comprehensive search across major databases from January 2015 to December 2024, identifying 3,320 articles, of which 35 met the inclusion criteria. These studies utilized a variety of ML techniques, including both supervised and unsupervised methods, with 14 studies sourcing data from online social networks and 21 employing manual data collection methods. The findings indicate that ML can effectively assist in diagnosing mental health conditions, although further research is necessary to enhance sampling methods, refine prediction algorithms, and address ethical concerns regarding sensitive data usage.

The review highlights a notable trade-off between model interpretability and predictive accuracy, with simpler models like logistic regression often serving as baselines, while more complex deep learning architectures typically achieve superior performance. The authors advocate for continued research and collaboration with mental health professionals to optimize the application of ML in this critical area, suggesting that incorporating image processing techniques could further enrich the field. Overall, the study underscores the promise of ML in improving mental health detection and diagnosis, while also acknowledging the challenges and limitations inherent in current methodologies.
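To make the interpretability side of this trade-off concrete, here is a minimal pure-Python sketch of a logistic regression text baseline. It assumes a bag-of-words representation, whitespace tokenization, and plain batch gradient descent. These choices, the function names, and the toy posts and labels are hypothetical illustrations, not the setup of any study in the review.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization (an assumption; real studies use richer preprocessing).
    return text.lower().split()

def featurize(text, vocab):
    # Bag-of-words count vector aligned with the vocabulary order.
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy posts: 1 = distress-like language, 0 = not. Not data from the review.
posts = [
    ("i feel hopeless and tired all the time", 1),
    ("cannot sleep feel empty and hopeless", 1),
    ("so tired and alone lately", 1),
    ("great hike with friends today", 0),
    ("excited about the new job", 0),
    ("fun dinner with family tonight", 0),
]
vocab = sorted({t for text, _ in posts for t in tokenize(text)})
X = [featurize(text, vocab) for text, _ in posts]
y = [label for _, label in posts]
w, b = train_logreg(X, y)

# Interpretability: each weight is the change in log-odds per occurrence of a token.
for token, weight in sorted(zip(vocab, w), key=lambda p: p[1], reverse=True)[:5]:
    print(f"{token:<10} {weight:+.3f}")
```

Because each weight is tied to a single token, the fitted model can be read directly, which is also how one would notice a baseline leaning on uninformative function words. A deep model trained on the same posts offers no comparably direct readout.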

Introduction

The introduction of this research paper highlights the significant prevalence of mental health issues, noting that approximately 20% of Americans and 12.5% of the global population are affected by various mental disorders, including anxiety, depression, and PTSD. Despite the existence of the “Comprehensive Mental Health Action Plan 2013-2023” by the WHO, which aims to improve mental health services and reduce stigma, treatment accessibility remains a challenge, particularly in low-income countries. Traditional assessment methods for mental health are often inefficient and biased, prompting a shift towards machine learning (ML) techniques for more accurate detection and treatment support.

The paper emphasizes advances in ML, particularly in text mining and sentiment analysis, which have improved the ability to assess mental health states using data from diverse sources, including social media. The authors reviewed recent literature on ML applications for predicting mental health conditions; 14 of the analyzed articles used data from platforms such as Twitter and Facebook. The study identifies the mental health issues that ML has been used to predict, traces the evolution of assessment techniques, evaluates their strengths and weaknesses, and closes with recommendations for future research in this domain. The methodology, results, and conclusions are detailed in the subsequent sections.

Methods

The methodology focuses on evaluating how effectively machine learning techniques recognize mental health issues, with the aim of systematically identifying both the advantages and the potential challenges of these techniques. The review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, using the structured framework illustrated in Figure 1 to document and organize its methodological approach. The goal is a thorough picture of the current landscape of machine learning applications in mental health recognition.

Results

In this systematic literature review, a total of 3,440 papers published between 2015 and 2024 were initially collected, of which 35 were ultimately selected based on specific inclusion criteria. Selection involved screening titles and abstracts to identify the papers most relevant to the review topic.

The selected articles, summarized in Table 2, provide insights into various types of mental health issues identified using machine learning techniques. Additionally, the dataset information and the criteria for inclusion and exclusion are detailed in Table 1, highlighting the methodological rigor of the review process.

Discussion

This section discusses the methods and findings of the review, which focuses on identifying mental health conditions from data sourced from social media and other online platforms. The literature search drew on databases including IEEE Xplore, Scopus, and Google Scholar, and the included studies applied natural language processing and machine learning techniques to the data they collected. Initial searches yielded 3,440 studies, which were narrowed down to 354 after applying the inclusion and exclusion criteria, and ultimately to 35 articles selected for in-depth analysis. The studies primarily examined depression, anxiety, and stress, with a notable emphasis on data collected from social media platforms such as Twitter and Reddit.
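The screening counts reported in this section can be tallied as a PRISMA-style funnel. A small sketch, assuming the three stage labels paraphrased below; the review's Figure 1 may name and split the stages differently, and the `funnel` helper is purely illustrative.

```python
# Stage counts as reported in this section; stage labels are paraphrased.
stages = [
    ("Records identified (2015-2024)", 3440),
    ("Remaining after inclusion/exclusion criteria", 354),
    ("Included in the final analysis", 35),
]

def funnel(stages):
    """Return (label, count, excluded_since_previous_stage, fraction_of_initial) rows."""
    initial = stages[0][1]
    rows, prev = [], initial
    for label, count in stages:
        rows.append((label, count, prev - count, count / initial))
        prev = count
    return rows

for label, count, excluded, frac in funnel(stages):
    print(f"{label:<48} n={count:>5}  excluded={excluded:>5}  retained={frac:.1%}")

# Data-source split of the 35 included studies, as reported in the overview.
social_media, manual = 14, 21
assert social_media + manual == stages[-1][1]
```

With these counts, 3,086 records drop out at the first step and 319 at the second, so roughly 1% of the identified records reach the final analysis.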

The review highlights the diverse methodologies employed across the selected studies, including various machine learning algorithms such as logistic regression, support vector machines (SVM), and deep learning techniques like convolutional neural networks (CNN) and long short-term memory (LSTM) networks. The findings indicate that these algorithms can effectively classify different mental health states based on user-generated content, with performance varying according to the dataset and algorithm used. Notably, studies demonstrated that deep learning models, particularly those utilizing pre-trained models like BERT, showed strong performance in text analysis, while traditional models excelled in simpler, structured datasets. Overall, the review underscores the potential of leveraging social media data and advanced machine learning techniques for mental health assessment and intervention.

Limitations

The section on limitations highlights several methodological issues and risk-of-bias patterns prevalent in social media-based studies assessing mental health. A significant concern is the reliance on self-reported diagnoses and weak labeling strategies, which can introduce noise and misclassification in the data. Many studies utilize modest sample sizes with imbalanced class distributions, often reporting performance metrics solely in terms of overall accuracy or area under the curve (AUC), potentially overestimating the model’s effectiveness on minority classes. Furthermore, the lack of external or temporal validation raises questions about the generalizability of the findings, as most studies depend on internal cross-validation. The narrow demographic focus of many datasets limits representativeness and may obscure performance disparities across different subgroups.
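Self-reported diagnoses are often harvested with surface text patterns, which is why the resulting labels are called weak. The sketch below uses a hypothetical regular expression (`SELF_REPORT`, not a pattern taken from any reviewed study) and four invented posts with hypothetical hand labels to show two typical failure modes: a hypothetical question that still matches, and a genuine self-report phrased without the expected wording.

```python
import re

# Hypothetical self-report pattern: first-person "diagnosed with <condition>".
# Illustrative only; not a pattern used by any study in the review.
SELF_REPORT = re.compile(
    r"\bi (?:was|have been|got) diagnosed with (depression|anxiety|ptsd)\b"
)

def weak_label(post):
    """Return the condition named in a first-person diagnosis statement, else None."""
    match = SELF_REPORT.search(post.lower())
    return match.group(1) if match else None

# Invented posts paired with hypothetical hand (gold) labels.
posts = [
    ("I was diagnosed with depression last spring", "depression"),       # true self-report: matched
    ("If I was diagnosed with anxiety, would anyone even care?", None),  # hypothetical: false positive
    ("Diagnosed with PTSD after my deployment", "ptsd"),                  # no "I ...": false negative
    ("My sister was diagnosed with anxiety", None),                       # third person: correctly ignored
]

for text, gold in posts:
    print(f"weak={weak_label(text)!s:<10} gold={gold!s:<10} {text}")

disagreements = sum(weak_label(text) != gold for text, gold in posts)
print(f"{disagreements}/{len(posts)} weak labels disagree with the hand labels")
```

Here two of the four weak labels disagree with the hand labels; at corpus scale, errors of this kind become the label noise and misclassification described above.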

Additionally, the research emphasizes the importance of incorporating offline activities and diverse data types, such as images and audio, to enhance prediction accuracy. The presence of fake accounts on platforms like Reddit, Twitter, and Facebook poses challenges to data authenticity, which can lead to biased results. The studies predominantly utilized text data, potentially constraining their predictive capabilities. Future research should prioritize robust labeling procedures, employ class-sensitive metrics like F1-score and recall, and ensure external validation across varied populations. Ethical considerations regarding privacy and data security are also paramount, necessitating careful handling of personal information and health records throughout the research process.
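A small illustration of why class-sensitive metrics matter on imbalanced data. The test set and both prediction vectors below are hypothetical (95 negative and 5 positive posts), and the helper names are illustrative; the metric definitions themselves are the standard ones for the positive (minority) class.

```python
def confusion(y_true, y_pred, positive=1):
    # Counts of true positives, false positives, false negatives, and true negatives.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def report(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the given positive class."""
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced test set: 95 negatives (0) followed by 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
always_negative = [0] * 100
print(report(y_true, always_negative))  # accuracy 0.95; minority-class recall and F1 both 0.0

# A model with 10 false alarms that catches 4 of the 5 positives.
some_recall = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
print(report(y_true, some_recall))      # accuracy 0.89; recall 0.8; F1 = 8/19, about 0.42
```

The always-negative model scores higher accuracy (0.95 vs 0.89) while detecting none of the positive cases; recall and F1 on the minority class expose the difference that accuracy hides.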