Explainable Predictive Model for Suicidal Ideation During COVID-19: Social Media Discourse Study

Journal: Journal of Medical Internet Research, Volume: 27
DOI: https://doi.org/10.2196/65434
PMID: https://pubmed.ncbi.nlm.nih.gov/39823631
Publication Date: 2025-01-17
Author(s): Salah Bouktif et al.
Primary Topic: Mental Health via Writing

Overview

The paper investigates the impact of COVID-19 on mental health, focusing on suicidal ideation as expressed on social media. The study applies natural language processing (NLP) techniques to the text of Reddit posts in order to classify each post as suicidal or nonsuicidal and to identify factors contributing to suicidal thoughts. The authors developed a hybrid deep learning model combining Bidirectional Encoder Representations from Transformers (BERT), a convolutional neural network (CNN), and a long short-term memory (LSTM) network. On a dataset of 348,110 records, the model achieved a precision of 94%, a recall of 95%, an F1-score of 94%, and an accuracy of 93.65%.
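The four reported metrics all derive from a binary confusion matrix. The sketch below shows the standard definitions, treating "suicidal" as the positive class. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the names `ConfusionCounts` and `classification_metrics` are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts with 'suicidal' as the positive class."""
    tp: int  # suicidal posts predicted suicidal
    fp: int  # nonsuicidal posts predicted suicidal
    fn: int  # suicidal posts predicted nonsuicidal
    tn: int  # nonsuicidal posts predicted nonsuicidal


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute precision, recall, F1-score, and accuracy from the counts."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for illustration only (not taken from the paper).
example = ConfusionCounts(tp=95, fp=6, fn=5, tn=94)
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.4f}")
```

Recall matters most when a missed positive is costly: each false negative here is a suicidal post that the classifier failed to flag.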

The findings indicate a significant shift in language patterns associated with suicidal ideation during the pandemic, as revealed through Explainable Artificial Intelligence (XAI) techniques like Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP). These techniques highlighted the emergence of new features linked to suicidal tendencies during COVID-19. The study underscores the necessity for ongoing research and intervention strategies to address the mental health crisis exacerbated by the pandemic, emphasizing the model’s potential to inform decision-makers about the factors influencing suicide risk and enhance intervention effectiveness.
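To make the idea behind Shapley-based explanations concrete, the following sketch computes exact Shapley values for the words of a short post. It enumerates every ordering of the words, which is feasible only for a handful of features; practical SHAP tooling relies on approximations instead. The scoring function `toy_score` is a hypothetical stand-in for a classifier's output, not the model from the study.

```python
from itertools import permutations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings in which features can be added."""
    totals = {f: 0.0 for f in features}
    for order in permutations(features):
        present = set()
        for f in order:
            before = value(frozenset(present))
            present.add(f)
            totals[f] += value(frozenset(present)) - before
    n_orders = factorial(len(features))
    return {f: total / n_orders for f, total in totals.items()}


def toy_score(present_words: frozenset) -> float:
    """Hypothetical risk score for a post containing only `present_words`."""
    score = 0.0
    if "hopeless" in present_words:
        score += 0.5
    if "lockdown" in present_words:
        score += 0.1
    if {"hopeless", "lockdown"} <= present_words:
        score += 0.2  # interaction: the two words together add extra weight
    return score


words = ["hopeless", "lockdown", "today"]
phi = shapley_values(words, toy_score)
for word, contribution in phi.items():
    print(f"{word}: {contribution:.3f}")
```

The contributions sum to the difference between the score of the full post and the score of an empty one, which is the additivity property that gives these explanations their name. The interaction bonus is split evenly between the two words that produce it.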

Introduction

The introduction discusses the evolving landscape of mental health assessment, particularly for depression and suicidal ideation. Assessment has traditionally relied on psychological tests and clinical evaluations, but recent studies have turned to social media as a source for understanding the language patterns associated with suicidal thoughts. Part of the appeal is that the stigma surrounding mental health issues weighs less heavily online, where individuals express their feelings more freely. The paper acknowledges the importance of questionnaires in assessing mental states while noting that social media analysis has become a valuable tool for predicting depression and examining the behavior of potentially suicidal individuals.

Advances in natural language processing (NLP) and machine learning have made it possible to automate the detection of suicidal content in social media posts. The research employs various feature selection and deep learning techniques, including term frequency-inverse document frequency (TF-IDF), Word2vec, Bidirectional Encoder Representations from Transformers (BERT), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks. These methods are intended to capture the dynamic nature of suicidal ideation expressed in longer text segments. The paper outlines the theoretical foundations of these approaches, emphasizing their effectiveness in identifying complex patterns and contextual information in text; further detail is given in the accompanying multimedia appendices.
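As an illustration of the simplest technique listed above, here is a minimal TF-IDF sketch. It assumes one common variant (term frequency normalized by document length, unsmoothed natural-log inverse document frequency); the paper's exact weighting scheme is not stated here, and the toy posts are invented for the example.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} map per tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Invented toy posts, already lowercased and split on whitespace.
posts = [
    "i feel tired and alone".split(),
    "i feel fine today".split(),
    "i am tired of lockdown".split(),
]
for post, w in zip(posts, tf_idf(posts)):
    top_term = max(w, key=w.get)
    print(" ".join(post), "->", top_term)
```

A term that occurs in every document, such as "i" here, receives a weight of zero, while terms concentrated in a few documents are weighted up. This is why TF-IDF tends to surface distinctive vocabulary.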

Methods

The “Methods” section of the research paper outlines the experimental design and analytical techniques employed to investigate the research question. The study utilized a quantitative approach, incorporating statistical analyses to evaluate the data collected from various experiments. Specific methodologies included controlled experiments, where variables were systematically manipulated to observe their effects on the outcomes of interest.

Data collection involved the use of standardized instruments and protocols to ensure reliability and validity. The analysis was conducted using software tools capable of performing complex statistical tests, including regression analysis and hypothesis testing, to draw meaningful conclusions from the data. The section emphasizes the importance of replicability and transparency in the research process, detailing the steps taken to mitigate biases and ensure robust findings.

Results

The results reveal significant changes in the language patterns associated with suicidal ideation between the periods before and during the COVID-19 pandemic. A comparative analysis indicated that individuals used distinct vocabulary before the pandemic, with a notable shift during it. The study tested the null hypothesis (H₀) that COVID-19 did not affect levels of suicidal thoughts. The findings showed a significant increase in suicidal ideation linked to the pandemic (P < 0.01), so H₀ was rejected in favor of the alternative hypothesis (H₁) that COVID-19 did affect suicidal thoughts.

These results emphasize the pressing need for mental health interventions in response to public health crises, illustrating the close relationship between global events and individual psychological well-being. The study highlights the importance of monitoring language use and mental health trends during such periods to inform effective support strategies.
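The summary does not say which statistical test produced the reported P value. As one plausible illustration of how such a hypothesis could be checked, the sketch below runs a two-sided two-proportion z-test comparing the share of posts classified as suicidal before and during the pandemic. Both the choice of test and all the counts are assumptions made for this example.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test of H0: p1 == p2, using the pooled proportion.

    x1 / n1: positives and total in the first sample (e.g., before the pandemic)
    x2 / n2: positives and total in the second sample (e.g., during the pandemic)
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only (not taken from the paper).
z, p = two_proportion_z_test(x1=400, n1=10_000, x2=520, n2=10_000)
print(f"z = {z:.2f}, p = {p:.2e}, reject H0 at 0.01: {p < 0.01}")
```

With samples this large, even a difference of about one percentage point is enough to reject H₀ at the 0.01 level, so statistical significance alone says little about the size of the effect.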

Discussion

The discussion section of the research paper highlights the critical intersection of mental health and technology, particularly in the context of the COVID-19 pandemic. It underscores the alarming statistics surrounding suicide, noting that approximately 800,000 individuals die by suicide annually, with a significant prevalence among young people, especially women. The paper emphasizes the importance of recognizing suicidal ideation as a process that can escalate from thoughts to attempts, and it discusses the role of social stigma in hindering individuals from seeking help. The study aims to leverage machine learning and deep learning algorithms to analyze social media data, particularly from Reddit, to identify patterns and indicators of suicidal ideation exacerbated by the pandemic.

The research employs a methodology that spans data collection, feature engineering, and the application of advanced classifiers such as BERT, CNN, and LSTM. The findings indicate that the proposed BERT+CNN+LSTM model significantly outperforms existing methods, achieving a precision of 94%, a recall of 95%, and an overall accuracy of 93.65%. The study also highlights the utility of Explainable Artificial Intelligence (XAI) in making AI-driven mental health assessments more transparent, which fosters trust among clinicians and patients. By identifying key terms associated with suicidal ideation during COVID-19, the research offers insights for developing targeted interventions and support strategies, with the aim of improving mental health outcomes in the face of ongoing societal challenges.

Limitations

In the limitations section, the authors discuss the strengths and weaknesses of three models (CNN, LSTM, and BERT) in analyzing social media posts for indicators of suicidal ideation. The CNN was effective at identifying localized patterns, such as emotional language, but struggled with contextual understanding, leading to misclassifications when the meaning depended on word sequence. For instance, it misclassified “I just need a break” as nonsuicidal. The LSTM, by contrast, excelled at processing sequential data and correctly identified evolving sentiments, but it was prone to overfitting, particularly with limited or non-diverse training data. This produced false negatives, such as the misclassification of “I can’t handle this anymore, I’m just tired of fighting.”

BERT’s bidirectional processing allowed for a more comprehensive understanding of context, yet it required large datasets for optimal performance. To address the limitations of each model, the authors developed a hybrid model that integrated CNN, LSTM, and BERT. This combination allowed the model to leverage the strengths of each architecture—localized feature extraction from CNN, contextual understanding from LSTM, and comprehensive word context from BERT—resulting in an overall accuracy of 93.65%. The hybrid approach effectively mitigated individual weaknesses and enhanced the model’s ability to capture the complex language patterns associated with suicidal behavior.