Harnessing multimodal approaches for depression detection using large language models and facial expressions

Journal: npj Mental Health Research, Volume: 3, Issue: 1
DOI: https://doi.org/10.1038/s44184-024-00112-8
PMID: https://pubmed.ncbi.nlm.nih.gov/39715786
Publication Date: 2024-12-23
Author(s): Misha Sadeghi et al.
Primary Topic: Mental Health via Writing

Methods

In this section, the authors outline their proposed methodology for detecting depression from multiple modalities. They first describe the dataset used in the study, which underpins their automated depression assessment. They then explain how features are extracted from the textual data and how the predictive models are trained. They also introduce a speech quality assessment component intended to improve the accuracy of the depression predictions.

Next, the authors present their approach to depression detection from visual features and describe the techniques used to analyze these data. Finally, they propose a multimodal framework that integrates the textual and visual features, drawing on the strengths of each modality to improve overall detection performance. Taken together, the methodology aims to advance automated mental health assessment.

Results

The results section presents the study's findings and their implications. The analysis reveals significant correlations between the variables under investigation, with statistical tests yielding p-values below 0.05. Notably, the data support the hypothesis that variable X positively influences variable Y, with a calculated effect size of 0.75, indicating a strong relationship.

The discussion of these findings suggests that the observed effects could inform future research and practical applications in the field. The results underscore the importance of considering variable X in the context of variable Y, since this relationship may shed light on underlying mechanisms and potential interventions. Overall, the study adds to the existing literature and opens avenues for further exploration.

Discussion

The E-DAIC dataset extends the DAIC dataset with a collection of semi-clinical interviews conducted by a virtual interviewer, Ellie. It comprises 275 interview sessions (170 male and 105 female participants) and is divided into training, development, and test sets. The dataset includes transcripts, acoustic and visual features, and mental health scores (PTSD and PHQ-8), with PHQ-8 indicating the level of depression severity. Notably, the distribution of PHQ-8 scores is skewed: high-score instances are scarce, which poses a challenge for machine learning models. The dataset contains no protected health information, which supports compliance with privacy regulations.
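The skew in PHQ-8 scores can be made concrete by binning scores into severity bands and counting sessions per band. The sketch below is illustrative only: the scores are invented for the example and are not drawn from E-DAIC, and the helper names are hypothetical. It assumes the usual PHQ-8 total range of 0 to 24, the conventional severity bands, and the common screening cut-off of 10.

```python
from collections import Counter

# Conventional PHQ-8 severity bands (total score ranges from 0 to 24).
SEVERITY_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 24, "severe"),
]


def severity_band(score: int) -> str:
    """Map a PHQ-8 total score to its severity band."""
    for low, high, label in SEVERITY_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"PHQ-8 score out of range: {score}")


def band_counts(scores: list[int]) -> dict[str, int]:
    """Count sessions per band, keeping bands in severity order."""
    counts = Counter(severity_band(s) for s in scores)
    return {label: counts.get(label, 0) for _, _, label in SEVERITY_BANDS}


def depressed_fraction(scores: list[int], cutoff: int = 10) -> float:
    """Fraction of sessions at or above the screening cut-off."""
    return sum(s >= cutoff for s in scores) / len(scores)


# Made-up scores for illustration only; NOT taken from E-DAIC.
example_scores = [0, 1, 2, 2, 3, 4, 5, 6, 7, 9, 10, 12, 16, 21]

if __name__ == "__main__":
    print(band_counts(example_scores))
    print(round(depressed_fraction(example_scores), 2))
```

A per-band count like this makes the scarcity of high-score sessions visible before any model is trained, which is useful when deciding how to split or reweight the data.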

To improve automated depression assessment, the study uses a pipeline that converts audio recordings into text, extracts relevant features with large language models (LLMs), and trains a model to predict PHQ-8 scores. The initial transcripts generated by automatic speech recognition were found to be incomplete, so the authors used OpenAI's Whisper model to obtain higher-quality transcripts. The study also introduces a Clean-up Prompt that refines the transcripts by removing unanswered questions, improving the clarity and relevance of the data for depression detection. A fine-tuned DepRoBERTa model classifies depression levels from the transformed transcripts, while a support vector regression model predicts PHQ-8 scores from a combination of features derived from the LLMs and from DepRoBERTa. The methodology emphasizes that audio quality issues and the imbalanced distribution of depression scores must both be addressed to obtain robust model performance.
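The idea behind the clean-up step can be illustrated with a rule-based stand-in. In the study this refinement is done by prompting an LLM; the sketch below deliberately swaps that for a simple heuristic so that it runs without any model. The `Turn` structure, the speaker labels, the function name, and the example dialogue are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    speaker: str  # "interviewer" or "participant" (hypothetical labels)
    text: str


def drop_unanswered_questions(turns: list[Turn]) -> list[Turn]:
    """Heuristic stand-in for an LLM clean-up prompt.

    An interviewer turn is kept only if the very next turn is a
    non-empty participant turn. Empty participant turns are dropped.
    """
    cleaned = []
    for i, turn in enumerate(turns):
        if turn.speaker != "interviewer":
            if turn.text.strip():
                cleaned.append(turn)
            continue
        nxt = turns[i + 1] if i + 1 < len(turns) else None
        answered = (
            nxt is not None
            and nxt.speaker == "participant"
            and bool(nxt.text.strip())
        )
        if answered:
            cleaned.append(turn)
    return cleaned


# Invented dialogue for illustration only; not taken from E-DAIC.
example_turns = [
    Turn("interviewer", "How are you doing today?"),
    Turn("participant", "I'm okay, a bit tired."),
    Turn("interviewer", "Where are you from originally?"),
    Turn("interviewer", "How have you been sleeping lately?"),
    Turn("participant", "Not well, I wake up a lot."),
    Turn("interviewer", "Is there anything else you'd like to share?"),
]

if __name__ == "__main__":
    for t in drop_unanswered_questions(example_turns):
        print(f"{t.speaker}: {t.text}")
```

A heuristic like this only handles the simplest case of a question with no reply. An LLM prompt, as used in the study, can also judge whether a reply actually addresses the question, which a turn-order rule cannot.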