Uncertainty modeling in multimodal speech analysis across the psychosis spectrum

Journal: npj Digital Medicine, Volume: 9, Issue: 1
DOI: https://doi.org/10.1038/s41746-025-02309-3
PMID: https://pubmed.ncbi.nlm.nih.gov/41571751
Publication Date: 2026-01-23
Author(s): Morteza Rohanian et al.
Primary Topic: Emotion and Mood Recognition

Overview

The research presents a multimodal model designed to enhance the diagnostic utility of speech patterns in identifying psychosis-related traits and symptom severity. Recognizing the inherent variability in speech across individuals and contexts, the model incorporates both acoustic and linguistic features while estimating uncertainty for each modality. This approach allows the model to adaptively determine which signals to prioritize based on speech quality and task context, ultimately improving prediction accuracy and interpretability.
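One common way to attach an uncertainty estimate to each modality is the entropy of its predicted class probabilities. The sketch below illustrates that idea only; the summary does not say how the authors estimate uncertainty, and the function names and example probabilities are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_uncertainty(probs):
    """Entropy divided by log(number of classes), so the result lies in [0, 1]."""
    return predictive_entropy(probs) / math.log(len(probs))


# Hypothetical class probabilities from two modality-specific classifiers
# for the same speech segment (illustrative values, not from the study).
acoustic_probs = [0.9, 0.1]    # confident prediction, low uncertainty
linguistic_probs = [0.5, 0.5]  # uninformative prediction, maximal uncertainty

acoustic_uncertainty = normalized_uncertainty(acoustic_probs)
linguistic_uncertainty = normalized_uncertainty(linguistic_probs)
```

Under this assumption, a modality whose prediction is close to uniform receives an uncertainty near 1 and would be a candidate for down-weighting.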

The study used speech data from 114 participants, comprising 32 individuals with early psychosis and 82 with varying levels of schizotypy, recorded in German during structured and narrative tasks. The model achieved an F1-score of 83% with an expected calibration error (ECE) of 0.045. In addition, the uncertainty estimates highlighted specific speech markers that reliably correlated with symptoms, such as pitch variability, fluency disruptions, and spectral instability, pointing to the diagnostic potential of speech analysis in psychosis.
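For readers unfamiliar with ECE, the following is a minimal sketch of the standard binned estimate: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged with weights proportional to bin size. The bin count and the toy inputs are assumptions for illustration; the summary does not state the binning the authors used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction matched the label.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy example (made-up values, not the study's data).
toy_conf = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
toy_hit = [True, True, True, False, True, False]
toy_ece = expected_calibration_error(toy_conf, toy_hit)
```

An ECE of 0 would mean that, within every bin, stated confidence matches observed accuracy exactly; lower values indicate better calibration.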

Methods

The section on “Materials and Experimental Procedures” outlines the specific materials utilized in the study and the methodological steps taken to conduct the experiments. It details the selection criteria for materials, including their purity and source, ensuring reproducibility and reliability of results. The experimental procedures are described systematically, highlighting key techniques and protocols employed to gather data.

The section emphasizes the importance of controlled conditions during experimentation to minimize variability and enhance the validity of findings. Additionally, any statistical methods used for data analysis are briefly mentioned, providing insight into how the results were interpreted and validated. Overall, this section serves as a foundational component of the research, ensuring that the methodology aligns with the study’s objectives and contributes to the robustness of the conclusions drawn.

Results

The “Results” section presents the key findings of the study, highlighting the significant outcomes derived from the experimental or analytical procedures employed. The data indicate a strong correlation between the variables under investigation, with statistical analyses yielding a p-value below 0.05, which suggests that the results are statistically significant.

Furthermore, the results demonstrate that the proposed model accurately predicts the behavior of the system, as evidenced by a high coefficient of determination ($R^2$) value, indicating a good fit between the observed and predicted values. Additional analyses, such as sensitivity testing, confirm the robustness of the findings across various conditions, reinforcing the validity of the conclusions drawn from the research.
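As a reminder of what the coefficient of determination measures, the sketch below computes $R^2 = 1 - SS_{res}/SS_{tot}$ from paired observed and predicted values. The numbers are invented for illustration; the summary does not report the actual $R^2$ value or the variables it refers to.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values: a perfect fit and a mean-only baseline.
observed = [1.0, 2.0, 3.0, 4.0]
perfect_fit = r_squared(observed, observed)
mean_baseline = r_squared(observed, [2.5, 2.5, 2.5, 2.5])
```

A perfect fit gives 1, while a model that always predicts the mean of the observations gives 0, so values close to 1 indicate that predictions track the observations closely.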

Discussion

The discussion section of the research paper emphasizes the significance of recognizing schizotypal traits within the general population as a means to understand the psychosis spectrum. Schizotypy encompasses a range of personality traits and experiences, such as unusual perceptual experiences and social difficulties, which are less severe than those seen in psychosis. The study highlights the importance of identifying speech markers that reflect these traits, thereby supporting a dimensional approach to psychosis. The primary objective was to investigate the relationships between acoustic and semantic speech features and various symptom dimensions, utilizing a multimodal model that integrates text and audio through Temporal Context Fusion (TCF). This model dynamically adjusts the contributions of different inputs, enhancing robustness against noise and inconsistencies in speech data.
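The idea of dynamically adjusting each input's contribution can be illustrated with inverse-uncertainty weighting, in which a modality with lower estimated uncertainty receives a larger share of the fused prediction. This is a generic sketch and not the TCF architecture itself, whose internals the summary does not describe; the function name, the `eps` parameter, and the example values are all assumptions.

```python
def fuse_by_uncertainty(probs_by_modality, uncertainty_by_modality, eps=1e-6):
    """Combine per-modality class probabilities with inverse-uncertainty weights.

    probs_by_modality: dict mapping modality name to a list of class probabilities.
    uncertainty_by_modality: dict mapping modality name to a non-negative uncertainty.
    Returns (fused_probs, weights); the weights sum to 1.
    """
    # eps avoids division by zero when a modality reports zero uncertainty.
    raw = {m: 1.0 / (u + eps) for m, u in uncertainty_by_modality.items()}
    total = sum(raw.values())
    weights = {m: w / total for m, w in raw.items()}

    n_classes = len(next(iter(probs_by_modality.values())))
    fused = [
        sum(weights[m] * probs_by_modality[m][k] for m in weights)
        for k in range(n_classes)
    ]
    return fused, weights


# Hypothetical segment: clean audio (low uncertainty), sparse transcript (high uncertainty).
fused, weights = fuse_by_uncertainty(
    {"audio": [0.8, 0.2], "text": [0.4, 0.6]},
    {"audio": 0.2, "text": 0.8},
)
```

In this toy case the audio branch receives roughly four times the weight of the text branch, so the fused prediction stays close to the audio prediction. A learned gating network could replace the fixed inverse rule, but the weighting principle is the same.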

The findings indicate that the TCF model outperforms traditional models in both accuracy and calibration, achieving an F1-score of 83% and reducing the Expected Calibration Error (ECE) to 0.045. The study analyzed speech data from 114 participants, including patients with early psychosis and non-clinical individuals with varying levels of schizotypy. Acoustic features, such as pitch variability and spectral fluctuations, were closely linked to specific symptoms across both clinical and subclinical populations. The interaction context also influenced symptom expression: structured tasks emphasized negative traits, while open-ended tasks amplified positive and disorganized features. Overall, the research underscores the potential of uncertainty-aware models to improve the reliability of psychosis assessments and highlights how speech patterns relate to different dimensions of psychosis.
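The F1-score quoted above is the harmonic mean of precision and recall. The sketch below computes it from confusion counts for a single positive class; the counts are hypothetical, and the summary does not say whether the reported 83% is a binary, macro-averaged, or weighted F1.

```python
def f1_score(tp, fp, fn):
    """F1 for one class from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts (not the study's confusion matrix).
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high, which makes it more informative than raw accuracy when the classes are imbalanced, as with 32 clinical and 82 non-clinical participants.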