Validating GenAI feedback in suicide prevention training: a mixed-methods study of QPR skill assessment

Journal: Frontiers in Medicine, Volume: 12
DOI: https://doi.org/10.3389/fmed.2025.1709743
PMID: https://pubmed.ncbi.nlm.nih.gov/41625774
Publication Date: 2026-01-15
Author(s): Yuval Haber et al.
Primary Topic: Suicide and Self-Harm Studies

Overview

This paper investigates the reliability of Generative AI (GenAI) as a feedback tool in suicide prevention training, specifically within the Question, Persuade, and Refer (QPR) model. Traditional QPR training is limited by a lack of interactive practice and of reliable assessment. Three independent experts rated 54 simulated QPR conversations to establish the empirical reliability of GenAI feedback. The primary analysis found a moderate-to-strong positive correlation (r = 0.519 to 0.776) between GenAI adherence scores and human ratings, which provides initial evidence of the tool’s reliability. No significant gender-based differences were found in ratings, consistent with the goal of an unbiased assessment tool.

The findings suggest that GenAI can effectively identify key QPR components and provide nuanced feedback, marking a significant step toward using GenAI for scalable skill evaluation in suicide prevention. While the study underscores the potential of GenAI to democratize access to training and assessment, it also emphasizes the need for further calibration to enhance its alignment with expert human judgment. Overall, the research highlights GenAI’s promise in overcoming traditional barriers in training accessibility, thereby strengthening the competence and confidence of gatekeepers in crisis intervention.

Introduction

The introduction addresses suicide as a critical public health issue: it remains a leading cause of preventable death globally and calls for effective prevention strategies. Gatekeeper training has emerged as a vital approach, equipping frontline individuals (such as educators and healthcare providers) with the skills to identify warning signs, engage in supportive dialogue, and refer at-risk individuals to appropriate services. The Question, Persuade, and Refer (QPR) model is highlighted as a widely implemented and empirically supported training framework that uses a three-step process to facilitate intervention. However, traditional QPR training is limited by its reliance on didactic methods, restricted active participation, and the challenges of in-person role-play, all of which can hinder effective skill acquisition.

To address these limitations, the paper proposes the use of Generative Artificial Intelligence (GenAI) powered by large language models as a promising solution. GenAI can create interactive simulations that mimic real-world crisis situations, allowing participants to practice interpersonal skills with emotionally responsive virtual characters. This approach not only democratizes participation by enabling all users to engage actively but also provides immediate, personalized feedback and collects detailed data on performance. Recent studies indicate that GenAI-based training enhances participants’ confidence and self-reflection compared to conventional methods. However, concerns remain regarding the emotional authenticity and trauma-informed design of these digital learning environments, underscoring the need for careful implementation in mental health training contexts.

Results

The “Results” section reports how closely GenAI adherence scores agreed with the ratings of three independent human experts across 54 simulated QPR conversations. GenAI scores correlated moderately to strongly with human ratings (r = 0.519 to 0.776), and the three human raters yielded a high intraclass correlation coefficient (ICC = 0.897). GenAI scores were, on average, 1.2 points higher than those of the human raters, a systematic “generosity bias.”

No statistically significant gender-based differences were found in adherence ratings from either GenAI or the human raters. Longer conversations correlated positively with human ratings, whereas GenAI scores remained stable regardless of conversation length. Qualitative assessment found that GenAI feedback accurately identified QPR components and offered tailored, actionable recommendations.

Discussion

The discussion section of the study evaluates the potential of Generative AI (GenAI) in providing feedback for QPR (Question, Persuade, Refer) gatekeeper training aimed at suicide prevention. The findings indicate a moderate to strong correlation between GenAI-generated adherence scores and evaluations from trained human raters, suggesting that GenAI can serve as a reliable formative assessment tool. However, the study also highlights a systematic “generosity bias,” where GenAI scores were, on average, 1.2 points higher than those of human raters. This discrepancy may arise from GenAI’s focus on explicit verbal markers of adherence, contrasting with human raters who consider implicit relational cues.
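The difference between correlation and a systematic offset can be made concrete with a small sketch. The Python below is illustrative only: the score lists are invented, the function names `pearson_r` and `mean_bias` are hypothetical, and nothing here reproduces the study's actual analysis or data. It shows that two sets of scores can correlate strongly while one is consistently higher, which is the pattern a generosity bias describes.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def mean_bias(ai_scores, human_scores):
    """Average of (AI score - human score); positive means the AI scores higher."""
    if len(ai_scores) != len(human_scores) or not ai_scores:
        raise ValueError("need two non-empty, equal-length lists")
    return mean([a - h for a, h in zip(ai_scores, human_scores)])


# Invented scores for five conversations (NOT data from the study).
ai = [8, 7, 9, 6, 8]
human = [7, 6, 8, 5, 7]

print(f"r = {pearson_r(ai, human):.3f}")
print(f"mean AI - human difference = {mean_bias(ai, human):.2f}")
```

Because Pearson's r is unchanged when a constant is added to every score, a high correlation alone cannot rule out a generosity bias; the mean difference has to be checked separately.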

The analysis further reveals no significant gender-based biases in adherence evaluations, with both GenAI and human raters demonstrating consistent scoring patterns across gender groups. Additionally, while longer conversations correlated positively with human ratings, GenAI evaluations remained stable regardless of conversation length, indicating a difference in evaluative sensitivity. Qualitative assessments of GenAI feedback showed its ability to accurately identify QPR components and provide tailored, actionable recommendations, although it lacks the capacity to interpret nonverbal cues critical in crisis situations. The study emphasizes the importance of integrating GenAI feedback within a broader training framework, where human oversight remains essential to address the relational dynamics inherent in suicide prevention conversations. Overall, the findings suggest that GenAI can enhance training scalability and accessibility, ultimately supporting the preparedness of gatekeepers in identifying and referring at-risk individuals to professional care.

Limitations

The study presents several limitations that highlight areas for future research. The primary limitation is the small sample size of 54 participants, drawn from a single webinar, which restricts the generalizability of the findings despite meeting the minimum thresholds for inter-rater reliability studies. The research was designed as a proof-of-concept to assess the feasibility of Generative AI (GenAI) as an assessment tool, rather than to achieve broad applicability. Future studies should aim for larger, more diverse samples and multi-site designs to enhance the validity and effectiveness of GenAI across various populations and training contexts.

Additionally, the study relied on only three human raters for validation. Although they yielded a high intraclass correlation coefficient (ICC = 0.897), expanding the evaluator panel could strengthen the validation process. The absence of control groups further limits the ability to compare the effectiveness of GenAI against traditional training methods. The analysis focused solely on a summative AI feedback model, without exploring alternative feedback modalities or comparing AI feedback to human expert feedback. Future research should investigate the reliability of GenAI in assessing individual components of quality, provide a systematic content analysis of its feedback, and explore its performance in voice-based modalities, as well as the user experience of receiving automated feedback.
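For readers unfamiliar with the ICC, the following sketch computes one common variant from a conversations-by-raters score matrix. It assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The summary does not say which ICC form the study used, the function name `icc2_1` is hypothetical, and the ratings are invented for illustration.

```python
def icc2_1(ratings):
    """ICC(2,1) for a matrix with one row per conversation, one column per rater.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)           # number of rated conversations
    k = len(ratings[0])        # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular matrix with >= 2 rows and >= 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-conversation mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (msr - mse) / denom


# Invented ratings: 5 conversations scored by 3 raters (NOT data from the study).
scores = [
    [7, 6, 7],
    [4, 5, 4],
    [9, 8, 9],
    [5, 5, 6],
    [8, 7, 8],
]
print(f"ICC(2,1) = {icc2_1(scores):.3f}")
```

Other ICC forms (for example, consistency rather than absolute agreement, or the average of several raters) use the same mean squares but combine them differently, so a reported value is only comparable once the form is known.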