Can LLM-Simulated Practice and Feedback Upskill Human Counselors? A Randomized Study with 90+ Novice Counselors

Journal: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
DOI: https://doi.org/10.1145/3772318.3791821
Publication Date: 2026-04-13
Author(s): Ryan Louie et al.
Primary Topic: Legal Education and Practice Innovations

Overview

The study evaluates a training system for novice mental health counselors that combines practice with patients simulated by a large language model (LLM) and AI-generated feedback. In a randomized study, 94 participants were assigned to one of two groups: practice alone, or practice supplemented with AI-generated feedback. The practice-and-feedback group improved significantly in client-centered microskills such as reflections and questions, whereas the practice-alone group showed no improvement and declined in empathy over time. Qualitative interviews corroborated these results: feedback facilitated a client-centered listening approach, in contrast to the solution-oriented mindset of the practice-alone participants.

The study is presented as the first large-scale evaluation of an LLM-based training system for novice counselors. Its results indicate that simulated practice alone is insufficient for skill development, particularly in empathy, and that structured feedback is needed to build essential counseling skills. By combining realistic patient simulations with targeted feedback, LLM-based training systems can strengthen the competencies required for client-centered therapy, offering a scalable, evidence-based approach to mental health training.

Introduction

The introduction notes that approximately 22.8% of U.S. adults experienced mental illness in 2023, yet access to mental health care remains constrained by a shortage of qualified providers. Although interest in AI systems for direct patient interaction is growing, the authors expect demand for human-delivered support to persist. Training mental health professionals is resource-intensive: it relies on expert supervision and simulated client interactions, which limits scalability. Recent advances in large language models (LLMs) make it possible to simulate patient interactions, much like the simulated-patient practice already established in medical education.

The paper introduces CARE, a training system that combines realistic patient simulations with structured AI-generated feedback to improve counseling skills. A randomized experiment with 94 novice counselors compared two training conditions: practice with LLM-simulated patients alone, and practice with LLM-simulated patients plus AI feedback. The group receiving feedback improved significantly in key counseling skills such as reflections and questions and showed a notable gain in empathy relative to the practice-only group, whose empathy declined. The findings underscore the importance of structured feedback in LLM-simulated training for fostering a client-centered approach. The authors also address the challenges of maintaining self-efficacy and responding appropriately in high-risk scenarios.
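The two-condition design described above can be illustrated with a short allocation sketch. This is a minimal example under stated assumptions, not the study's procedure: the source does not describe how participants were randomized or how many were in each arm, so the balanced split, the function name `assign_conditions`, the condition labels, and the participant identifiers are all hypothetical.

```python
import random

# Labels paraphrase the two training conditions named in the text.
CONDITIONS = ("practice_only", "practice_plus_feedback")


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants, then alternate them between the two conditions.

    Assumption: a simple balanced allocation. The source does not state
    the actual randomization procedure or the size of each arm.
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: CONDITIONS[i % 2] for i, pid in enumerate(shuffled)}


# Hypothetical identifiers P001..P094, matching the 94 participants reported.
assignment = assign_conditions([f"P{n:03d}" for n in range(1, 95)])
```

Seeding the generator keeps the allocation reproducible for auditing; a real trial would typically also conceal the allocation from whoever enrolls participants.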

Methods

The authors used three AI-simulated patients in a randomized online lab study; the patient profiles are detailed in Tables 4, 5, and 6. Domain experts validated the profiles, which received an average realism score of 6 out of 7. The patients were chosen to vary in age, gender, and presenting concern while remaining consistent in their challenging behaviors. Each patient prompt embodies similar principles, in particular a reluctance to offer or accept solutions, so that novice counselors face a comparable level of resistance across different scenarios. The principles that reflect this resistance are highlighted in blue in the supplementary materials.
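One way to picture the prompt design described above is a profile that varies by patient combined with a fixed set of behavioral principles shared by all patients. The sketch below is illustrative only: the class `PatientProfile`, the function `build_patient_prompt`, the principle wording, and the example persona are hypothetical and are not taken from the paper's prompts.

```python
from dataclasses import dataclass

# Shared principles, paraphrased from the text: every simulated patient is
# reluctant to offer or accept solutions. The exact wording is an assumption.
SHARED_PRINCIPLES = (
    "Be reluctant to propose solutions to your own problem.",
    "Be reluctant to accept solutions the counselor suggests.",
)


@dataclass(frozen=True)
class PatientProfile:
    # Fields mirror the dimensions the text says were varied.
    name: str
    age: int
    gender: str
    presenting_concern: str


def build_patient_prompt(profile: PatientProfile) -> str:
    """Render a system prompt from persona details plus the shared principles."""
    principles = "\n".join(f"- {p}" for p in SHARED_PRINCIPLES)
    return (
        f"You are role-playing {profile.name}, a {profile.age}-year-old "
        f"{profile.gender} seeking support for {profile.presenting_concern}.\n"
        f"Follow these behavioral principles throughout the conversation:\n"
        f"{principles}"
    )


# Hypothetical persona; not one of the three patients used in the study.
example = PatientProfile("Alex", 34, "man", "work-related stress")
prompt = build_patient_prompt(example)
```

Keeping the principles in one shared constant means every profile gets the same resistance behaviors, which is the consistency property the text describes.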

Results

The randomized comparison produced contrasting outcomes for the two groups. The practice-and-feedback group improved significantly in client-centered microskills, specifically reflections and questions, and showed a notable gain in empathy relative to the practice-alone group. The practice-alone group showed no improvement in these skills, and its empathy declined over time. Qualitative interviews corroborated this pattern: feedback facilitated a client-centered listening approach, whereas practice-alone participants retained a solution-oriented mindset.

Discussion

The discussion situates the CARE training system, which uses large language models (LLMs) to train novice counselors, within prior work on teaching clinical and communication skills. Traditional approaches such as simulated patients and peer role-plays are resource-intensive and difficult to scale. Early AI-driven systems provided some feedback on communication styles but lacked the contextual depth needed for effective counseling practice. CARE, in contrast, aims to create more authentic training experiences by integrating expert-driven behavioral principles into LLM-simulated patient interactions and by providing actionable feedback on counseling skills.

The section also argues that human-AI systems should be evaluated on more than accuracy, with attention to learner outcomes such as skill acquisition, self-efficacy, and user perceptions. It identifies gaps in existing evaluation frameworks, particularly in sensitive domains like counseling, where ethical considerations and learner agency are paramount. CARE is designed to address these gaps through a holistic evaluation that combines objective performance metrics with qualitative insights from participants. By using LLMs for both simulated patient interactions and feedback generation, CARE seeks to provide a scalable, effective training solution that improves novice counselors’ skills while fostering a reflective learning environment.
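To make the idea of an outcome-focused metric concrete, the sketch below contrasts the average pre-to-post change in a skill score between the two groups. It is a descriptive illustration under stated assumptions: the source does not specify the paper's scoring scheme or statistical model, and the function names `mean_change` and `difference_in_change` are hypothetical.

```python
from statistics import mean


def mean_change(pre_scores, post_scores):
    """Average per-participant change (post minus pre) within one group."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post scores must be paired per participant")
    return mean([post - pre for pre, post in zip(pre_scores, post_scores)])


def difference_in_change(feedback_pre, feedback_post, practice_pre, practice_post):
    """Change in the feedback group minus change in the practice-only group.

    A positive value means the feedback group gained more on the measured
    skill. This is a descriptive contrast, not a significance test.
    """
    return mean_change(feedback_pre, feedback_post) - mean_change(
        practice_pre, practice_post
    )
```

Comparing changes rather than raw post-training scores accounts for any baseline difference between the groups; a full analysis would add a significance test and an effect size.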

Limitations

The authors acknowledge three limitations. First, methodological limitations in their assessment approach could affect the reliability and validity of the results. Second, the educational context in which the study was conducted may not be representative, which raises questions about whether the findings apply to other settings. Third, the results may not generalize across different therapeutic modalities. The authors identify these as areas for future research.