Technology-supported self-triage decision making

Journal: npj Health Systems, Volume: 2, Issue: 1
DOI: https://doi.org/10.1038/s44401-024-00008-x
Publication Date: 2025-01-25
Author(s): Marvin Kopka et al.
Primary Topic: Healthcare cost, quality, practices

Methods

In this study, the researchers used structured interviews to assess how participants responded to medical case vignettes with the help of one of two systems: the large language model (LLM) ChatGPT or the symptom assessment application (SAA) Ada. The interview guide, developed and refined by the authors, was pilot tested and remained unchanged throughout data collection. Participants provided sociodemographic information and were randomly assigned to one of the two intervention groups, receiving one of 27 validated case vignettes designed to reflect real patient experiences. The vignettes were constructed according to the RepVig Framework to support external validity and generalizability.

Participants were first asked to assess the symptoms presented in their vignette and to choose one of three predefined options for the urgency of care needed. They then interacted with either ChatGPT or Ada to receive a recommendation, verbalizing their thought processes using the think-aloud method. After this interaction, participants reassessed their self-triage decision. The study also included a randomized controlled trial (RCT) component, in which participants' sociodemographic characteristics, prior experience with SAAs and LLMs, health status, self-efficacy, and technological affinity were measured. The primary outcome was self-triage accuracy; the secondary outcome was the change in perceived urgency after receiving a recommendation from the respective system.
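
The two outcomes described above can be illustrated with a minimal sketch. It assumes each assessment is stored with a reference urgency level plus the participant's choice before and after using a system. The three urgency labels, the `Assessment` record, and the example records are hypothetical: the summary does not name the three options, and none of the values below are study data.

```python
from dataclasses import dataclass

# Hypothetical urgency scale: the summary says there were three predefined
# options but does not name them, so these labels are assumptions.
URGENCY_LEVELS = {"self-care": 0, "non-emergency care": 1, "emergency care": 2}


@dataclass
class Assessment:
    correct_level: str  # reference urgency level for the vignette
    before: str         # participant's choice before using the system
    after: str          # participant's choice after using the system


def accuracy(assessments, stage):
    """Share of assessments whose choice at `stage` matches the reference level."""
    hits = sum(getattr(a, stage) == a.correct_level for a in assessments)
    return hits / len(assessments)


def urgency_shift(a):
    """Positive if the participant moved to a more urgent level, negative if less."""
    return URGENCY_LEVELS[a.after] - URGENCY_LEVELS[a.before]


# Made-up records for illustration only; these are not study data.
records = [
    Assessment("self-care", "non-emergency care", "self-care"),
    Assessment("emergency care", "emergency care", "emergency care"),
    Assessment("non-emergency care", "self-care", "self-care"),
    Assessment("non-emergency care", "emergency care", "non-emergency care"),
]

acc_before = accuracy(records, "before")          # primary outcome, before
acc_after = accuracy(records, "after")            # primary outcome, after
shifts = [urgency_shift(a) for a in records]      # secondary outcome per record
```

Keeping accuracy and urgency shift as separate functions mirrors the distinction between the primary and secondary outcome: a participant can change their perceived urgency without becoming more accurate, and the reverse.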

Results

Self-triage accuracy changed differently in the two groups. Participants who used the SAA Ada improved from 53.2% correct before the interaction to 64.5% afterwards (OR = 2.52, p < 0.001). Participants who used ChatGPT showed no significant change: accuracy was 54.8% before the interaction and 54.2% afterwards (p = 0.79).

The qualitative think-aloud data showed that decision making was complex and iterative, shaped by factors before, during, and after the interaction with each system.

Discussion

This mixed-methods study investigated whether symptom assessment applications (SAAs) and large language models (LLMs) improve self-triage decisions among laypeople. Quantitatively, participants' self-triage accuracy increased significantly when they used the SAA Ada, from 53.2% to 64.5% (OR = 2.52, p < 0.001). With ChatGPT, no significant improvement was observed: accuracy decreased slightly from 54.8% to 54.2% (p = 0.79). This discrepancy suggests that SAAs can guide users towards more accurate decisions, whereas LLMs such as ChatGPT may not provide the same level of support. The hypothesis that LLMs would improve decision-making accuracy was therefore rejected.

The qualitative findings show that decision making is complex and iterative, influenced by factors before, during, and after the interaction with a system. Participants' initial confidence in their own symptom assessment, their expectations of the system, and the perceived reliability of the data strongly affected their willingness to seek additional information. During the interaction, the degree of personalization and the explainability of the advice were crucial for building trust and for accepting the recommendation.

Overall, the study underscores the importance of human factors when evaluating decision support systems, because combining user experience with system capabilities is essential for improving self-triage outcomes.
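
As a small worked example, the accuracy figures reported above can be restated as percentage-point changes. This is only arithmetic on the summary's percentages. The reported odds ratio and p-values come from the authors' statistical analysis, which this summary does not describe, so they cannot be reproduced from these four numbers. The function name `percentage_point_change` is chosen here for illustration.

```python
# Accuracy figures as reported in the summary (percent correct).
reported = {
    "Ada": {"before": 53.2, "after": 64.5},
    "ChatGPT": {"before": 54.8, "after": 54.2},
}


def percentage_point_change(before, after):
    """Difference between two percentages, rounded to one decimal place."""
    return round(after - before, 1)


changes = {
    system: percentage_point_change(values["before"], values["after"])
    for system, values in reported.items()
}

for system, change in changes.items():
    print(f"{system}: {change:+.1f} percentage points")
```

In this restatement Ada's gain is 11.3 percentage points, while ChatGPT's change is a decrease of 0.6 percentage points, consistent with the non-significant result reported for that group.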