Evaluating ChatGPT for converting clinic letters into patient-friendly language: a quantitative study

Journal: BJGP Open, Volume: 9, Issue: 3
DOI: https://doi.org/10.3399/bjgpo.2024.0300
PMID: https://pubmed.ncbi.nlm.nih.gov/40164490
Publication Date: 2025-03-31
Author(s): Simon C. Cork et al.
Primary Topic: Patient-Provider Communication in Healthcare

Overview

This study investigates how effectively ChatGPT-4 Classic translates clinic letters into more accessible language for patients, with the aim of improving their understanding without compromising clinical information. Previous research indicates that clear communication improves patient comprehension, but producing it can be time-consuming for clinicians. The study used a single-blinded quantitative design in which 23 clinic letters from eight specialties were translated using ChatGPT-4 Classic. Patient representatives then rated their understanding of the translated letters against the original clinician-written versions.

The translations maintained clinical integrity while significantly improving patient understanding and satisfaction. The ratings also indicated a notable reduction in the need for patients to seek medical assistance to interpret the letters. The findings suggest that ChatGPT-4 Classic is a valuable tool for converting complex medical language into patient-friendly terms, ultimately enhancing patient experience and comprehension in clinical settings. This research extends previous work by showing that the model can translate letters across various specialties and that the translations have a positive effect on patient understanding.

Introduction

The introduction highlights the critical need for effective communication between health professionals and patients, a concern that UK healthcare policy has emphasized for over two decades. The NHS Plan of 2000 mandated that patients receive copies of all correspondence regarding their care, and subsequent guidance from the Academy of Medical Royal Colleges in 2018 reinforced the importance of writing outpatient clinic letters directly to patients. This shift aims to enhance patient understanding and reduce confusion: many existing letters are technical and difficult to comprehend, which can exacerbate health inequalities and increase the workload for general practitioners (GPs).

The study explores the potential of generative AI, specifically ChatGPT-4 Classic, to transform clinic letters into patient-friendly versions while retaining essential clinical information. Given the substantial volume of outpatient appointments (124.5 million in England during 2022-23), producing accessible communication at scale is a significant challenge. The authors aim to assess whether AI-generated letters improve clarity, as measured by objective readability tests and patient feedback, and so offer a promising way to enhance patient engagement and understanding in healthcare communication.

Results

The results indicate that converting standard clinic letters into patient-friendly versions using AI preserved the integrity of the clinical information and introduced no hallucinations. The AI-generated letters were significantly longer than the originals, averaging 348 ± 13 words compared with 244 ± 23, an increase of roughly 43%. Despite the added length, objective readability assessments using the Flesch-Kincaid, SMOG, Coleman-Liau, and Automated Readability Index measures showed no significant change (P values ranging from 0.14 to 0.7). The Gunning Fog index, however, fell significantly, from a ‘high school senior’ reading level (12.44) to a ‘high school junior’ level (11.35) (P = 0.02).
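
For readers unfamiliar with how such an index is computed, the following is a minimal Python sketch of the standard Gunning Fog formula: 0.4 × (average words per sentence + percentage of words with three or more syllables). It is not the tool used in the study, which this summary does not name. The function names, the vowel-run syllable heuristic, and the two sample sentences are assumptions made purely for illustration. The heuristic also omits the formula's usual exclusions for proper nouns and common suffixes, so its scores will differ from those of validated calculators.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent (e.g. "because"), but keep "-le" endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Gunning Fog index: 0.4 * (words per sentence + percent of words with >= 3 syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)


if __name__ == "__main__":
    # Hypothetical sentences written for this sketch; they are not taken from the study's letters.
    clinical_sentence = (
        "The patient exhibits bilateral lower extremity oedema "
        "secondary to congestive cardiac insufficiency."
    )
    plain_sentence = (
        "Both of your legs are swollen. "
        "This is because your heart is not pumping well."
    )
    print("clinical:", round(gunning_fog(clinical_sentence), 2))
    print("plain:   ", round(gunning_fog(plain_sentence), 2))
```

Because the index depends only on sentence length and the share of long words, a rewritten letter can feel much clearer to a reader while moving such a score only slightly.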

Subjective analysis revealed a marked improvement in patient understanding of the AI-generated letters compared to the originals, with scores rising from 2.79 ± 0.23 for the original letters to 4.38 ± 0.11 for the AI versions (P < 0.0001). This suggests that the AI-generated letters not only retained essential clinical information but also enhanced patient comprehension significantly.

Discussion

The study investigated how effectively ChatGPT-4 Classic translates clinic letters from medical professionals into more accessible language for patients without compromising clinical information. A total of 23 letters from various specialties were analyzed. Only one objective readability index (Gunning Fog) indicated a significant reduction in complexity, yet subjective assessments by patient representatives showed improved understanding of, and satisfaction with, the AI-generated letters. This discrepancy suggests that traditional readability metrics may not fully capture the nuances of patient comprehension, and it highlights the importance of incorporating end-user feedback when evaluating AI outputs.

The findings align with existing literature that supports the use of generative AI for creating patient-centered communications. Unlike previous studies that relied on unvalidated tools or generated letters without original counterparts, this research combined actual clinic letters with end-user analysis across multiple specialties. The results indicate that generative AI can effectively produce personalized, understandable letters for patients while maintaining the integrity of clinical information, potentially enhancing patient communication and satisfaction. However, the study also acknowledges the risk of ‘hallucinations’ (inaccurate or fictitious information generated by AI), which underscores the need for manual verification of AI outputs to ensure accuracy and safety in healthcare contexts. The research suggests a promising avenue for integrating AI into clinical practice, provided that data protection regulations are adhered to.

Limitations

This study represents a pioneering effort in using end-user analysis of letter contents to evaluate comprehension, moving beyond traditional objective readability indexes, which may not align with patient perceptions. However, the research has notable limitations. The patient representatives were predominantly older (60% over 65 years), White (86%), and highly educated (53% with degrees), so their demographic profile may not accurately reflect the broader patient population. Additionally, despite efforts to blind participants to the origin of each letter, the AI-generated letters were addressed directly to the patient, which may have revealed their origin and introduced unintentional bias into the responses.

Future research is needed to establish whether the improved patient understanding of clinical letters reported here translates into tangible outcomes, such as fewer consultations for clarification. Furthermore, while the prompt used in this study yielded satisfactory results, variations in how it is applied could produce different outcomes, which warrants further investigation. Additional studies should also consider the workload implications of implementing and overseeing such AI systems in clinical settings.