Balancing accuracy and user satisfaction: the role of prompt engineering in AI-driven healthcare solutions

Journal: Frontiers in Artificial Intelligence, Volume: 8
DOI: https://doi.org/10.3389/frai.2025.1517918
PMID: https://pubmed.ncbi.nlm.nih.gov/40017484
Publication Date: 2025-02-13
Author(s): Han Wang et al.
Primary Topic: Gaze Tracking and Assistive Technology

Overview

The paper investigates the integration of Internet of Things (IoT) and Artificial Intelligence (AI) technologies in public healthcare, focusing on the detection and management of Dry Eye Disease (DED). Using the OpenAI GPT-4.0 and ERNIE Bot-4.0 APIs, the study developed a specialized prompt mechanism to evaluate the urgency of medical attention on a dataset of 5,747 simulated patient complaints. A Bidirectional Encoder Representations from Transformers (BERT) model was used for text classification. With prompted queries, accuracy rose from 80.1% to 99.6%, although this came at the cost of longer response times and lower Service Experience (SE) scores.

The findings underscore the importance of prompt engineering in enhancing AI-driven healthcare services, revealing a trade-off between accuracy and user satisfaction. The integration of IoT devices, such as smart cameras and environmental sensors, facilitates continuous monitoring of ocular health and environmental conditions, allowing for personalized treatment adjustments. The paper emphasizes the need for future research to optimize prompt structures and explore dynamic prompting approaches while addressing ethical considerations like bias and user privacy. Overall, the study highlights the potential of AI and IoT in improving healthcare delivery, particularly in ophthalmology, while advocating for a user-centric design in the development of conversational AI systems.

Introduction

The introduction highlights the transformative impact of large language models (LLMs), particularly GPT-4, in the field of artificial intelligence (AI), with a specific focus on their application in ophthalmology. While the capabilities of GPT-4 have been widely explored, its potential in diagnosing rare ophthalmic diseases remains largely unexamined. This study aims to address this gap by evaluating GPT-4’s effectiveness in recognizing Dry Eye Disease (DED) through simulated patient interactions involving various stakeholders, including patients, family physicians, and junior ophthalmologists.

A key aspect of the research is the emphasis on prompt engineering, which involves the careful design and refinement of input queries to optimize LLM outputs. The study posits that effective prompt engineering can significantly enhance the performance of LLMs, leading to improved user experiences and more accurate medical responses. By simulating 5,747 patient complaints and utilizing both GPT-4 and a Bidirectional Encoder Representations from Transformers (BERT) model for text classification, the research aims to assess the urgency of medical attention required for DED. The findings are expected to contribute to the advancement of natural language processing (NLP) and the development of sophisticated virtual assistants in ophthalmology, ultimately enhancing the interaction quality between users and AI systems.

Methods

In this study, ethical clearance was obtained from the ethics review board at Zhuhai People’s Hospital. Data was collected in both English and Chinese using the APIs of GPT-4.0 and ERNIE Bot-4.0, which require a paid subscription, while Bard was utilized as a freely accessible alternative. The research focuses on Dry Eye Disease (DED), as highlighted by Wang et al. (2024a), to explore advancements in ophthalmic virtual assistants, particularly through the lens of user interaction enhancement in GPT-based applications.

A significant emphasis was placed on prompt design, which was crucial for improving the performance and reliability of the AI systems in diagnosing DED. Various techniques were implemented in the prompt design process to ensure that the AI models generated responses that were accurate, context-specific, and clinically relevant, thereby facilitating effective user engagement and interaction with the virtual assistants.
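As a rough illustration of what a structured prompt of this kind might look like, the Python sketch below wraps a patient complaint in a role, a task, and an output constraint, and contrasts it with an unprompted baseline. The template wording, the function names, and the URGENT / NON-URGENT label format are assumptions made for illustration; the study's actual prompts are not reproduced in this summary.

```python
# Hypothetical prompt builder; the wording is illustrative, not the study's prompt.
PROMPT_TEMPLATE = (
    "You are an ophthalmology assistant.\n"
    "Task: assess how urgently the patient below needs medical attention "
    "for possible dry eye disease.\n"
    "Answer with one label, URGENT or NON-URGENT, then a one-sentence reason.\n"
    "Patient complaint: {complaint}"
)


def build_triage_prompt(complaint: str) -> str:
    """Wrap a raw complaint in the structured (prompted) template."""
    cleaned = " ".join(complaint.split())  # collapse stray whitespace
    if not cleaned:
        raise ValueError("complaint must not be empty")
    return PROMPT_TEMPLATE.format(complaint=cleaned)


def build_unprompted_query(complaint: str) -> str:
    """Baseline: pass the complaint through without extra instructions."""
    return " ".join(complaint.split())


if __name__ == "__main__":
    print(build_triage_prompt("My eyes   burn and feel gritty every evening."))
```

Constraining the answer to a fixed label followed by a short reason is one way to make the output easier to classify downstream; whether the study's prompts worked this way is not stated here.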

Results

In this study, a total of 5,747 unique simulated patient complaints were generated in both English and Chinese, based on the Ocular Surface Disease Index (OSDI) questionnaires. Of these, 5,624 English and 5,745 Chinese responses were usable for analysis; the remainder were lost to API rate limit errors. The responses were classified into “urgent” and “non-urgent” categories using a fine-tuned pre-trained BERT model, which achieved an accuracy of 98% and an area under the curve (AUC) of 96%.
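To make the two reported classifier metrics concrete, the sketch below computes accuracy and AUC for a binary urgent / non-urgent task in plain Python. The toy labels and scores are invented for illustration and the helper names are hypothetical; this is not the study's evaluation code. AUC is computed here through its pairwise interpretation: the probability that a randomly chosen urgent case receives a higher score than a randomly chosen non-urgent one.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (urgent, non-urgent) pairs ranked correctly.

    Ties between an urgent and a non-urgent score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one urgent and one non-urgent case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration: 1 = urgent, 0 = non-urgent.
    labels = [1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
    preds = [int(s >= 0.5) for s in scores]  # threshold at 0.5
    print(accuracy(labels, preds), roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; a library routine would normally be used on a dataset of several thousand responses.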

The results indicate a significant improvement in predicting urgency levels when using prompted queries, with accuracy rising from 74.1% to 94.6%. However, this gain was accompanied by a substantial increase in response time, from 0.84 seconds for non-prompted queries to 7.81 seconds for prompted queries. Overall user satisfaction decreased slightly from 80.3 to 77.85, highlighting a trade-off between accuracy and response time. A detailed analysis revealed a marked decline in Service Experience (SE) satisfaction from 95.5 to 64.0, while Medical Quality (MQ) satisfaction improved significantly from 65.1 to 91.7 for prompted queries. These findings emphasize the importance of balancing accuracy, response time, and user satisfaction in the development of conversational AI systems for medical applications. Further details are provided in Supplementary Table S5.
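The size of the trade-off can be checked with a few lines of arithmetic on the figures quoted above. The sketch below computes the accuracy gain in percentage points and the response-time ratio (roughly a ninefold slowdown). It also shows that the overall satisfaction figures match the unweighted mean of the SE and MQ scores; that the study computed overall satisfaction this way is an assumption inferred from the numbers, not something this summary states.

```python
# Figures as reported in the summary above.
accuracy_pct = {"unprompted": 74.1, "prompted": 94.6}
latency_s = {"unprompted": 0.84, "prompted": 7.81}
se = {"unprompted": 95.5, "prompted": 64.0}  # SE satisfaction
mq = {"unprompted": 65.1, "prompted": 91.7}  # MQ satisfaction

# Accuracy gain in percentage points and slowdown factor.
accuracy_gain = accuracy_pct["prompted"] - accuracy_pct["unprompted"]
latency_ratio = latency_s["prompted"] / latency_s["unprompted"]

# Assumption: overall satisfaction is the unweighted mean of SE and MQ.
overall = {k: (se[k] + mq[k]) / 2 for k in se}

if __name__ == "__main__":
    print(f"accuracy gain: {accuracy_gain:.1f} points")
    print(f"latency ratio: {latency_ratio:.1f}x")
    for k, v in overall.items():
        print(f"overall satisfaction ({k}): {v:.2f}")
```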

Discussion

The discussion emphasizes the potential of AI-driven diagnostic systems, particularly in detecting Dry Eye Disease (DED), through the integration of real-time data, adaptive learning, and multimodal inputs. These elements enhance diagnostic accuracy and support proactive disease management by allowing continuous adjustments based on current patient information and environmental factors. The study highlights the importance of structured prompting strategies, which improved the accuracy of urgency assessments from 74.1% to 94.6% and increased user satisfaction with the medical quality of responses from 65.1 to 91.7. However, it also notes that response time rose from 0.84 seconds to 7.81 seconds, which could negatively impact user experience.

Future research directions include the need for real-world clinical trials to validate the system’s performance across diverse healthcare contexts and the expansion of multilingual capabilities to enhance accessibility. The study suggests that adapting the framework to other ophthalmic conditions and integrating multimodal data sources could further improve diagnostic precision. Additionally, strategies to mitigate increased response times, such as optimizing prompt structures and employing parallel processing, are recommended to balance accuracy and user satisfaction. Overall, the findings underscore the promise of AI in ophthalmology while identifying critical areas for further investigation and refinement.
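One way to picture the parallel-processing suggestion is to issue several model requests concurrently instead of one after another. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` with a stubbed model call; `fake_model_call`, its simulated delay, and the worker count are hypothetical stand-ins, not part of the study.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fake_model_call(prompt: str) -> str:
    """Stand-in for a slow LLM API request (hypothetical; no network access)."""
    time.sleep(0.05)  # simulated latency
    return f"NON-URGENT: {prompt}"


def classify_batch(prompts: List[str],
                   call: Callable[[str], str] = fake_model_call,
                   max_workers: int = 8) -> List[str]:
    """Send prompts concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call, prompts))


if __name__ == "__main__":
    prompts = [f"complaint {i}" for i in range(8)]
    start = time.perf_counter()
    results = classify_batch(prompts)
    elapsed = time.perf_counter() - start
    print(len(results), f"{elapsed:.2f}s")
```

Concurrency of this kind raises throughput across many requests but does not shorten the wait for any single reply, so it would complement, not replace, shorter or better-structured prompts. A real deployment would also have to respect the provider's rate limits.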