A chatbot based question and answer system for the auxiliary diagnosis of chronic diseases based on large language model

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-67429-4
PMID: https://pubmed.ncbi.nlm.nih.gov/39054346
Publication Date: 2024-07-25
Author(s): Sainan Zhang et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

This study presents a chatbot-based question-and-answer system for the auxiliary diagnosis of chronic diseases, built on a large language model. The system draws on a corpus of medical knowledge and is designed to support, rather than replace, human decision-making in clinical workflows. Empirical evaluation, including a Chatbot Usability Questionnaire (CUQ) test, indicates positive usability feedback and points to the system’s potential to improve patient experience, shorten medical service response times, and reduce healthcare costs.

The study emphasizes that high-quality training data will be needed to integrate voice input and image recognition, both of which are planned for future upgrades. Data augmentation was used to improve the model’s generalization, and a modular system architecture facilitates data exchange with other systems through application programming interfaces (APIs). The authors underscore the significance of multimodal data fusion, which combines text, voice, and images to improve diagnostic accuracy and response relevance. Ultimately, the findings suggest that deep learning and large language models could extend beyond chronic disease diagnosis to broader medical system development.

Methods

The authors fine-tuned the GPT-2 model to diagnose chronic diseases from patient symptom descriptions. Fine-tuning used a preprocessed medical dataset in which each entry pairs a symptom description with its disease label, framing the task as a multi-class classification problem. Cross-entropy loss was selected as the training objective because it suits multi-class problems. The model’s performance was evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC).
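To make the training objective concrete, the following minimal sketch computes the multi-class cross-entropy loss for a single example in plain Python. The logits and the three-class setup are invented for illustration; the fine-tuned GPT-2 classifier and its disease classes are not reproduced here.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Negative log-probability the model assigns to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[true_class])

# Hypothetical scores for three disease classes (illustrative values only).
example_logits = [2.0, 0.5, -1.0]
loss_if_class_0 = cross_entropy(example_logits, true_class=0)
loss_if_class_2 = cross_entropy(example_logits, true_class=2)
```

The loss is small when the highest score falls on the true class and grows as the model puts probability elsewhere, which is the property that makes it a natural fit for a symptom-to-disease classifier.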

The dataset was partitioned into training and test sets, and the Adam optimizer was chosen for its robust performance across deep learning tasks. The initial learning rate was set at $3 \times 10^{-5}$, with a learning rate decay strategy. A batch size of 32 was used to balance GPU resource utilization and training stability. Training was configured to run for five epochs, with dropout at a rate of 0.5 to mitigate overfitting. An early stopping criterion terminated training if the mean squared error (MSE) remained below a threshold of 0.5 for three consecutive epochs; this threshold was determined through preliminary optimization tests.
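The hyperparameters and stopping rule described above can be collected into a small configuration sketch. The values for learning rate, batch size, epochs, dropout, threshold, and patience come from the summary; the exponential decay schedule and its factor of 0.9 are assumptions, since the summary does not say which decay strategy was used. The names `TrainConfig`, `decayed_lr`, and `epochs_to_run` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    learning_rate: float = 3e-5   # initial learning rate from the summary
    batch_size: int = 32
    max_epochs: int = 5
    dropout: float = 0.5
    stop_threshold: float = 0.5   # error threshold for early stopping
    patience: int = 3             # consecutive epochs below the threshold
    decay_factor: float = 0.9     # ASSUMPTION: schedule is not specified

def decayed_lr(cfg, epoch):
    """Exponential decay per epoch, chosen only for illustration."""
    return cfg.learning_rate * cfg.decay_factor ** epoch

def epochs_to_run(cfg, epoch_errors):
    """Return the epoch at which training ends under the stated rule:
    stop once the error stays below the threshold for `patience` epochs."""
    streak = 0
    for epoch, err in enumerate(epoch_errors[:cfg.max_epochs], start=1):
        streak = streak + 1 if err < cfg.stop_threshold else 0
        if streak >= cfg.patience:
            return epoch
    return min(len(epoch_errors), cfg.max_epochs)

cfg = TrainConfig()
stopped_at = epochs_to_run(cfg, [0.9, 0.4, 0.3, 0.2, 0.1])
```

With the made-up error sequence above, the error first drops below 0.5 in epoch 2 and stays there, so the rule fires at the end of epoch 4, one epoch before the five-epoch limit.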

Results

The fine-tuned GPT-2 model performs strongly on chronic disease diagnosis: accuracy, precision, recall, and F1 score all exceed 0.97. Notably, the area under the curve (AUC) approaches 1, reflecting the model’s strong ability to differentiate between chronic diseases. These findings indicate that the model processes patient symptom descriptions effectively and provides reliable diagnostic recommendations, thereby supporting the advancement of intelligent healthcare systems.
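For readers who want to see how such scores are derived, here is a minimal sketch of accuracy and macro-averaged precision, recall, and F1 for a multi-class classifier. Macro averaging is an assumption, as the summary does not state which averaging the authors used, and the labels and predictions are invented. AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical labels and predictions, for illustration only.
y_true = ["diabetes", "diabetes", "asthma", "asthma", "hypertension", "hypertension"]
y_pred = ["diabetes", "diabetes", "asthma", "hypertension", "hypertension", "hypertension"]
scores = macro_metrics(y_true, y_pred)
```

Macro averaging weights every disease class equally, so a class the model handles poorly pulls the averages down even when it is rare.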

For the backend, the authors used the Flask framework for its flexibility in managing web applications. Flask handles HTTP requests and exchanges data between backend and frontend components through its `render_template` and `jsonify` functions.
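The sketch below shows how a Flask backend of this kind might be wired together. It is an illustration under stated assumptions, not the authors' code: the `/diagnose` route, the request field `symptoms`, and the `predict_disease` placeholder are hypothetical, and an in-memory template (via Jinja's `DictLoader`) stands in for a real `templates/index.html` file so the example is self-contained.

```python
from flask import Flask, jsonify, render_template, request
from jinja2 import DictLoader

app = Flask(__name__)
# ASSUMPTION: an in-memory template replaces a templates/index.html file.
app.jinja_loader = DictLoader({"index.html": "<h1>{{ title }}</h1>"})

def predict_disease(symptoms):
    """Hypothetical placeholder for the fine-tuned classifier."""
    return "unknown"

@app.route("/")
def index():
    # render_template serves the chat page to the browser.
    return render_template("index.html", title="Chat Ella")

@app.route("/diagnose", methods=["POST"])
def diagnose():
    # jsonify returns the prediction to the frontend as JSON.
    payload = request.get_json(silent=True) or {}
    symptoms = str(payload.get("symptoms", "")).strip()
    if not symptoms:
        return jsonify({"error": "symptoms is required"}), 400
    return jsonify({"symptoms": symptoms,
                    "prediction": predict_disease(symptoms)})
```

Splitting the page route from the JSON route keeps the HTML rendering and the prediction endpoint independent, so the frontend can call the endpoint asynchronously while the conversation stays on one page.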

A survey of 64 adult participants revealed generally positive perceptions of the chatbot, Chat Ella. Ratings were highest for usability aspects such as user-friendliness during setup (average score of 3.76), while questions about confusion during use averaged 3.4, indicating some concern. The overall average CUQ score was 68.31, with variations by gender, educational background, and device used, suggesting areas for future improvement in the system’s design and functionality.
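As a rough guide to how a score such as 68.31 arises, the sketch below applies the commonly published CUQ scoring scheme to one respondent's answers. It assumes the standard 16-item questionnaire on a 1 to 5 scale, with odd-numbered items worded positively and even-numbered items worded negatively; the summary does not confirm that the authors scored their survey exactly this way.

```python
def cuq_score(responses):
    """Convert 16 Likert responses (1 to 5) into a 0 to 100 CUQ score."""
    if len(responses) != 16 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected 16 responses, each between 1 and 5")
    positive = sum(responses[0::2]) - 8    # items 1, 3, ..., 15
    negative = 40 - sum(responses[1::2])   # items 2, 4, ..., 16
    return (positive + negative) / 64 * 100

# A fully neutral respondent (all 3s) lands at the midpoint of the scale.
neutral = cuq_score([3] * 16)
```

Under this scheme, a reported study-level score would be the mean of the per-respondent scores.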

Discussion

The authors developed Chat Ella, a chatbot designed to assist in the diagnosis of chronic diseases by interpreting patient symptom descriptions with a generative pre-trained transformer model (GPT-2). The model was trained on a dataset sourced from Kaggle, comprising 1,200 symptom descriptions categorized into 24 chronic disease types. The data underwent rigorous preprocessing, including deduplication, text cleaning, and tokenization, to ensure quality and facilitate effective model training. The usability of Chat Ella was evaluated through a survey, yielding an average Chatbot Usability Questionnaire (CUQ) score of 68.31, indicating strong user satisfaction and potential for practical application in remote diagnostics.
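The three preprocessing steps named above (deduplication, text cleaning, and tokenization) can be sketched in a few lines of standard-library Python. The cleaning rules and the sample records are assumptions made for illustration, and whitespace splitting stands in for the GPT-2 subword tokenizer, which this sketch does not reproduce.

```python
import re

def clean_text(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def preprocess(records):
    """Clean each (description, label) pair, drop duplicates, and tokenize."""
    seen = set()
    processed = []
    for description, label in records:
        cleaned = clean_text(description)
        key = (cleaned, label)
        if not cleaned or key in seen:
            continue  # skip empty rows and repeats of an earlier record
        seen.add(key)
        processed.append({
            "text": cleaned,
            "tokens": cleaned.split(),  # stand-in for subword tokenization
            "label": label,
        })
    return processed

# Hypothetical records; the second duplicates the first after cleaning.
raw = [
    ("I feel very thirsty, all the time!", "diabetes"),
    ("i feel very thirsty all the time", "diabetes"),
    ("My chest feels tight when I breathe.", "asthma"),
]
dataset = preprocess(raw)
```

Cleaning before deduplicating means that records differing only in case or punctuation are treated as the same entry.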

The architecture of Chat Ella integrates a backend system that processes user inputs and retrieves relevant data, while the frontend offers an intuitive conversational interface. The system’s design emphasizes seamless interaction, allowing users to input symptoms and receive diagnostic feedback efficiently. Despite the promising results, the study acknowledges limitations, including a relatively small dataset and the model’s current English-only support. The authors suggest that future enhancements could involve expanding the dataset, integrating external knowledge bases, and incorporating multimodal data inputs such as voice and images to improve diagnostic accuracy and user experience. Overall, the findings highlight the potential of AI-driven chatbots in healthcare, particularly in enhancing patient engagement and optimizing medical service delivery.