Holistic AI in medicine; improved performance and explainability

Journal: npj Digital Medicine, Volume: 9, Issue: 1
DOI: https://doi.org/10.1038/s41746-025-02298-3
PMID: https://pubmed.ncbi.nlm.nih.gov/41495177
Publication Date: 2026-01-06
Author(s): Periklis Petridis et al.
Primary Topic: Machine Learning in Healthcare

Overview

The paper introduces xHAIM (Explainable HAIM), an extension of the previously published HAIM (Holistic AI in Medicine) framework. HAIM integrates multimodal data for clinical tasks, but it offers no explanations and does not tailor its use of the data to the task at hand. xHAIM addresses these shortcomings by using generative AI in a structured four-step process: (1) identifying the patient data relevant to the task across modalities, (2) generating task-specific patient summaries, (3) using these summaries to improve predictive modeling, and (4) producing clinical explanations that link predictions to patient-specific medical knowledge.
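
Steps (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: xHAIM uses generative AI for both relevance selection and summarization, whereas this sketch substitutes a hypothetical keyword map (`TASK_KEYWORDS`) and a fixed text template. The record type `PatientItem` and the item IDs are also hypothetical; keeping an ID on every item is what allows a later explanation to cite its sources.

```python
from dataclasses import dataclass


# Hypothetical record format: one item per piece of patient data.
@dataclass
class PatientItem:
    item_id: str   # stable identifier, reusable later as a citation key
    modality: str  # e.g. "note", "radiology_report", "lab"
    text: str      # textual rendering of the item


# Hypothetical task-to-keyword map standing in for an LLM relevance judgment.
TASK_KEYWORDS = {
    "pleural_effusion": ["effusion", "pleural", "costophrenic", "dyspnea"],
    "cardiomegaly": ["cardiomegaly", "enlarged heart", "cardiac silhouette"],
}


def select_relevant(items, task):
    """Step 1 (sketch): keep only items that mention one of the task's keywords."""
    keywords = TASK_KEYWORDS[task]
    return [it for it in items if any(k in it.text.lower() for k in keywords)]


def summarize(items, task):
    """Step 2 (sketch): a templated summary grouped by modality, tagged with item IDs."""
    by_modality = {}
    for it in items:
        by_modality.setdefault(it.modality, []).append(f"{it.text} [{it.item_id}]")
    lines = [f"Task: {task}"]
    for modality in sorted(by_modality):
        lines.append(f"{modality}: " + "; ".join(by_modality[modality]))
    return "\n".join(lines)


if __name__ == "__main__":
    record = [
        PatientItem("N1", "note", "Patient reports dyspnea on exertion."),
        PatientItem("R1", "radiology_report", "Blunting of the left costophrenic angle."),
        PatientItem("L1", "lab", "Potassium 4.1 mmol/L."),
    ]
    relevant = select_relevant(record, "pleural_effusion")
    print(summarize(relevant, "pleural_effusion"))
```

Replacing `select_relevant` and `summarize` with LLM calls would keep the same interface: a task name and a list of items in, a short citation-tagged text out.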

Evaluated on the HAIM-MIMIC-MM dataset, xHAIM substantially improves predictive performance, raising the average area under the ROC curve (AUC) from 79.9% to 91.3% on tasks involving chest pathologies and operative procedures. The framework turns AI from a black-box predictor into an explainable decision-support system: clinicians can trace each prediction back to the relevant patient data, which makes the technology more applicable in clinical practice.
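
The headline figure is an average of per-task AUCs. The sketch below computes ROC AUC directly from its rank definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half) and then takes an unweighted mean across tasks. That the paper's average is unweighted is an assumption here, and the toy labels and scores are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation; ties count as half a win.

    O(P * N) pairwise comparison, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(per_task):
    """Unweighted mean of per-task AUCs; per_task maps task -> (labels, scores)."""
    aucs = {task: roc_auc(y, s) for task, (y, s) in per_task.items()}
    return sum(aucs.values()) / len(aucs), aucs


if __name__ == "__main__":
    toy = {
        "task_a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
        "task_b": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.6]),
    }
    mean, per_task = macro_auc(toy)
    print(per_task, mean)  # {'task_a': 0.75, 'task_b': 0.875} 0.8125
```

An unweighted mean gives every task equal say regardless of how many patients it contains; a pooled or size-weighted average can differ noticeably when task sizes are unbalanced.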

Introduction

The introduction notes the rapid progress of machine learning (ML) in healthcare over the past two decades and its potential in high-stakes applications, despite early skepticism about safety and interpretability. Recent advances in model safety and fairness, together with the emergence of foundation models (FMs) and large language models (LLMs), have drawn the interest of healthcare professionals. Substantial barriers nonetheless remain: only 5% of healthcare organizations have successfully deployed AI solutions in clinical practice. The main obstacles are insufficient predictive performance and a lack of explainability, both of which erode practitioners' trust in ML models.

To address these limitations, the authors propose xHAIM (Explainable Holistic AI in Medicine), an extension of the existing HAIM framework that incorporates generative AI to improve both predictive performance and explainability. xHAIM follows four steps: identifying relevant patient data, generating concise clinical summaries, using these summaries to improve predictive performance, and providing grounded explanations for the predictions. This hybrid design aims to combine the strengths of traditional discriminative models with the accessible, natural-language interface of generative AI, improving both clinical utility and interpretability. The paper then presents experimental evidence for xHAIM's effectiveness, a detailed methodology, and a discussion of the implications for clinical practice.

Methods

xHAIM extends the original HAIM pipeline with a structured four-step process intended to improve predictive performance and interpretability. The first step identifies and retrieves the patient information relevant to the clinical task. The second step generates concise, task-specific summaries of that information. In the third step, these summaries are combined across data modalities and used as inputs to a predictive model. Finally, the framework produces explanations that justify the model's predictions.
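
Step (3) needs a fixed-size model input even when a modality is missing for a given patient. One simple option, used here purely as an assumption, is to encode each modality's summary separately, concatenate the vectors in a fixed modality order, and append a presence flag per modality. The keyword-count encoder (`embed`), the toy `VOCAB` and `MODALITIES` lists, and the logistic scorer are hypothetical stand-ins for the learned encoders and fine-tuned predictive model described in the paper.

```python
import math

MODALITIES = ["note", "radiology_report", "lab"]          # hypothetical fixed order
VOCAB = ["effusion", "dyspnea", "costophrenic", "normal"]  # hypothetical toy vocabulary


def embed(summary_text):
    """Stand-in for a learned text encoder: keyword counts over VOCAB."""
    words = [w.strip(".,;") for w in summary_text.lower().split()]
    return [float(sum(w == v for w in words)) for v in VOCAB]


def fuse(summaries):
    """Step 3 (sketch): concatenate per-modality vectors in a fixed order.

    A missing modality contributes zeros plus a 0.0 presence flag, so the fused
    vector always has len(MODALITIES) * (len(VOCAB) + 1) entries.
    """
    fused = []
    for m in MODALITIES:
        if m in summaries:
            fused.extend(embed(summaries[m]) + [1.0])
        else:
            fused.extend([0.0] * len(VOCAB) + [0.0])
    return fused


def predict(fused, weights, bias):
    """Logistic score in (0, 1); in practice the weights would come from training."""
    z = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    x = fuse({"note": "Dyspnea and effusion noted.", "lab": "normal"})
    print(len(x), x)
    print(predict(x, [0.5] * len(x), -2.0))
```

The presence flag lets the downstream model distinguish "modality absent" from "modality present but uninformative", which plain zero-filling alone cannot do.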

The study used the publicly available MIMIC-IV database, with access granted under the credentialed training and data use agreements required by PhysioNet's credentialing process. The research involves no additional information about human participants beyond this database.

Results

The authors report their experimental results in two parts. The first compares the predictive performance of xHAIM against two baselines: zero-shot prediction and the original HAIM model. This comparison is meant to show that xHAIM makes accurate predictions even when training data are limited.
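
A zero-shot baseline asks a pretrained language model for a prediction without any task-specific training. The sketch below shows the two pieces such a baseline needs around the model call itself: a prompt, and a parser that turns free-text output into a numeric score usable for AUC. The prompt wording, the fallback value of 0.5, and the parsing rule are all hypothetical, and the model call is omitted.

```python
import re


def build_zero_shot_prompt(task, summary):
    """Hypothetical prompt requesting a probability with no task-specific training."""
    return (
        "You are assisting with a clinical prediction task.\n"
        f"Task: does the patient have {task.replace('_', ' ')}?\n"
        f"Patient summary:\n{summary}\n"
        "Answer with a single probability between 0 and 1."
    )


# A number in [0, 1] ("0", "0.82", "1", "1.0") not embedded in a larger number.
_PROB = re.compile(r"(?<![\d.])(0(?:\.\d+)?|1(?:\.0+)?)(?!\.?\d)")


def parse_probability(response, default=0.5):
    """Return the first probability-like number in the reply, else `default`."""
    match = _PROB.search(response)
    return float(match.group(1)) if match else default


if __name__ == "__main__":
    print(build_zero_shot_prompt("pleural_effusion", "note: dyspnea on exertion"))
    print(parse_probability("My estimate is 0.82."))
```

Falling back to 0.5 on unparseable replies keeps every patient in the AUC computation; an alternative design is to count such replies separately as failures.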

The second part evaluates xHAIM's explainability using two methods: manual annotation, in which human evaluators assess the model's outputs, and an automated approach that uses a large language model (LLM) as a judge. These evaluations test whether the model's predictions come with interpretable, understandable explanations, which bears directly on its practical usefulness in real-world settings.
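
Using an LLM as a judge requires quantifying how well it agrees with the human annotators. Cohen's kappa, which corrects raw agreement for the agreement expected by chance, is one common choice; it is used here as an assumption, since this summary does not say which statistic the authors report. The pass/fail labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human = ["pass", "pass", "fail", "pass", "fail", "pass"]
    judge = ["pass", "pass", "fail", "fail", "fail", "pass"]
    print(cohens_kappa(human, judge))
```

In this example the judge disagrees with the human once, so raw agreement is 5/6, but kappa is about 0.67: with the human giving 4 passes and 2 fails and the judge 3 of each, 50% agreement would be expected by chance alone.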

Discussion

xHAIM represents a significant advance in clinical AI: it combines generative and discriminative models to improve both predictive performance and explainability in medical tasks. On the HAIM-MIMIC-MM dataset, which contains diverse patient data drawn from the MIMIC-IV critical care database, xHAIM achieved an average area under the receiver operating characteristic curve (AUC) of 91.3% across five critical clinical prediction tasks, compared with 79.9% for the original HAIM framework. The authors attribute this gain to the pipeline's task-specific patient summaries, which condense the input data for predictive modeling and make fine-tuning more effective. The framework performs particularly well on pathology detection, with significant gains for pleural effusion, cardiomegaly, and pneumonia.
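
The two reported averages can be put in perspective with simple arithmetic. Only the 79.9% and 91.3% figures come from the paper; the relative-gain and error-reduction framings below are added for illustration.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 91.3\% - 79.9\% = 11.4 \text{ percentage points},\\
\Delta_{\text{rel}} &= \frac{91.3 - 79.9}{79.9} \approx 14.3\%,\\
\text{reduction in } (1 - \text{AUC}) &= \frac{(100 - 79.9) - (100 - 91.3)}{100 - 79.9}
  = \frac{20.1 - 8.7}{20.1} \approx 56.7\%.
\end{aligned}
```

Read through the last line, the gap between the average AUC and a perfect score of 100% shrinks by more than half.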

Beyond performance, xHAIM provides high-quality explanations for its predictions, addressing the lack of model transparency in clinical settings. The authors assess explanation quality with a two-stage, LLM-based evaluation along dimensions such as citation accuracy and factual correctness, and report strong agreement with human expert ratings. By grounding explanations in specific patient records and relevant medical knowledge, xHAIM aims to increase clinician trust and support effective human-AI collaboration. Overall, the framework improves clinical prediction accuracy while also tackling practical challenges in data processing and explanation generation, positioning it as a valuable tool for patient care in complex healthcare environments.
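
Citation accuracy, one of the explanation-quality dimensions named above, has a cheap mechanical component: every source an explanation cites should actually exist in the patient record. The `[R1]`-style citation format and the function name below are hypothetical. The check verifies only that cited IDs exist, not that the cited item supports the claim; judging support is what the LLM-based and human evaluations are for.

```python
import re

# Hypothetical citation format: letters followed by digits in brackets, e.g. "[R1]".
CITATION = re.compile(r"\[([A-Za-z]+\d+)\]")


def citation_accuracy(explanation, record_ids):
    """Fraction of citations that point to IDs present in the record.

    Returns (accuracy, unknown_ids); accuracy is None when nothing is cited.
    Repeated citations are counted once per mention.
    """
    cited = CITATION.findall(explanation)
    if not cited:
        return None, []
    unknown = [c for c in cited if c not in record_ids]
    return 1.0 - len(unknown) / len(cited), unknown


if __name__ == "__main__":
    text = "Blunted angle [R1] and dyspnea [N1] suggest effusion; prior CT [C7]."
    print(citation_accuracy(text, {"R1", "N1", "L1"}))
```

Flagging unknown IDs such as a fabricated `[C7]` before any LLM or human review is inexpensive and catches one class of hallucinated grounding outright.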
