A large language model for complex cardiology care

Journal: Nature Medicine, Volume: 32, Issue: 2
DOI: https://doi.org/10.1038/s41591-025-04190-9
PMID: https://pubmed.ncbi.nlm.nih.gov/41652123
Publication Date: 2026-02-01
Author(s): Jack W. O’Sullivan et al.
Primary Topic: Artificial Intelligence in Healthcare and Education

Overview

The study addresses the scarcity of subspecialist expertise in cardiology, particularly in the management of complex cases such as genetic cardiomyopathy. It evaluates whether the Articulate Medical Intelligence Explorer (AMIE), a large language model-based artificial intelligence system, improves clinical decision-making. In a randomized controlled trial, nine general cardiologists assessed complex cases with and without AMIE's assistance. Three blinded subspecialists scored the assessments on a ten-domain rubric covering triage, diagnosis, and management quality.

Subspecialists preferred the AMIE-assisted assessment in 46.7% of cases, compared with 32.7% for the assessment by a cardiologist alone (P = 0.02). AMIE-assisted assessments also contained fewer clinically significant errors (24.3% unassisted vs. 13.1% assisted, P = 0.033) and less missing content (37.4% unassisted vs. 17.8% assisted, P = 0.0021). Additionally, cardiologists reported that AMIE aided their assessment in over half of the cases (57.0%) and saved time in 50.5% of cases. This research highlights the potential of AI systems like AMIE to mitigate the impact of subspecialist shortages in cardiology and, in turn, to improve patient outcomes in complex medical scenarios.
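The paired rates above can be compared as absolute differences (percentage points) and as relative reductions. The following is a minimal sketch of that arithmetic, taking 24.3% and 37.4% as the unassisted rates and 13.1% and 17.8% as the AMIE-assisted rates. The function name is illustrative, and the relative reductions are derived here rather than reported in the source.

```python
def reduction(unassisted_pct: float, assisted_pct: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = unassisted_pct - assisted_pct
    relative = 100.0 * absolute / unassisted_pct
    return round(absolute, 1), round(relative, 1)

# Rates quoted in the overview (unassisted first, AMIE-assisted second).
errors = reduction(24.3, 13.1)    # clinically significant errors
missing = reduction(37.4, 17.8)   # missing content

# Gap in subspecialist preference, in percentage points.
preference_gap = round(46.7 - 32.7, 1)

print("errors:", errors)
print("missing content:", missing)
print("preference gap:", preference_gap)
```

The absolute differences come to 11.2 and 19.6 percentage points. Distinguishing these from relative reductions matters when the results are quoted as a "reduction".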

Methods

In this study, the authors developed AMIE, an experimental large language model (LLM)-based medical AI system built on Gemini 2.0 Flash without domain-specific fine-tuning. AMIE uses a multistep inference procedure that incorporates web search and self-critique to adapt it to subspecialist applications. AMIE was evaluated in a randomized controlled trial (RCT) with three phases: recruitment and de-identification of clinical data from a specialized center for inherited cardiovascular disease, assessment of real clinical cases by general cardiologists with and without AMIE's assistance, and subsequent evaluation by subspecialists who were blinded to the source of the assessments.
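To make the idea of multistep inference with web search and self-critique concrete, here is a generic draft, critique, retrieve, and revise loop. It is an illustrative sketch only and not AMIE's actual implementation, which the summary does not describe in detail. Every function name, parameter, and prompt string below is a hypothetical stand-in, and the toy callables replace real model and search calls so that the example runs offline.

```python
from typing import Callable, List

def multistep_assess(
    case_text: str,
    draft: Callable[[str, List[str]], str],
    search: Callable[[str], List[str]],
    critique: Callable[[str, str], str],
    max_rounds: int = 2,
) -> str:
    """Draft an assessment, then alternate self-critique, retrieval, and revision.

    All callables are hypothetical: `draft` would wrap an LLM call, `search` a
    web-search tool, and `critique` a second LLM pass that returns an empty
    string when it finds nothing to fix.
    """
    evidence: List[str] = []
    assessment = draft(case_text, evidence)
    for _ in range(max_rounds):
        feedback = critique(case_text, assessment)
        if not feedback:
            break  # the critic is satisfied; stop early
        evidence.extend(search(feedback))
        assessment = draft(case_text + "\nReviewer feedback: " + feedback, evidence)
    return assessment

# Toy stand-ins so the sketch runs without any model or network access.
def toy_draft(prompt: str, evidence: List[str]) -> str:
    return f"assessment using {len(evidence)} source(s)"

def toy_search(query: str) -> List[str]:
    return [f"result for: {query}"]

def toy_critique(case_text: str, assessment: str) -> str:
    return "" if "1 source" in assessment else "cite a guideline"

result = multistep_assess("de-identified case summary", toy_draft, toy_search, toy_critique)
print(result)
```

Capping the loop with `max_rounds` is a design choice of this sketch. It bounds cost and latency when the critic keeps requesting changes.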

General cardiologists interpreted clinical text reports and raw patient data, including ECGs and various imaging modalities; those in the assisted arm additionally received AMIE's comprehensive clinical assessment report and interactive support via a web interface. The study used data from patients with suspected or confirmed inherited cardiovascular diseases, providing a demanding test of AMIE's capabilities. The RCT adhered to CONSORT guidelines and is registered at ClinicalTrials.gov (NCT06935253), with all data made publicly available to promote scientific transparency and reproducibility.

Results

Each of 107 consecutive patients was evaluated by two general cardiologists, one using AMIE and the other not; the AMIE interface is detailed in Extended Data Fig. 1. The median age of participants was 59 years, with a range from 18 to 96 years. Clinical text data availability for various tests was as follows: Cardiac Magnetic Resonance (CMR) 64 (59.8%), Cardiopulmonary Exercise Testing (CPX) 65 (60.7%), resting Transthoracic Echocardiography (TTE) 90 (84.1%), exercise TTE 69 (64.5%), Electrocardiogram (ECG) 99 (92.5%), ambulatory Holter monitor 79 (73.8%), and genetic testing 77 (72.0%).
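The availability percentages follow directly from the counts and the cohort size of 107. The short sketch below recomputes them as a consistency check; the dictionary keys and the helper name are illustrative labels, not identifiers from the study's dataset.

```python
N_PATIENTS = 107

# Patients with clinical text available for each test, as listed above.
availability = {
    "CMR": 64,
    "CPX": 65,
    "Resting TTE": 90,
    "Exercise TTE": 69,
    "ECG": 99,
    "Holter": 79,
    "Genetic testing": 77,
}

def pct(count: int, total: int = N_PATIENTS) -> float:
    """Share of the cohort as a percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

percentages = {test: pct(n) for test, n in availability.items()}

for test, n in availability.items():
    print(f"{test:<16}{n:>4}  {percentages[test]:>5.1f}%")
```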

Notably, among the 107 patients assessed, 39 (36.4%) had a variant classified as pathogenic or likely pathogenic according to the criteria established by the American College of Medical Genetics and Genomics. These findings underscore the relevance of evaluating tools like AMIE in clinical cardiology, particularly where genetic variants may influence patient management and outcomes.

Discussion

The study investigates the integration of large language models (LLMs) into the clinical assessments of general cardiologists, particularly in the context of diagnosing rare genetic cardiomyopathies. Cardiologists received the tool favorably overall: they reported that LLM assistance improved their clinical assessment in 57.0% of cases, increased their confidence in decision-making in 52.3%, and saved time in 50.5%, while reporting clinically significant hallucinations in only 6.5%. Importantly, subspecialist evaluations revealed that LLM-assisted assessments had significantly fewer clinically significant errors (an absolute reduction of 11.2 percentage points) and omissions (19.6 percentage points) than unassisted assessments, while maintaining equivalent clinical reasoning quality.
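The cardiologist-reported percentages are most naturally read as per-case rates. With only nine cardiologists, shares of respondents would fall on multiples of about 11.1%, whereas each reported figure matches a whole number of cases out of 107. The sketch below back-calculates those implied counts, assuming 107 is the denominator throughout. The counts are inferred here and are not stated in the summary.

```python
N_CASES = 107  # assumption: every percentage uses all 107 cases as its denominator

def implied_count(pct: float, n: int = N_CASES):
    """Return the unique integer k with round(100*k/n, 1) == pct, else None."""
    matches = [k for k in range(n + 1) if round(100.0 * k / n, 1) == pct]
    return matches[0] if len(matches) == 1 else None

reported = {
    "improved assessment": 57.0,
    "increased confidence": 52.3,
    "saved time": 50.5,
    "significant hallucination": 6.5,
}

counts = {label: implied_count(p) for label, p in reported.items()}

for label, k in counts.items():
    print(f"{label}: {k} of {N_CASES} cases")
```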

The study highlights the potential of LLMs to democratize access to subspecialist expertise in cardiology, addressing the workforce crisis and improving patient outcomes by assisting generalists in managing complex cases. The open-source dataset created for this research facilitates further exploration of LLM applications in clinical settings. While the results are promising, the study emphasizes the need for continued rigorous evaluation of LLMs in medical practice to ensure their safe and effective integration into healthcare workflows. Overall, this research represents a significant step toward establishing a framework for the use of LLMs in subspecialty care, with implications for improving clinical assessment quality and efficiency in cardiology.