الدقة التشخيصية لـ ChatGPT في تحديد مواقع بؤر عدم انتظام ضربات القلب القلبية بناءً على تخطيط القلب الكهربائي ذو الـ 12 قناة قبل الاستئصال بالقسطرة Diagnostic accuracy of ChatGPT for 12-lead ECG-based localisation of ventricular ectopic foci prior to catheter ablation

المجلة: Frontiers in Medicine، المجلد: 12
DOI: https://doi.org/10.3389/fmed.2025.1685419
PMID: https://pubmed.ncbi.nlm.nih.gov/41601756
تاريخ النشر: 2026-01-12
المؤلف: Zhenyun Du وآخرون
الموضوع الرئيسي: اضطرابات نظم القلب والعلاجات

نظرة عامة

في هذه الدراسة التجريبية، تم تقييم دقة تشخيص ChatGPT في تحديد بؤر عدم انتظام ضربات القلب البطيني (VE) باستخدام بيانات تخطيط القلب الكهربائي (ECG) في مجموعة من 50 بالغًا يخضعون لعملية استئصال VE للمرة الأولى. كانت الدراسة تهدف إلى تحديد ما إذا كان بإمكان ChatGPT تقديم تحديدات ذات صلة سريريًا مقارنةً بالتخطيط الكهربائي التشريحي الغازي، الذي كان بمثابة المعيار المرجعي. تم تكليف النموذج بتحديد أحد الأصول التشريحية الخمسة (مسار تدفق البطين الأيمن (RVOT)، مسار تدفق البطين الأيسر (LVOT)، العضلة الحليمية، الفاسيكولار، والإيبيكارديال) بناءً على أوصاف نصية منظمة لشكل QRS المستمدة من تخطيط القلب الكهربائي ذو 12 قناة.

أشارت النتائج إلى أن ChatGPT حدد بشكل صحيح بؤر VE في 17 حالة فقط من أصل 50 (34%)، مع قيمة كوهين κ تبلغ -0.02، مما يشير إلى عدم وجود اتفاق مع المعيار المرجعي. كانت الحساسية والنوعية لـ RVOT 40% و55%، على التوالي، بينما كانت لـ LVOT 36% و62%. ومن الجدير بالذكر أنه لم يتم إجراء أي توقعات صحيحة للأصول الفاسيكولار أو الإيبيكارديال. كانت أداء ChatGPT متسقة بغض النظر عن وجود مرض قلبي هيكلي، ولم تؤثر على مدة الإجراء أو معدل نجاح الاستئصال الحاد (96%). تسلط النتائج الضوء على أن نماذج اللغة العامة الحالية، مثل ChatGPT، تفتقر إلى التدريب المتخصص في المجال والقدرات متعددة الوسائط لمساعدتها بشكل فعال في تخطيط الإجراءات لاستئصال VE، مما يشير إلى الحاجة إلى نماذج مستقبلية لدمج التدريب المتخصص لتعزيز فائدتها في علم كهرباء القلب.

مقدمة

تسلط المقدمة الضوء على الأهمية السريرية للمركبات البطينية المبكرة (PVCs) وعدم انتظام ضربات القلب البطيني مجهول السبب (VE)، والتي غالبًا ما يتم استئصالها عند ظهور الأعراض أو ارتباطها بتسرع القلب واعتلال عضلة القلب. إن التحديد الدقيق لبؤرة عدم انتظام ضربات القلب أمر حاسم لتحسين إجراءات استئصال القسطرة، ومع ذلك، تظهر الطرق التقليدية المعتمدة على تخطيط القلب الكهربائي ذو 12 قناة (ECGs) دقة متوسطة فقط (حوالي 55-70%) في التمييز بين PVCs الجانبية اليمنى واليسرى وتحديد البؤر غير الخارجة، مثل العضلات الحليمية وقمة البطين الأيسر (LV). هذه القيود واضحة بشكل خاص في الحالات التي تتضمن VEs العضلية الحليمية والإيبيكارديال، والتي يصعب التنبؤ بها عبر تخطيط القلب السطحي وترتبط بمعدلات نجاح استئصال أقل.

أظهرت التطورات الأخيرة في الذكاء الاصطناعي (AI)، وخاصة نماذج التعلم العميق، وعدًا في تحديد مخرجات VE بدقة عالية، خاصة في المناطق التشريحية المعقدة. ومع ذلك، تتطلب هذه النماذج مجموعات بيانات موسومة واسعة وبرامج مخصصة، مما يعيق تطبيقها السريري الفوري. بالمقابل، تقدم نماذج اللغة الكبيرة (LLMs) مثل ChatGPT نهجًا جديدًا من خلال معالجة أوصاف ECG النصية العادية لتوليد مخرجات احتمالية دون الحاجة إلى ترميز موسع. تشير التحقيقات الأولية إلى أن ChatGPT يمكن أن يقدم توصيات متوافقة مع الإرشادات في علم القلب، ومع ذلك، فإن قدرته على تفسير شكل ECG التفصيلي لا تزال غير مختبرة. لمعالجة هذه الفجوة، يقترح المؤلفون دراسة مستقبلية لتقييم دقة تحديد بؤر VE بواسطة ChatGPT مقارنةً بالمعيار الذهبي للتخطيط الكهربائي التشريحي الغازي، مع فرضية أن ChatGPT سيتجاوز خوارزميات ECG التقليدية ويحقق اتفاقًا مقبولًا سريريًا (κ ≥ 0.6) مع نتائج دراسة كهرباء القلب.

الطرق

استخدمت هذه الدراسة تصميمًا تجريبيًا لتقييم دقة التشخيص في مركز واحد، ملتزمةً بإرشادات معايير الإبلاغ عن دقة التشخيص (STARD). تم إجراء الدراسة في مركز عدم انتظام ضربات القلب بكلية الطب بجامعة سيلجوك، وهو مؤسسة إحالة ثلاثية مزودة بتقنيات تخطيط متقدمة، بما في ذلك CARTO 3 (Biosense Webster)، EnSite Precision (Abbott)، وRhythmia (Boston Scientific). امتد فترة تسجيل المشاركين من 1 يناير إلى 30 يونيو 2025، واستمرت لمدة ستة أشهر.

النتائج

في هذه الدراسة، تم تحليل 50 مريضًا متتاليًا (متوسط العمر 43 ± 14 عامًا؛ 58% نساء) لتحديد موقع عدم انتظام ضربات القلب البطيني (VE) باستخدام التخطيط الكهربائي التشريحي (EAM) وChatGPT. كان هناك مرض قلبي هيكلي (SHD) لدى 24% من المشاركين، مع متوسط نسبة قذف البطين الأيسر 58 ± 6%. كانت نسبة الحمل من انقباضات البطين المبكرة (PVC) قبل الإجراء 17%، وكان 36% من المرضى يتلقون علاجًا بمثبطات بيتا. تم تحديد بؤرة VE بشكل أساسي في مسار تدفق البطين الأيمن (RVOT، 60%) ومسار تدفق البطين الأيسر (LVOT، 22%). كانت عملية الاستئصال الحاد ناجحة في 96% من الحالات.

كان أداء ChatGPT في توقع الأصل التشريحي لـ VE ضعيفًا بشكل ملحوظ، حيث حقق تحديدًا صحيحًا في 34% فقط من الحالات، مع قيمة كوهين κ تبلغ -0.02، مما يشير إلى عدم وجود اتفاق أفضل من الصدفة مع نتائج EAM. كانت حساسية ونوعية النموذج لـ RVOT 40% و55%، على التوالي، بينما كانت لـ LVOT 36% و62%. بالمقابل، حقق أخصائي كهرباء القلب (EP) دقة إجمالية تبلغ 72% مع اتفاق معتدل (κ = 0.52) مقارنةً بالمعيار الذهبي لـ EAM. كان أداء EP أفضل بكثير، خاصة بالنسبة للبؤر الخارجة، مما يبرز قيود ChatGPT في التطبيقات السريرية لتحديد VE، خاصةً بالنسبة للمواقع غير الخارجة.

المناقشة

تؤكد قسم المناقشة في ورقة البحث على الأهمية الحاسمة لتحديد بؤر عدم انتظام ضربات القلب البطيني (VE) بدقة قبل إجراءات استئصال القسطرة. شملت الدراسة 50 بالغًا يخضعون لاستئصال للمرة الأولى بسبب VE العرضي، مع معايير إدراج واستبعاد صارمة لضمان موثوقية النتائج. يبرز المؤلفون أن التحديد الدقيق يمكن أن يؤثر بشكل كبير على كفاءة الإجراء، ويقلل من التعرض للأشعة السينية، ويعزز معدلات نجاح الاستئصال بشكل عام. أظهرت الطرق التقليدية لتقدير أصول VE، والتي تعتمد أساسًا على معايير ECG السطحية، دقة متوسطة، خاصة في الحالات المعقدة. بالمقابل، وجدت الدراسة أن نموذج ChatGPT حقق دقة إجمالية تبلغ 34% فقط ودقة متوازنة تبلغ 20.2% في تحديد أصول VE، وهو ما يقل بشكل كبير عن أداء أخصائيي كهرباء القلب ذوي الخبرة الذين حققوا دقة تبلغ 72%.

يناقش المؤلفون قيود استخدام نموذج لغة عام مثل ChatGPT لهذه المهمة المتخصصة، مشيرين إلى أنه يفتقر إلى القدرة على تحليل الإشارات الكمية ويعتمد على أنماط النص المتعلمة بدلاً من التفسير المباشر لـ ECG. يجادلون بأنه بينما أداء ChatGPT في تحديد VE غير كافٍ حاليًا، هناك إمكانية للتحسينات المستقبلية من خلال دمج التدريب المتخصص في المجال، والهياكل متعددة الوسائط، والاقتران بخوارزميات التخطيط المخصصة. تختتم الورقة بتحذير من الاعتماد المفرط على أدوات الذكاء الاصطناعي في البيئات السريرية، داعيةً إلى استخدامها كملحقات للطرق المعمول بها بدلاً من استبدالها، مع التأكيد على الحاجة إلى إشراف بشري لتقليل المخاطر المرتبطة بتحيز الأتمتة.

القيود

تقدم الدراسة عدة قيود يجب أخذها في الاعتبار عند تفسير نتائجها. أولاً، تم إجراء البحث في مركز واحد مع مجموعة صغيرة من 50 مريضًا، تركز بشكل أساسي على أصول مسار تدفق البطين الأيمن (RVOT) ومسار تدفق البطين الأيسر (LVOT)، مما يحد من القوة الإحصائية وقابلية تعميم النتائج على مجموعات سكانية أوسع ذات بؤر عدم انتظام ضربات القلب البطيني مجهول السبب (VE) متنوعة. إن استبعاد المرضى الذين لديهم VEs غير قابلة للتحفيز أو متعددة البؤر يقدم تحيزًا في الاختيار، مما قد يغير من الأداء المدرك لـ ChatGPT في الممارسة السريرية. بالإضافة إلى ذلك، تم إنشاء أوصاف ECG بواسطة أخصائي كهرباء قلب واحد دون تقييم تكرارية المراقب، مما قد يؤثر على تفسير مخرجات ChatGPT.

علاوة على ذلك، استخدمت الدراسة هيكلًا ثابتًا للمطالبات ومعلمات لـ ChatGPT (الإصدار 4o، لقطة مارس 2025؛ درجة الحرارة = 0.2، الحد الأقصى من الرموز = 128) دون استكشاف التغييرات في هندسة المطالبات أو تكوينات النموذج. يحد هذا النقص في تحسين المعلمات الفائقة من قابلية تطبيق النتائج على إعدادات محتملة أخرى. كما لم تعيد الدراسة تنفيذ خوارزميات تحديد موقع ECG المعروفة أو تقيس القيمة المضافة للبيانات الديموغرافية أو بيانات التصوير، بل اعتمدت بدلاً من ذلك على مقاييس منشورة للمقارنة. تؤكد هذه القيود على ضرورة إجراء دراسات أكبر ومتعددة المراكز مع منهجيات صارمة للتحقق من الفائدة السريرية لنماذج اللغة العامة في تحديد VE.

Journal: Frontiers in Medicine, Volume: 12
DOI: https://doi.org/10.3389/fmed.2025.1685419
PMID: https://pubmed.ncbi.nlm.nih.gov/41601756
Publication Date: 2026-01-12
Author(s): Zhenyun Du et al.
Primary Topic: Cardiac Arrhythmias and Treatments

Overview

In this pilot study, the diagnostic accuracy of ChatGPT for localizing ventricular ectopic (VE) foci using ECG data was evaluated in a cohort of 50 adults undergoing first-time VE ablation. The study aimed to determine whether ChatGPT could provide clinically relevant localization compared to invasive electroanatomical mapping, which served as the reference standard. The model was tasked with identifying one of five anatomical origins (right ventricular outflow tract (RVOT), left ventricular outflow tract (LVOT), papillary muscle, fascicular, and epicardial) based on structured textual descriptions of QRS morphology derived from 12-lead ECGs.

Results indicated that ChatGPT correctly localized VE foci in only 17 out of 50 cases (34%), with a Cohen’s κ of -0.02, suggesting no agreement with the reference standard. The sensitivity and specificity for RVOT were 40% and 55%, respectively, while for LVOT, they were 36% and 62%. Notably, no correct predictions were made for fascicular or epicardial origins. The performance of ChatGPT was consistent regardless of the presence of structural heart disease, and it did not impact the duration of the procedure or the acute ablation success rate (96%). The findings highlight that current general-purpose language models, like ChatGPT, lack the necessary domain-specific training and multimodal capabilities to assist effectively in procedural planning for VE ablation, indicating a need for future models to integrate specialized training to enhance their utility in electrophysiology.

Introduction

The introduction highlights the clinical significance of premature ventricular complexes (PVCs) and idiopathic ventricular ectopy (VE), which are often ablated when symptomatic or linked to tachycardia and cardiomyopathy. Accurate localization of the arrhythmic focus is crucial for optimizing catheter ablation procedures, yet conventional methods relying on 12-lead electrocardiograms (ECGs) demonstrate only moderate accuracy (approximately 55-70%) in differentiating between right and left-sided PVCs and identifying non-outflow foci, such as papillary muscles and the left ventricular (LV) summit. These limitations are particularly evident in cases involving papillary-muscle and epicardial VEs, which are challenging to predict via surface ECG and correlate with lower ablation success rates.

Recent advancements in artificial intelligence (AI), particularly deep-learning models, have shown promise in localizing VE exits with high precision, especially in complex anatomical regions. However, these models require extensive labeled datasets and custom software, hindering their immediate clinical application. In contrast, large language models (LLMs) like ChatGPT offer a novel approach by processing plain-text ECG descriptions to generate probabilistic outputs without the need for extensive coding. Preliminary investigations indicate that ChatGPT can provide guideline-concordant recommendations in cardiology, yet its capability to interpret detailed ECG morphology remains untested. To address this gap, the authors propose a prospective study to quantitatively assess the localization accuracy of VE foci by ChatGPT against the gold standard of invasive electro-anatomical mapping, hypothesizing that ChatGPT will surpass traditional ECG algorithms and achieve a clinically acceptable agreement (κ ≥ 0.6) with electrophysiology study results.

Methods

This study employed a single-centre, exploratory diagnostic accuracy pilot design, adhering to the Standards for Reporting Diagnostic Accuracy (STARD) guidelines. It was conducted at the Selçuk University Faculty of Medicine Arrhythmia Centre, a tertiary referral institution equipped with advanced mapping technologies, including CARTO 3 (Biosense Webster), EnSite Precision (Abbott), and Rhythmia (Boston Scientific). The participant enrolment period spanned from January 1 to June 30, 2025, lasting a total of six months.

Results

In this study, 50 consecutive patients (mean age 43 ± 14 years; 58% women) were analyzed for the localization of ventricular ectopy (VE) using electro-anatomical mapping (EAM) and ChatGPT. Structural heart disease (SHD) was present in 24% of participants, with a mean left ventricular ejection fraction of 58 ± 6%. The median pre-procedural premature ventricular contraction (PVC) burden was 17%, and 36% of patients were on β-blocker therapy. The VE focus was identified predominantly in the right-ventricular outflow tract (RVOT, 60%) and left-ventricular outflow tract (LVOT, 22%). Acute ablation was successful in 96% of cases.

ChatGPT’s performance in predicting the anatomical origin of VE was notably poor, achieving correct identification in only 34% of cases, with a Cohen’s κ of -0.02, indicating no better than chance agreement with EAM findings. The model’s sensitivity and specificity for RVOT were 40% and 55%, respectively, while for LVOT, they were 36% and 62%. In contrast, an electrophysiologist (EP) achieved an overall accuracy of 72% with moderate agreement (κ = 0.52) compared to the EAM gold standard. The EP’s performance was significantly superior, particularly for outflow-tract foci, highlighting ChatGPT’s limitations in clinical applications for VE localization, especially for non-outflow-tract sites.

Discussion

The discussion section of the research paper emphasizes the critical importance of accurately localizing ventricular ectopic foci (VE) prior to catheter ablation procedures. The study involved 50 adults undergoing first-time ablation for symptomatic VE, with strict inclusion and exclusion criteria to ensure the reliability of the findings. The authors highlight that precise localization can significantly influence procedural efficiency, reduce fluoroscopy exposure, and enhance overall ablation success rates. Traditional methods for estimating VE origins, primarily based on surface ECG criteria, have shown moderate accuracy, particularly in complex cases. In contrast, the study found that the ChatGPT model achieved only 34% overall accuracy and 20.2% balanced accuracy in localizing VE origins, which is substantially lower than the performance of experienced electrophysiologists who achieved 72% accuracy.

The authors discuss the limitations of using a general-purpose language model like ChatGPT for this specialized task, noting that it lacks the capability for quantitative signal analysis and relies on learned text patterns rather than direct ECG interpretation. They argue that while ChatGPT’s performance in VE localization is currently inadequate, there is potential for future improvements through the integration of domain-specific training, multimodal architectures, and coupling with dedicated mapping algorithms. The paper concludes by cautioning against over-reliance on AI tools in clinical settings, advocating for their use as adjuncts to established methods rather than replacements, and emphasizing the need for human oversight to mitigate risks associated with automation bias.

Limitations

The study presents several limitations that must be considered when interpreting its findings. Firstly, the research was conducted at a single center with a small cohort of 50 patients, predominantly focusing on right ventricular outflow tract (RVOT) and left ventricular outflow tract (LVOT) origins, which limits the statistical power and generalizability of the results to broader populations with diverse idiopathic ventricular ectopic (VE) foci. The exclusion of patients with non-inducible or multifocal VEs introduces a selection bias, potentially skewing the perceived performance of ChatGPT in clinical practice. Additionally, the ECG descriptors were generated by a single electrophysiologist without assessing inter-observer reproducibility, which could affect the interpretation of ChatGPT’s outputs.

Moreover, the study utilized a fixed prompt structure and parameters for ChatGPT (version 4o, March 2025 snapshot; temperature = 0.2, max tokens = 128) without exploring variations in prompt engineering or model configurations. This lack of systematic hyperparameter optimization limits the applicability of the findings to other potential setups. The study also did not reimplement established ECG localization algorithms or quantify the added value of demographic or imaging data, relying instead on published metrics for comparison. These limitations underscore the necessity for larger, multicenter studies with rigorous methodologies to validate the clinical utility of general-purpose language models in VE localization.