A survey of transformers and large language models for ECG diagnosis: advances, challenges, and future directions

Journal: Artificial Intelligence Review, Volume: 58, Issue: 9
DOI: https://doi.org/10.1007/s10462-025-11259-x
Publication Date: 2025-06-04
Author(s): Mohammed Yusuf Ansari et al.
Primary Topic: ECG Monitoring and Analysis

Overview

This summary covers a survey of transformer-based methodologies and large language models (LLMs) for electrocardiogram (ECG) analysis. Convolutional neural networks (CNNs) have been the predominant approach to ECG signal processing, but their difficulty in capturing long-range temporal dependencies has motivated the adoption of transformer frameworks built on self-attention mechanisms. The paper identifies a significant gap in the literature: systematic surveys of these methodologies are lacking. In response, the authors propose a novel hierarchical taxonomy for ECG diagnosis that ranges from single-beat to multi-beat and full-length signal evaluations.

The manuscript highlights key findings, noting that CNNs are primarily used for ECG representation learning, while generative adversarial networks (GANs) are favored for data augmentation. It also discusses the prevalent use of multi-head self-attention (MHSA) blocks for modeling temporal dependencies. However, the authors identify critical limitations, including a lack of innovative ECG representation techniques, inadequate positional encoding formulations, and the absence of task-specific self-attention structures. For LLMs, challenges such as limited generalizability and transparency issues are noted. The paper concludes by suggesting future research directions aimed at enhancing ECG representation, refining positional encodings, and developing custom self-attention architectures, while emphasizing the need for collaboration between machine learning experts and clinical practitioners to ensure effective integration into clinical workflows and address regulatory challenges.

Introduction

The introduction of the paper discusses the critical role of electrocardiograms (ECGs) in monitoring cardiac health, emphasizing their utility in both emergency and routine clinical settings. ECGs facilitate the assessment of cardiac function, particularly in cases of chest pain or myocardial infarction, and are essential for detecting asymptomatic abnormalities in patients with cardiovascular risk factors. Although ECGs are widely used, traditional ECG analysis is often subjective and time-consuming, which can lead to diagnostic errors. Machine learning techniques, particularly decision trees and ensemble methods such as bagging and boosting, have emerged to improve diagnostic accuracy by automating parts of the analysis. However, these methods still rely on handcrafted features and may overlook subtle ECG characteristics.

The paper highlights the advances brought by deep learning, particularly convolutional neural networks (CNNs), which extract features directly from raw ECG data without manual feature design. While CNNs excel at local feature extraction, they struggle to capture long-range dependencies in sequential data. Transformer architectures, originally designed for natural language processing, address this limitation through self-attention mechanisms, which makes them suitable for temporal data such as ECG signals. Recent developments in large language models (LLMs) have further advanced ECG diagnosis by integrating multimodal data and enabling zero-shot learning. The introduction sets the stage for a comprehensive survey of transformer-based methodologies and LLM applications in automated ECG diagnosis, aiming to identify their technical innovations, limitations, and future directions. The review categorizes methods, summarizes ECG datasets, and proposes enhancements to improve the efficacy and transparency of LLMs in this domain.

Methods

The methods section of this research paper reviews the literature on the application of transformers and large language models (LLMs) in the classification of diseases based on electrocardiogram (ECG) data. The review is structured hierarchically, starting with detailed analyses of abnormal beat detection and rhythm-level assessments in arrhythmias. It subsequently broadens its scope to include other conditions, such as sleep apnea and cardiovascular diseases (CVDs), as illustrated in Figure 3, which provides a timeline of pertinent studies. Additionally, the section highlights recent advancements in the use of LLMs for ECG analysis and diagnostic purposes, underscoring their growing significance in this field.
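
To make the beat-level and rhythm-level distinction concrete, the following is a minimal sketch, not taken from the paper, of how one recording might be cut into single-beat and multi-beat inputs. It assumes R-peak sample indices are already available from some detector. The function names `beat_windows` and `multi_beat_windows`, the window sizes, and the toy signal are all hypothetical.

```python
import numpy as np

def beat_windows(signal, r_peaks, pre, post):
    """Return one fixed-length window per R-peak (beat-level input).

    Peaks whose window would run past either end of the signal are skipped.
    """
    signal = np.asarray(signal)
    windows = [
        signal[r - pre : r + post]
        for r in r_peaks
        if r - pre >= 0 and r + post <= len(signal)
    ]
    return np.stack(windows)

def multi_beat_windows(signal, r_peaks, n_beats):
    """Return segments spanning n_beats consecutive R-R intervals (rhythm-level input)."""
    signal = np.asarray(signal)
    return [
        signal[r_peaks[i] : r_peaks[i + n_beats]]
        for i in range(len(r_peaks) - n_beats)
    ]

# Toy recording: a ramp stands in for a real ECG trace; the peaks are made up.
signal = np.arange(1000.0)
r_peaks = [100, 300, 500, 700, 900]

beats = beat_windows(signal, r_peaks, pre=50, post=100)
segments = multi_beat_windows(signal, r_peaks, n_beats=3)
print(beats.shape, [len(s) for s in segments])
```

Beat windows have a fixed length and suit morphology-focused classifiers, whereas multi-beat segments preserve R-R timing, which is what rhythm-level assessment depends on.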

Discussion

The discussion section of the paper provides a comprehensive overview of electrocardiogram (ECG) fundamentals and transformer architectures, essential for understanding the subsequent analyses. It begins by detailing the structure and function of the standard 12-lead ECG, which uses 10 electrodes to capture electrical activity through both limb and chest leads. The section emphasizes the significance of the main ECG waveforms (P-wave, QRS complex, and T-wave) in diagnosing cardiac conditions, highlighting how abnormalities in these waveforms can indicate issues such as atrial enlargement, conduction delays, or ischemia. The discussion also outlines the importance of the PR interval, ST segment, and QT interval in assessing cardiac health.
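
The relationship between 10 electrodes and 12 leads can be sketched numerically. The example below uses the standard lead definitions (Einthoven limb leads, Goldberger augmented leads, and precordial leads referenced to Wilson's central terminal). It is an illustration on random potentials rather than anything from the surveyed paper, and the function name `derive_12_leads` is hypothetical. The tenth electrode (right leg) serves as ground and does not enter the formulas.

```python
import numpy as np

def derive_12_leads(ra, la, ll, chest):
    """Derive the 12 standard leads from electrode potentials.

    ra, la, ll: arrays of shape (n_samples,) for right arm, left arm, left leg.
    chest: array of shape (6, n_samples) for the precordial electrodes V1..V6.
    """
    ra, la, ll = (np.asarray(x, dtype=float) for x in (ra, la, ll))
    chest = np.asarray(chest, dtype=float)
    wct = (ra + la + ll) / 3.0  # Wilson's central terminal
    leads = {
        "I": la - ra,
        "II": ll - ra,
        "III": ll - la,
        "aVR": ra - (la + ll) / 2.0,
        "aVL": la - (ra + ll) / 2.0,
        "aVF": ll - (ra + la) / 2.0,
    }
    for i in range(6):
        leads[f"V{i + 1}"] = chest[i] - wct
    return leads

# Random potentials stand in for real recordings (assumption for illustration).
rng = np.random.default_rng(0)
ra, la, ll = rng.normal(size=(3, 500))
chest = rng.normal(size=(6, 500))
leads = derive_12_leads(ra, la, ll, chest)
print(sorted(leads))
```

Because the six limb leads are linear combinations of only three electrode potentials, they are redundant: lead II equals lead I plus lead III, and the three augmented leads sum to zero.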

The section then turns to transformer architectures, which have gained traction in various fields, including ECG analysis. It explains how transformers address limitations of recurrent models such as RNNs and LSTMs by employing self-attention mechanisms that facilitate parallel processing and capture long-range dependencies. Key components of transformer models are described, including data embeddings, positional encoding, self-attention, and multi-head attention, which collectively enhance the model’s ability to analyze complex ECG signals. The discussion concludes by summarizing recent advancements in transformer-based architectures for arrhythmia and rhythm classification, noting innovations in attention mechanisms and feature extraction that improve classification accuracy while acknowledging the computational challenges these models may present.
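
As a rough illustration of the self-attention and multi-head attention components named above, here is a minimal NumPy sketch of scaled dot-product multi-head self-attention over a sequence of token embeddings. It follows the generic transformer formulation, not any specific model from the survey. The random weights, the sizes, and the function names are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (seq_len, d_model); each weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Every token scores every other token, so distance in time is no obstacle.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # (n_heads, seq_len, seq_len)
    heads = weights @ v                  # (n_heads, seq_len, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights

# Toy input: 12 tokens (e.g. embedded ECG segments) with 16 features each.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 12, 16, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = rng.normal(size=(4, d_model, d_model)) * 0.1
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
print(out.shape, attn.shape)
```

The attention matrix has one row per token and one column per token for each head, which is the source of both the long-range modeling ability and the quadratic computational cost noted in the discussion.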

Limitations

This section discusses the limitations of transformer-based and large language model (LLM)-based methodologies for ECG-driven diagnosis, highlighting their architectural designs and roles in the diagnostic workflow. Transformer-based approaches are categorized into four main types: (1) representation learning from raw ECG signals using convolutional layers and multi-head self-attention (MHSA) blocks, (2) segmentation of ECG signals into sub-waveforms, (3) transformation of signals into time-frequency representations processed by vision transformers (ViTs), and (4) conversion of signals into images for ViT processing. In contrast, LLM-based frameworks focus on integrating expert knowledge with ECG features for zero-shot diagnoses, analyzing downsampled ECG data, and facilitating multimodal learning by aligning textual explanations with ECG features.
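
Category (3) can be sketched as a two-step pipeline: compute a time-frequency image, then cut it into patches that a vision transformer would treat as tokens. The example below is a minimal illustration under stated assumptions: it uses a short-time Fourier transform with a Hann window (surveyed methods may use other transforms, such as wavelets), a synthetic sine wave in place of a real ECG, and hypothetical function names and sizes.

```python
import numpy as np

def spectrogram(signal, win=64, hop=32):
    """Magnitude STFT with a Hann window; returns (win // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def to_patches(image, patch=8):
    """Crop to a multiple of `patch` and flatten non-overlapping square tiles."""
    h = (image.shape[0] // patch) * patch
    w = (image.shape[1] // patch) * patch
    tiles = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    tiles = tiles.transpose(0, 2, 1, 3)  # (rows, cols, patch, patch)
    return tiles.reshape(-1, patch * patch)

# Synthetic 10 s trace sampled at 250 Hz (values are placeholders, not ECG data).
fs = 250
t = np.arange(10 * fs) / fs
signal = np.sin(2 * np.pi * 1.2 * t)

spec = spectrogram(signal)
patches = to_patches(spec)
print(spec.shape, patches.shape)
```

Each row of `patches` would then be linearly projected to an embedding and combined with a positional encoding before entering the transformer encoder.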

Despite advancements, significant limitations remain. Transformer-based methods often rely on convolutional layers for generating ECG embeddings, raising concerns about their effectiveness in distinguishing between different ECG classes. The use of standard sinusoidal positional encodings, originally designed for natural language processing, may not adequately capture the repetitive structures inherent in ECG signals, potentially impairing the model’s ability to leverage temporal and morphological details. Furthermore, standard MHSA blocks may not align with the unique properties of ECG signals, risking overfitting and limiting accurate interpretation. LLM-based methodologies also face challenges, including over-reliance on existing literature, lack of transparency in knowledge extraction, and neglect of critical metadata, which could lead to biases in predictions. These limitations highlight the need for more adaptive and context-aware approaches to enhance the effectiveness of ECG diagnostics in clinical settings.
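
The concern about positional encodings can be illustrated with the standard sinusoidal formulation from the original transformer, in which even dimensions hold a sine and odd dimensions a cosine of the position scaled by a geometric series of wavelengths. The sketch below is a generic implementation, not code from the paper, and the two "R-peak" token indices are invented for the example.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model, base=10000.0):
    """Standard sinusoidal encoding; returns an array of shape (seq_len, d_model)."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    positions = np.arange(seq_len)[:, None]     # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]    # (1, d_model / 2)
    angles = positions / base ** (dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=64, d_model=16)

# Hypothetical token indices of two R-peaks in consecutive beats.
r_peak_a, r_peak_b = 10, 47
distance = np.linalg.norm(pe[r_peak_a] - pe[r_peak_b])
print(pe.shape, distance)
```

The encoding depends only on the absolute token index, so two tokens at the same phase of different heartbeats receive unrelated codes. This is one way to read the limitation described above: nothing in the formulation reflects the quasi-periodic beat structure of an ECG.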