Multi-stream feature fusion of vision transformer and CNN for precise epileptic seizure detection from EEG signals

Journal: Journal of Translational Medicine, Volume: 23, Issue: 1
DOI: https://doi.org/10.1186/s12967-025-06862-z
PMID: https://pubmed.ncbi.nlm.nih.gov/40770757
Publication Date: 2025-08-06
Author(s): Qi Li et al.
Primary Topic: EEG and Brain-Computer Interfaces

Overview

The research presents the CMFViT model, a novel approach for automated seizure detection using scalp electroencephalography (EEG). The model integrates a Convolutional Neural Network (CNN) with a Vision Transformer (ViT) through a Multi-Stream Feature Fusion (MSFF) strategy, addressing the difficulty existing deep learning methods have in capturing both the local features and the global time-series dependencies of EEG signals. By transforming EEG data into time-frequency images via the Tunable Q-factor Wavelet Transform (TQWT), the model improves its discriminative ability and reports 98.85% accuracy on the CHB-MIT dataset along with strong generalization in cross-subject evaluations on the Kaggle dataset.

The findings underscore the complementary roles of the CNN and ViT modules, with the CNN adept at identifying subtle differences in seizure patterns and the ViT leveraging self-attention mechanisms to model long-range dependencies. The CMFViT model not only excels in sensitivity, specificity, and F1-score but also demonstrates robustness against variations in EEG signal duration and class imbalance. Its end-to-end architecture simplifies the detection process by eliminating the need for complex feature engineering, making it suitable for real-time clinical applications. Overall, the CMFViT model represents a significant advancement in seizure detection and has potential implications for other EEG-related tasks, such as sleep stage classification and cognitive state monitoring.

Introduction

The introduction of this research paper addresses the challenges associated with diagnosing and treating epilepsy, a neurological disorder characterized by recurrent seizures and cognitive impairments that significantly impact patients’ quality of life. The paper highlights the critical role of electroencephalography (EEG) in monitoring brain activity, while also noting how difficult EEG signals are to interpret because of their nonlinearity and variability across patients. Traditional manual analysis methods are deemed inefficient, prompting the exploration of machine learning and deep learning techniques for automated seizure detection.

The authors review various conventional machine learning approaches, such as feature fusion models and entropy-based feature extraction, which have achieved varying degrees of accuracy but often rely on complex feature engineering and are sensitive to patient-specific variations. In contrast, deep learning models, particularly Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), are presented as more effective alternatives due to their ability to integrate feature extraction with classification, thereby enhancing generalizability and reducing computational costs. The paper proposes a novel model, termed CMFViT, which combines the strengths of CNNs in local feature extraction with the global dependency modeling capabilities of ViTs. This model employs a multi-stream feature fusion strategy to integrate local and global features, demonstrating promising results in epilepsy detection across multiple datasets. The study’s contributions include the introduction of the CMFViT model and its validation through rigorous experiments, showcasing its potential for improved accuracy and adaptability in seizure detection tasks.

Methods

In this section, the authors detail the methodologies employed in their study, focusing on the dataset and the architecture of their epilepsy detection model, which integrates Convolutional Neural Networks (CNN), Vision Transformers (ViT), and Multi-Stream Feature Fusion (MSFF). The model takes preprocessed EEG data and distinguishes seizure from non-seizure activity, using data from 22 non-duplicated channels of the CHB-MIT dataset. The experimental results show an average accuracy of 98.85% across subjects, with individual performance varying due to factors such as EEG signal characteristics and patient demographics. Notably, some subjects achieved 100% accuracy, while the overall average sensitivity was 97.98%. The model’s performance was further validated through statistical tests, indicating high consistency across subjects.
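The accuracy and sensitivity figures quoted above are standard confusion-matrix metrics. The following is a minimal sketch of how such metrics are computed for a binary seizure/non-seizure task; the function names and the label vectors are hypothetical illustrations and are not taken from the paper or its data.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = seizure, 0 = non-seizure)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and F1 from confusion counts."""
    sensitivity = safe_div(tp, tp + fn)  # fraction of seizure segments detected
    specificity = safe_div(tn, tn + fp)  # fraction of non-seizure segments kept
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical labels for eight EEG segments (illustration only, not paper data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(*confusion_counts(y_true, y_pred))
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters here because seizure segments are typically far rarer than non-seizure segments, so accuracy alone can hide missed seizures.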

The authors also compare their CMFViT model against other mainstream deep learning methods, including CNN-LSTM, CNN-Transformer, and LSTM, across both the CHB-MIT and Kaggle datasets. The CMFViT model outperformed its counterparts in accuracy, sensitivity, and specificity, particularly in complex cross-subject scenarios. The integration of local feature extraction from CNNs and global contextual modeling from ViTs allows the CMFViT model to effectively capture the nuances of EEG signals, enhancing its classification precision and generalization ability. The findings underscore the model’s robustness and adaptability, confirming its efficacy in epilepsy detection across diverse datasets.

Discussion

In this section, the research discusses the methodologies employed for data sourcing, preprocessing, and the development of a hybrid model for EEG signal analysis, particularly focusing on seizure detection. The study utilizes the CHB-MIT scalp EEG dataset, which includes recordings from 23 refractory epilepsy patients, and the Kaggle dataset, which features recordings from both healthy individuals and epilepsy patients. The preprocessing steps involve segmentation, bandpass filtering, and the application of the Tunable Q-factor Wavelet Transform (TQWT) for enhanced time-frequency analysis, which suppresses noise while preserving signal characteristics. The TQWT parameters were chosen based on prior research, and an overlap sampling technique was implemented to balance the dataset between interictal and ictal periods, thereby improving seizure detection accuracy.
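The overlap sampling idea can be illustrated with a short sketch: short ictal periods are cut into heavily overlapping windows, while long interictal periods are cut without overlap, so that both classes yield a similar number of segments. The 256 Hz sampling rate, the 4-second window, the strides, and the period durations below are assumptions chosen for illustration; the summary does not state the values the authors used.

```python
def sliding_windows(n_samples, window, stride):
    """Return (start, end) sample indices of fixed-length windows over a signal."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # range() stops before the first start index whose window would overrun.
    return [(start, start + window)
            for start in range(0, n_samples - window + 1, stride)]


# All values below are hypothetical, not taken from the paper.
FS = 256                 # assumed sampling rate in Hz
WINDOW = 4 * FS          # assumed 4-second analysis window

ictal_samples = 60 * FS        # a 60-second seizure
interictal_samples = 228 * FS  # a 228-second seizure-free stretch

# Ictal data is scarce: a 1-second stride gives 75% overlap between windows.
ictal_windows = sliding_windows(ictal_samples, WINDOW, stride=1 * FS)
# Interictal data is plentiful: stride equals window length, so no overlap.
interictal_windows = sliding_windows(interictal_samples, WINDOW, stride=WINDOW)

print(len(ictal_windows), len(interictal_windows))
```

With these particular numbers both classes produce the same count of windows. In practice the stride for the ictal class would be tuned per recording to approach the desired class ratio.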

The model architecture integrates Convolutional Neural Networks (CNN) and Vision Transformers (ViT) to leverage both local and global feature extraction capabilities. The CNN component focuses on capturing fine-grained local patterns, while the ViT addresses long-range temporal dependencies essential for understanding EEG dynamics. The multi-stream feature fusion method concatenates the outputs from both modules, enhancing the model’s ability to represent complex EEG characteristics. Training uses a 10-fold cross-validation strategy, and model performance is assessed with accuracy, sensitivity, specificity, F1-score, and AUC. The results indicate that the hybrid CMFViT approach significantly outperforms standalone models, particularly in handling the complexities of diverse datasets, thus demonstrating its efficacy in seizure detection tasks.
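Fusion by concatenation, as described above, can be sketched in a few lines. This is a toy illustration under stated assumptions: the feature vectors are stand-in lists rather than real CNN or ViT outputs, and the single linear layer with a sigmoid is a hypothetical classification head, since the summary does not specify the feature dimensions or the head used in CMFViT.

```python
import math


def fuse_features(*streams):
    """Concatenate per-stream feature vectors into a single fused vector."""
    fused = []
    for stream in streams:
        fused.extend(stream)
    return fused


def seizure_probability(features, weights, bias):
    """Hypothetical head: linear layer followed by a sigmoid."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Stand-in feature vectors (illustration only, not real model outputs).
cnn_features = [0.2, 0.8]        # local patterns from the CNN stream
vit_features = [0.5, 0.1, 0.4]   # global context from the ViT stream

fused = fuse_features(cnn_features, vit_features)

# Hypothetical weights; in a trained model these would be learned.
weights = [1.0, -1.0, 0.5, 0.0, 2.0]
probability = seizure_probability(fused, weights, bias=0.0)
print(len(fused), probability)
```

Concatenation keeps each stream's features intact and lets the classifier learn how to weight local against global evidence, at the cost of a wider input to the final layer.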