EEG-based emotion recognition using multi-scale dynamic CNN and gated transformer

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-82705-z
PMID: https://pubmed.ncbi.nlm.nih.gov/39733023
Publication Date: 2024-12-28
Author(s): Zhuoling Cheng et al.
Primary Topic: Emotion and Mood Recognition

Overview

The paper presents MSDCGTNet, an end-to-end method for recognizing emotion from EEG signals that combines a Multi-Scale Dynamic 1D CNN with a Gated Transformer Encoder. The Multi-Scale Dynamic CNN first extracts spatial and spectral features directly from raw EEG data, which avoids the information loss and computational cost of time-frequency conversion. The Gated Transformer Encoder then captures global dependencies within the EEG signals, using parallel processing and an enhanced multi-head self-attention mechanism to make better use of computing resources. Finally, a Temporal Convolutional Network extracts temporal features, and a classification module applies SoftMax to recognize the emotional state.
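To make the multi-scale idea concrete, the following is a minimal sketch in plain Python of applying 1D convolutions with several kernel sizes to one raw EEG channel and stacking the results as feature channels. The kernel sizes (3, 5, 7), the moving-average weights, and the function names are illustrative assumptions, not values from the paper, and the "dynamic" weighting of the actual network is not modeled here.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def multi_scale_features(signal, kernel_sizes=(3, 5, 7)):
    """Run one convolution per kernel size and stack the outputs.

    The kernel sizes and weights are placeholders (assumptions); a trained
    network would learn the weights from data.
    """
    features = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # placeholder weights: a simple moving average
        features.append(conv1d_same(signal, kernel))
    return features  # len(kernel_sizes) rows, each as long as the input


# Toy single-channel signal (hypothetical values).
raw_channel = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
feats = multi_scale_features(raw_channel)
print(len(feats), len(feats[0]))
```

Short kernels respond to fast changes in the signal and longer kernels to slower ones, which is the intuition behind using several scales side by side.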

The method was evaluated on three publicly available datasets, DEAP, SEED, and SEED_IV, and is reported to achieve high accuracy and efficiency in emotion recognition. The authors conclude that MSDCGTNet is robust and suitable for practical applications in Brain-Computer Interface (BCI) systems. They plan to apply the model to deeper EEG signal analysis, with potential uses in disease diagnosis and mental health assessment.

Methods

The proposed method for emotion recognition from raw EEG data, the Multi-Scale Dynamic Convolutional Gated Transformer Network (MSDCGTNet), integrates several components to improve feature extraction and classification. It consists of four parts: (1) a Multi-Scale Dynamic 1D Convolutional Neural Network (1D CNN) that captures local characteristics of the EEG signals; (2) a Gated Transformer Encoder that extracts global features; (3) a Temporal Convolutional Network (TCN) that identifies discriminative time-series features while incorporating historical context; and (4) a SoftMax layer that produces the final prediction by selecting the class with the highest probability.
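The final step, part (4), can be sketched as follows. This is a generic SoftMax followed by an arg-max over class probabilities, written in plain Python; the logit values and the function names are hypothetical and stand in for the output of the preceding layers.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the index of the most probable class and all probabilities."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical scores for three emotion classes.
logits = [0.2, 1.5, -0.3]
label, probs = predict(logits)
print(label, [round(p, 3) for p in probs])
```

Because SoftMax preserves the ordering of the scores, the predicted class is always the one with the largest logit; the probabilities matter mainly for the cross-entropy loss during training.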

The experiments were run on a Dell 5810 server with an Intel Xeon E5-2660 processor, 64 GB of RAM, and an NVIDIA 2080Ti graphics card, running CentOS 7.0. The model was implemented in PyTorch and trained with the Adam optimizer and a cross-entropy loss function. A batch size of 20 was used on all three datasets, while the learning rate, number of epochs, and weight decay were set per dataset: for DEAP, a learning rate of $1 \times 10^{-5}$, 300 epochs, and a weight decay of 0.01; for SEED and SEED_IV, a learning rate of $1 \times 10^{-4}$, 200 epochs, and a weight decay of $1 \times 10^{-4}$.
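The per-dataset settings above can be collected in a small configuration table. The sketch below uses only the values stated in the text; the `TrainConfig` class and the `CONFIGS` dictionary are hypothetical names introduced for illustration, not part of the authors' code. In PyTorch, the learning rate and weight decay would typically be passed to `torch.optim.Adam` through its `lr` and `weight_decay` arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training hyperparameters as reported in the text (names are illustrative)."""
    learning_rate: float
    epochs: int
    weight_decay: float
    batch_size: int = 20          # same for all three datasets
    optimizer: str = "adam"
    loss: str = "cross_entropy"


CONFIGS = {
    "DEAP": TrainConfig(learning_rate=1e-5, epochs=300, weight_decay=0.01),
    "SEED": TrainConfig(learning_rate=1e-4, epochs=200, weight_decay=1e-4),
    "SEED_IV": TrainConfig(learning_rate=1e-4, epochs=200, weight_decay=1e-4),
}

for name, cfg in CONFIGS.items():
    print(name, cfg.learning_rate, cfg.epochs, cfg.weight_decay, cfg.batch_size)
```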

Discussion

In the discussion, the authors analyze the proposed Multi-Scale Dynamic Convolutional Gated Transformer Network (MSDCGTNet) for EEG-based emotion recognition. They treat EEG signals as two-dimensional data with a spatial dimension (electrode channels) and a temporal dimension (sampling points), and segment each recording into time windows so that changes in emotional information over time can be captured. MSDCGTNet classifies these EEG segments by mapping each input matrix $X_i$ (of size $C \times T$, where $C$ is the number of channels and $T$ the number of sampling points) to its corresponding emotion category $y_i$.
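The windowing step can be illustrated with a short sketch that splits a $C \times T$ recording into fixed-length segments. The window length, the optional overlap, and the choice to drop an incomplete trailing window are assumptions made for this example; the text does not state the values used in the paper.

```python
def segment(recording, window, step=None):
    """Split a C x T recording into a list of C x window segments.

    `recording` is a list of C channels, each a list of T samples.
    `step` defaults to `window` (non-overlapping windows); a smaller step
    gives overlapping windows. Trailing samples that do not fill a whole
    window are dropped (an assumption of this sketch).
    """
    step = step or window
    n_samples = len(recording[0])
    segments = []
    for start in range(0, n_samples - window + 1, step):
        segments.append([channel[start:start + window] for channel in recording])
    return segments


# Toy recording with C = 2 channels and T = 10 samples (hypothetical values).
recording = [list(range(10)), list(range(100, 110))]
segments = segment(recording, window=4)
print(len(segments), len(segments[0]), len(segments[0][0]))
```

Each resulting segment plays the role of one input matrix $X_i$; in practice every segment cut from a trial is usually assigned that trial's emotion label.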

The authors criticize existing methods that rely on time-frequency transformations, citing information loss, high computational cost, and inadequate feature extraction from non-stationary EEG signals. To overcome these limitations, they introduce a Multi-Scale Dynamic 1D CNN that extracts features directly from raw EEG data using convolutional kernels of varying sizes, thereby capturing dynamic frequency characteristics. The Gated Transformer Encoder then strengthens the model's ability to learn global dependencies within the EEG data, using a multi-head self-attention mechanism together with a Gated Linear Unit (GLU) to improve feature selection and computational efficiency. The paper concludes that the approach achieves higher classification accuracy than existing methods while remaining computationally efficient, which makes it suitable for real-time emotion recognition.
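The gating operation of a GLU can be sketched in a few lines. The version below follows the common formulation in which the input vector is split into two halves, one carrying values and the other passed through a sigmoid to act as a gate. Where exactly the GLU sits inside the authors' encoder is not described here, so this shows only the gating itself, with hypothetical input values.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glu(vector):
    """Gated Linear Unit: first half of the input times sigmoid of the second half."""
    if len(vector) % 2 != 0:
        raise ValueError("GLU input length must be even")
    half = len(vector) // 2
    values, gates = vector[:half], vector[half:]
    return [v * sigmoid(g) for v, g in zip(values, gates)]


# Hypothetical feature vector: values [2.0, -1.0], gate inputs [0.0, 10.0].
out = glu([2.0, -1.0, 0.0, 10.0])
print(out)
```

A gate input near zero passes half of the value, a large positive gate input passes almost all of it, and a large negative one suppresses it, which is how the unit selects features.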