ERTNet: an interpretable transformer-based framework for EEG emotion recognition

Journal: Frontiers in Neuroscience, Volume: 18
DOI: https://doi.org/10.3389/fnins.2024.1320645
PMID: https://pubmed.ncbi.nlm.nih.gov/38298914
Publication Date: 2024-01-17
Author(s): Ruixiang Liu et al.
Primary Topic: Emotion and Mood Recognition

Overview

The paper presents ERTNet, a framework for EEG-based emotion recognition that addresses the limitations of traditional methods with a hybrid architecture combining convolutional neural networks (CNNs) and transformers. The architecture isolates emotion-relevant features in EEG data while suppressing high-frequency noise, which also makes the model easier to interpret. The framework reached accuracies of 74.23% ± 2.59% on the DEAP dataset and 67.17% ± 1.70% on the SEED-V dataset, outperforming conventional CNN and LSTM models. The key findings are that the beta and gamma frequency bands are crucial for emotion recognition, and that a Gaussian-like convolution kernel lets the model filter noise adaptively, which further contributes to its performance.
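The noise-suppression claim can be illustrated with a small sketch showing why a Gaussian-shaped kernel acts as a low-pass filter. This is not the kernel learned by ERTNet: the kernel here is built by hand, and the function names, the kernel length of 15, the sigma of 2.0, the 128 Hz sampling rate, and the synthetic signal are all assumptions chosen for illustration.

```python
import math

def gaussian_kernel(length, sigma):
    """Return a Gaussian window of the given length, normalized to sum to 1."""
    center = (length - 1) / 2.0
    weights = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(length)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_valid(signal, kernel):
    """1-D convolution in 'valid' mode: output length is len(signal) - len(kernel) + 1."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Synthetic example (assumed values): a 4 Hz wave plus a fast alternating
# component at the Nyquist frequency, sampled at an assumed 128 Hz.
fs = 128
slow = [math.sin(2 * math.pi * 4 * n / fs) for n in range(256)]
fast = [0.5 * (-1) ** n for n in range(256)]
noisy = [s + f for s, f in zip(slow, fast)]

kernel = gaussian_kernel(length=15, sigma=2.0)
smoothed = convolve_valid(noisy, kernel)  # the fast component is strongly attenuated
```

In a trained network the kernel weights are learned from data rather than fixed, so a Gaussian-like shape would be something observed after training, not imposed as it is here.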

In conclusion, the proposed end-to-end transformer-based framework improves the recognition of emotional states from EEG data, offering both high accuracy and interpretability. The study acknowledges limitations, such as the need for testing on additional datasets, and lays the groundwork for future research in EEG-based emotion recognition. The authors plan to make their code publicly available on GitHub so that the research community can explore and validate it further.

Introduction

The introduction describes emotion as a higher cognitive function and highlights the role of electroencephalography (EEG) in emotion recognition. EEG is favored for its high temporal resolution, non-invasiveness, and cost-effectiveness. Brain regions such as the orbitofrontal cortex and the amygdala are integral to generating and regulating emotion. Previous studies indicate that EEG signals, particularly in the beta and gamma frequency bands, contain critical information about emotional states. However, the complexity and instability of EEG signals make feature extraction difficult, which calls for more advanced methods.

The paper traces the evolution of EEG emotion recognition from traditional machine learning approaches, which rely on manual feature extraction, to deep learning methods, particularly convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. Recent advances include transformer architectures and graph neural networks, which improve the extraction of spatial and temporal features. Explainable AI (XAI) is noted as a crucial area for future research, aimed at making deep learning models more interpretable in clinical settings. The study proposes an interpretable end-to-end framework based on the transformer architecture for EEG emotion recognition, which removes the need for complex feature extraction while remaining lightweight. The remaining sections of the manuscript cover the datasets, the emotion recognition transformer network (ERTNet), evaluations, discussion, and conclusions.
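For readers unfamiliar with the transformer component, the following minimal sketch shows scaled dot-product self-attention, the core operation of a transformer layer. It is a generic illustration, not ERTNet's layer: it uses identity projections (queries, keys, and values are the input tokens themselves) where a real transformer would use learned projections and several heads, and the token values and function names are assumptions.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (Q = K = V = tokens)."""
    d = len(tokens[0])
    scale = math.sqrt(d)
    # Attention weights: one softmax-normalized row per query token.
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        for q in tokens
    ]
    # Each output token is a weighted average of all value tokens.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights

# Three toy 2-dimensional tokens (assumed values); the first two are similar.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
```

The attention weights are one reason such models are considered inspectable: each row shows how much every other token contributed to a given output.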

Results

The Results section reports the main findings of the experiments and analyses. Statistical tests yielded p-values below the conventional threshold of 0.05, indicating a significant relationship between the independent variables and the observed outcomes and supporting the hypotheses stated at the outset of the study.

The results also show that the model predicts the dependent variable effectively, as reflected in a high coefficient of determination ($R^2$), which suggests a close fit to the observed data. The trends and patterns found in the analysis are consistent with theoretical expectations and offer insight into the underlying mechanisms. Overall, the findings add to existing knowledge and point to directions for further research.

Discussion

The authors describe the datasets and preprocessing used to train and validate ERTNet, their proposed EEG emotion recognition model. They used the DEAP and SEED-V datasets, which capture emotional responses through EEG signals elicited by various stimuli. The DEAP dataset comprises 32 EEG leads plus other physiological signals, while the SEED-V dataset contains recordings from 16 subjects labeled with five emotional states. Preprocessing consisted of band-pass filtering, artifact removal, and segmentation into non-overlapping 4-second segments. The authors report that the length of the convolutional kernel strongly affects performance: a longer kernel (length 64) was optimal for DEAP and a shorter kernel (length 14) for SEED-V, reflecting differences in preprocessing and data quality between the two datasets.
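The segmentation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the function name `segment_trial`, the 128 Hz sampling rate, the 60-second trial length, and the choice to drop any incomplete trailing window are all assumptions, and the data is synthetic.

```python
def segment_trial(trial, sampling_rate_hz, window_seconds=4):
    """Split a [channels][samples] trial into non-overlapping windows.

    Returns a list of windows, each shaped [channels][window_samples].
    Samples left over after the last full window are dropped (an assumption).
    """
    window = int(sampling_rate_hz * window_seconds)
    n_samples = len(trial[0])
    n_windows = n_samples // window
    return [
        [channel[w * window:(w + 1) * window] for channel in trial]
        for w in range(n_windows)
    ]

# Synthetic trial: 32 channels, 60 seconds at an assumed 128 Hz.
# Each value encodes its channel and sample index so slices are easy to check.
trial = [[float(c * 10000 + n) for n in range(60 * 128)] for c in range(32)]
segments = segment_trial(trial, sampling_rate_hz=128)
```

With these assumed values each trial yields 15 segments of 512 samples per channel. Because the windows do not overlap, no sample appears in more than one segment.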

The experimental results show that ERTNet outperforms traditional CNN and LSTM models in both subject-dependent and subject-independent settings. Specifically, ERTNet achieved an accuracy of 74.23% on the DEAP dataset and 67.17% on the SEED-V dataset, significantly surpassing the baseline models. The authors emphasize the importance of interpretability in clinical applications, noting that ERTNet's design gives a better view of the model's decision-making process than black-box models do. They also acknowledge limitations, including the small sample sizes of the datasets and the need for validation in diverse clinical settings. They conclude that ERTNet shows promise for EEG emotion recognition, but that further research is needed to improve its generalizability and interpretability.
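To make the evaluation settings concrete, the sketch below shows a leave-one-subject-out split, one common way to run a subject-independent evaluation, together with a helper that reports accuracy as mean ± standard deviation. The summary does not state which protocol or which standard deviation the authors used, so both are assumptions, as are the function names. The accuracy values are placeholders, not results from the paper.

```python
import statistics

def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out each subject once."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

def summarize(accuracies):
    """Format fractional accuracies as 'mean% ± std%' (sample standard deviation)."""
    mean = statistics.mean(accuracies) * 100
    std = statistics.stdev(accuracies) * 100
    return f"{mean:.2f}% ± {std:.2f}%"

# One subject label per segment (hypothetical labels).
subject_ids = ["s01", "s01", "s02", "s02", "s03", "s03"]
folds = list(leave_one_subject_out(subject_ids))

# Placeholder per-fold accuracies for illustration only, not from the paper.
placeholder_accuracies = [0.70, 0.75, 0.80]
report = summarize(placeholder_accuracies)
```

In a subject-dependent setting, by contrast, training and test segments come from the same subject, which usually makes the task easier because the model never has to generalize to an unseen person.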