An Effective Deep Neural Network Architecture for EEG-Based Recognition of Emotions

Journal: IEEE Access, Volume: 13
DOI: https://doi.org/10.1109/access.2025.3525996
Publication Date: 2025-01-01
Author(s): Khadidja Henni et al.
Primary Topic: EEG and Brain-Computer Interfaces

Overview

This study investigates emotion identification using machine learning techniques applied to electroencephalography (EEG) data. It addresses the high dimensionality and variability of EEG signals, which complicate accurate emotion classification. Traditional methods often fail to extract relevant features effectively or to capture the complex temporal dependencies inherent in EEG data. To overcome these limitations, the authors propose a novel end-to-end deep learning framework that integrates a dense autoencoder for dimensionality reduction with a combination of two-dimensional convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to capture both spatial and temporal features.

The experimental results, obtained on the DEAP dataset, demonstrate the model's efficacy in classifying four emotional dimensions (arousal, valence, dominance, and liking), with accuracies of 90.04%, 89.97%, 87.73%, and 90.84%, respectively. The findings support the effectiveness of the autoencoder in enhancing feature extraction and the ability of the CNN-LSTM architecture to analyze EEG data comprehensively. The authors suggest that future work will focus on applying this classification model to develop a practical emotion recognition framework aimed at identifying various psychological disorders.

Introduction

The introduction highlights the significance of emotion recognition in human-computer interaction (HCI), emphasizing its role in enhancing user experience through affective computing. The study focuses on using electroencephalography (EEG) data for emotion recognition, addressing the challenges posed by the high dimensionality and variability of EEG signals. Visual examination of EEG data is impractical, which motivates the use of deep neural networks (DNNs) for pattern recognition. The paper discusses the advantages of convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) in capturing spatial and temporal features, respectively, while also noting the limitations of CNNs when applied to time series data.

To overcome these challenges, the authors propose a novel framework that integrates an autoencoder for dimensionality reduction, 2D convolutional layers for spatial feature extraction, and an optimized LSTM structure for temporal dependency modeling. Combining these components aims to improve the accuracy of emotion classification from EEG data. The proposed method is evaluated on the DEAP dataset, achieving classification accuracies of 90.04% for “arousal,” 89.97% for “valence,” 87.73% for “dominance,” and 90.84% for “liking.” These results underscore the framework's potential to advance EEG-based emotion recognition and improve practical applications in the field. The remaining sections of the paper cover related studies, methodology, experimental results, and conclusions.

Methods

In this section, the authors describe the methods employed in their study, focusing on the experimental setup for training a deep neural network model. The categorical cross-entropy loss function is utilized, defined as \( L(y, \hat{y}) = -\sum_{c=1}^{M} y_c \log(\hat{y}_c) \), where \( M = 2 \) represents the two classes (low or high). The model is trained using the stochastic gradient descent (SGD) optimizer with a learning rate of 0.01 and a decay factor of 0.82. To validate the model, a 10-fold cross-validation approach is implemented, allowing for robust performance evaluation by training on nine folds and testing on the remaining fold. Hyper-parameters are optimized using a grid search method, with SGD demonstrating the highest accuracy among various optimization techniques.
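As an illustration of the loss and validation scheme described above, the following minimal sketch computes the categorical cross-entropy for a single one-hot label with M = 2 and builds a 10-fold split over sample indices. The function names (`categorical_cross_entropy`, `k_fold_indices`), the clipping constant `eps`, the shuffling seed, and the sample count of 40 are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """L(y, y_hat) = -sum_c y_c * log(y_hat_c) for one sample.

    y_true is a one-hot list, y_pred a list of predicted probabilities.
    eps is an assumption (not from the paper) that guards against log(0).
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs: train on k-1 folds, test on one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Two classes (index 0 = low, index 1 = high); the true label is "high".
loss = categorical_cross_entropy([0, 1], [0.2, 0.8])  # equals -log(0.8)

# Hypothetical sample count, chosen only to make the fold sizes easy to read.
splits = list(k_fold_indices(n_samples=40, k=10))
```

Each sample index appears in exactly one test fold, so every sample is used for testing once and for training nine times.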

The model's performance in classifying the four emotional dimensions (arousal, valence, dominance, and liking) is assessed using three key metrics: classification accuracy (ACC), precision, and recall. Classification accuracy is calculated as the ratio of correct predictions (true positives and true negatives) to the total number of predictions. Precision measures the proportion of true positive predictions relative to all positive predictions, while recall measures the proportion of true positives among all actual positive cases. Together, these metrics give a fuller picture of the model's effectiveness in emotion classification than accuracy alone.
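The three metrics defined above can be computed directly from the confusion counts. The sketch below is a minimal illustration on made-up labels; the function name `classification_metrics`, the choice of 1 as the positive ("high") class, and the convention of returning 0.0 when a denominator is zero are assumptions for this example rather than details from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Synthetic labels (1 = high, 0 = low), not results from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
# Here TP = 3, TN = 4, FP = 2, FN = 1.
```

With these counts, accuracy is 7/10, precision is 3/5, and recall is 3/4, which shows how the three metrics can diverge on the same predictions.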

Discussion

The proposed emotion recognition model uses a hybrid architecture that combines an autoencoder for feature extraction with a CNN-LSTM framework for classifying EEG signals. The model addresses the complexity of emotion recognition by capturing both the spatial and the temporal features inherent in EEG data. The autoencoder reduces the dimensionality of the input data from 8064 features to 512, which makes the subsequent classification layers more efficient. The CNN layers extract spatial patterns, while the LSTM layers model the temporal dependencies of the EEG signals, which are crucial for understanding how emotional states evolve.
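To make the dimensionality reduction concrete, the sketch below traces the shapes through a dense autoencoder with an 8064-feature input and a 512-feature bottleneck, using NumPy. Only those two sizes come from the summary; the single dense layer on each side, the ReLU activation, the weight initialisation, and the synthetic input batch are assumptions, and the weights are untrained, so this shows the data flow rather than the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_DIM = 8064  # input feature count reported in the summary
CODE_DIM = 512    # bottleneck size reported in the summary

# Randomly initialised, untrained weights: for shape illustration only.
W_enc = rng.normal(0.0, 0.01, size=(INPUT_DIM, CODE_DIM))
b_enc = np.zeros(CODE_DIM)
W_dec = rng.normal(0.0, 0.01, size=(CODE_DIM, INPUT_DIM))
b_dec = np.zeros(INPUT_DIM)


def encode(x):
    """Map (batch, 8064) inputs to (batch, 512) codes with a ReLU layer."""
    return np.maximum(x @ W_enc + b_enc, 0.0)


def decode(z):
    """Map (batch, 512) codes back to (batch, 8064) reconstructions."""
    return z @ W_dec + b_dec


x = rng.normal(size=(4, INPUT_DIM))  # 4 synthetic samples, not real EEG
z = encode(x)                        # compressed representation
x_hat = decode(z)                    # reconstruction of the input

# Training an autoencoder would minimise a reconstruction error such as this.
reconstruction_mse = float(np.mean((x - x_hat) ** 2))
```

In the architecture described above, the 512-dimensional code `z` (not the reconstruction) is what would be passed on to the CNN and LSTM layers.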

Experimental results show that the model achieves high classification accuracy across the four emotional dimensions: arousal (90.04%), valence (89.97%), dominance (87.73%), and liking (90.84%). The ablation studies highlight the significance of each component; notably, removing the LSTM layer caused a marked decline in performance, underscoring its role in capturing the sequential nature of EEG data. Comparisons with existing literature indicate that the proposed model outperforms several state-of-the-art methods, particularly for arousal and valence, supporting its efficacy in EEG-based emotion recognition.