An ensemble deep learning framework for emotion recognition through wearable devices multi-modal physiological signals

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-99858-0
PMID: https://pubmed.ncbi.nlm.nih.gov/40383809
Publication Date: 2025-05-18
Author(s): Durgesh Nandini et al.
Primary Topic: EEG and Brain-Computer Interfaces

Overview

The paper presents an emotion recognition system that uses miniaturized wearable fitness trackers to improve emotional awareness in human-computer interaction. It employs an ensemble deep learning architecture that integrates Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models to capture dynamic temporal dependencies in emotional data. Using the EMOGNITION database, which contains physiological signals from the Samsung Galaxy Watch, the Empatica E4 wristband, and the MUSE 2 EEG headband, the system identifies nine discrete emotions by analyzing various bio-signal combinations. The Samsung Galaxy and MUSE 2 devices achieved average classification accuracies of 99.14% and 99.41%, respectively. In the 2D Valence-Arousal model, the Samsung Galaxy device reached 97.81% accuracy for the Valence dimension and 72.94% for Arousal.

The study highlights the potential of the proposed emotion detection system to improve accuracy and robustness in wearable technology applications. However, it acknowledges that the database was recorded in a controlled laboratory setting, which may not fully represent real-world conditions such as variable user behavior and environmental noise. Suggested future work includes developing an emotion-specific database using audio-video stimuli, applying transfer learning or meta-learning to build personalized models, and integrating explainable AI techniques to improve model transparency and user trust. Incorporating contextual information such as physical activity and environmental conditions could further refine emotion recognition in practical applications.

Methods

The methodology outlines a structured framework for emotion recognition using wearable technology, with the aim of enhancing human-computer interaction (HCI). The framework consists of five sequential blocks, beginning with the collection of physiological data from three devices: the Samsung Galaxy Smartwatch, the Empatica E4 wristband, and the MUSE 2 EEG headband. These devices capture various physiological signals, such as Blood Volume Pulse (BVP), Heart Rate (HR), and Electrodermal Activity (EDA), which are used to assess emotional states.

In the subsequent blocks, the methodology emphasizes data preprocessing to ensure robustness and reliability, employing the SMOTE-Tomek method to address class imbalance among the emotion classes. The data is then split into a training set (60%) and combined testing/validation sets (40%) for model evaluation. The core of the framework is a deep learning ensemble combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, which captures temporal dynamics and identifies emotional patterns. The framework identifies nine basic emotions (amusement, awe, enthusiasm, liking, surprise, anger, disgust, fear, and sadness) and also employs a 2D Valence-Arousal (VA) dimensional model to provide more nuanced insight into emotional intensity. The research contributes to the field of affective computing by offering a clearer understanding of emotion-aware technology in real-world applications.
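The SMOTE-Tomek rebalancing step described above can be sketched in plain Python. This is a minimal illustration on made-up 2-D points, not the authors' code: the function names `smote` and `remove_tomek_links`, the toy data, and the neighbour count `k` are all assumptions introduced here. In practice a library implementation such as `imblearn.combine.SMOTETomek` would typically be used instead.

```python
import math
import random

def smote(X, y, minority, k=3, seed=0):
    """Oversample `minority` until it matches the largest class.

    Each synthetic point is a random interpolation between an original
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    X, y = list(X), list(y)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    target = max(y.count(label) for label in set(y))
    for _ in range(target - len(minority_idx)):
        i = rng.choice(minority_idx)
        neighbours = sorted((j for j in minority_idx if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(neighbours)
        t = rng.random()  # position along the segment from X[i] to X[j]
        X.append(tuple(a + t * (b - a) for a, b in zip(X[i], X[j])))
        y.append(minority)
    return X, y

def remove_tomek_links(X, y):
    """Drop both members of every Tomek link.

    A Tomek link is a pair of samples with different labels that are
    each other's nearest neighbour.
    """
    idx = range(len(X))
    nn = [min((j for j in idx if j != i),
              key=lambda j: math.dist(X[i], X[j])) for i in idx]
    drop = {i for i in idx if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: four "amusement" points, three "fear" points.
# (0.3, 0.2) and (0.32, 0.22) sit on the class border and form a Tomek link.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
     (1.0, 1.0), (1.2, 1.1), (0.32, 0.22)]
y = ["amusement"] * 4 + ["fear"] * 3

X_over, y_over = smote(X, y, minority="fear", k=2)
X_clean, y_clean = remove_tomek_links(X, y)
```

SMOTE-Tomek applies the two steps in sequence (oversample, then clean). They are run separately on the toy data here only so that the effect of each step is easy to inspect.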

Results

The research presents an ensemble stacked LSTM-GRU model for a multi-modal emotion recognition system, implemented using Python libraries such as Keras and TensorFlow. The model's performance is evaluated through a confusion matrix, with accuracy, precision, recall, and F1 score as the reported metrics. The dataset is partitioned into training, testing, and validation sets in a 60:20:20 ratio, using hold-out validation for computational efficiency. The study is described as the first to assess the performance of three wearable devices (MUSE 2, Empatica E4, and Samsung Galaxy SM-R810) using both discrete and dimensional emotion detection models.
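The 60:20:20 hold-out split and the confusion-matrix metrics named above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's pipeline: the helper names `stratified_holdout` and `metrics_from_confusion`, the per-class stratification, the three-class toy labels, and the toy confusion matrix are all hypothetical and are not results from the study.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, ratios=(0.6, 0.2, 0.2), seed=0):
    """Return (train, test, val) index lists, splitting each class by `ratios`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test, val = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(ratios[0] * len(idx))
        n_test = round(ratios[1] * len(idx))
        train += idx[:n_train]
        test += idx[n_train:n_train + n_test]
        val += idx[n_train + n_test:]  # remainder goes to validation
    return train, test, val

def metrics_from_confusion(cm):
    """cm[i][j] = count of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # column sum minus diagonal
        fn = sum(cm[i]) - tp                       # row sum minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, per_class

# Hypothetical data: 10 samples for each of three emotion classes.
toy_labels = ["amusement"] * 10 + ["fear"] * 10 + ["sadness"] * 10
train_idx, test_idx, val_idx = stratified_holdout(toy_labels)

# Made-up confusion matrix (rows: true class, columns: predicted class).
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 2, 8]]
accuracy, per_class = metrics_from_confusion(toy_cm)
```

Splitting within each class keeps the class proportions roughly equal across the three subsets, which matters when some emotions are rarer than others. Whether the authors stratified their split is not stated in this summary.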

The EMOGNITION database, which contains physiological recordings from 43 participants exposed to emotional stimuli, serves as the foundation for the emotion recognition system. The devices capture various physiological signals, such as accelerometer (ACC), BVP, and EEG data, which are processed through deep learning algorithms to streamline analysis and reduce computational complexity. The research employs two classification frameworks: one for discrete emotions and another for a two-dimensional Valence-Arousal model. After preprocessing of the signals, the ensemble model is trained iteratively, and the results are compared against existing methodologies to show the potential of this approach for emotion recognition with wearable devices.
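This summary does not say how the LSTM and GRU outputs are combined inside the ensemble. Purely as an illustration of one common option, the sketch below uses soft voting: the two models' class-probability vectors are averaged and the highest-scoring emotion is selected. The function `soft_vote`, the equal weighting, and the probability values are assumptions made for this example and should not be read as the authors' method.

```python
# The nine discrete emotions listed in the Methods section.
EMOTIONS = ["amusement", "awe", "enthusiasm", "liking", "surprise",
            "anger", "disgust", "fear", "sadness"]

def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two class-probability vectors, then argmax."""
    combined = [weight_a * a + (1 - weight_a) * b
                for a, b in zip(prob_a, prob_b)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined

# Made-up softmax outputs for a single signal window (each sums to 1).
lstm_probs = [0.05, 0.05, 0.10, 0.05, 0.30, 0.05, 0.05, 0.30, 0.05]
gru_probs = [0.05, 0.05, 0.05, 0.05, 0.20, 0.05, 0.05, 0.45, 0.05]

best, combined = soft_vote(lstm_probs, gru_probs)
predicted_emotion = EMOTIONS[best]
```

In this toy case the LSTM output is tied between "surprise" and "fear", and averaging with the GRU output resolves the tie. A stacked design, where a recurrent layer of one type feeds a layer of the other, would be a different architecture from the one sketched here.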

Discussion

The discussion highlights significant advancements in emotionally intelligent human-computer interaction (HCI) systems, driven by recent developments in artificial intelligence and wearable technology. Researchers have increasingly focused on integrating multiple physiological signals for emotion detection, employing various methodologies such as geometric algebra-enhanced networks, brain-inspired hierarchical models, and multimodal systems that assess emotions through voice, text, and facial expressions. Notably, studies have demonstrated the effectiveness of combining signals like photoplethysmogram (PPG), heart rate (HR), and galvanic skin response (GSR) to achieve high accuracy in emotion recognition, with some systems reaching over 90% accuracy using deep learning techniques. However, challenges remain regarding the wearability, scalability, and diversity of emotions captured by existing systems, indicating a need for more adaptable and practical solutions.

The literature review reveals a gap in the evaluation of specific wearable devices for emotion detection and emphasizes the importance of using common wearables to improve reproducibility and standardization in research. The study aims to address this gap by using the EMOGNITION database, which contains physiological signals collected from three different wearable devices: the Samsung Galaxy Watch, the Empatica E4 wristband, and the MUSE 2 EEG headband. It employs a hybrid ensemble deep learning architecture combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models to classify emotions under both discrete and dimensional models. The results indicate that the proposed system outperforms existing methods and show the potential for real-time emotion detection using physiological signals from widely used wearable sensors.