Action unit based micro-expression recognition framework for driver emotional state detection

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-12245-7
PMID: https://pubmed.ncbi.nlm.nih.gov/40738929
Publication Date: 2025-07-30
Author(s): Parul Malik et al.
Primary Topic: Emotion and Mood Recognition

Overview

The paper argues that understanding drivers’ emotional states is important for road safety, noting that emotions such as anger, fear, disgust, sadness, and happiness significantly affect driving behavior. It introduces a micro-expression recognition framework that uses facial Action Units (AUs) from the Facial Action Coding System (FACS) to detect these emotional states. The framework uses a residual network (ResNet18) for spatial feature extraction and a bidirectional long short-term memory (Bi-LSTM) network for temporal patterns. Trained on two benchmark datasets, SAMM and KMU-FED, the model achieved recognition accuracies of 96.38% and 95.96%, respectively, and reached 91% accuracy in detecting drivers’ emotional states in a case analysis.
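To make the link between AUs and emotions concrete, here is a minimal rule-based sketch. The AU combinations are one commonly cited set of FACS-style prototypes, not the mapping used in the paper, and sources differ on the details. The names `PROTOTYPES` and `match_emotion`, and the choice of Jaccard overlap as the score, are assumptions made for this illustration only. The paper's framework learns the classification rather than using fixed rules.

```python
# Illustrative AU prototypes (assumed, not taken from the paper).
PROTOTYPES = {
    "happiness": {6, 12},
    "sadness": {1, 4, 15},
    "anger": {4, 5, 7, 23},
    "fear": {1, 2, 4, 5, 7, 20, 26},
    "disgust": {9, 15, 17},
}


def match_emotion(active_aus):
    """Return the prototype emotion with the highest Jaccard overlap, or None."""
    active = set(active_aus)
    best_label, best_score = None, 0.0
    for label, prototype in PROTOTYPES.items():
        union = active | prototype
        score = len(active & prototype) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(match_emotion({6, 12}))     # matches the happiness prototype exactly
print(match_emotion({4, 5, 7}))   # partial overlap, closest to anger
```

A fixed lookup like this ignores intensity and timing, which is part of why the paper pairs AU information with learned spatial and temporal features.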

The study highlights how hard it is to recognize subtle facial micro-expressions accurately, given their low intensity and brief duration. The proposed approach is reported to capture fine facial characteristics and temporal changes in expressions, and to adapt to a range of emotional states. The authors acknowledge limitations, including reliance on self-annotations, individual variability in emotional expression, and potential labeling bias. As future work, they aim to release an open-source annotated dataset of facial videos from real-world driving scenarios, with segment-level emotion labels and AU annotations. The goal is to improve emotion recognition in naturalistic driving conditions and to support multimodal affective analysis that also draws on physiological signals.

Methods

The paper reviews prior approaches to facial expression recognition, grouped into machine learning, deep learning, and action-unit (AU) based methods. The machine learning techniques include the contourlet wavelet transform for feature extraction, a hierarchical weighted random forest for real-time applications, and the fusion of local binary patterns with facial landmarks, classified by Support Vector Machines (SVM), for emotion detection. The deep learning approaches rely on convolutional neural networks (CNN) and recurrent neural networks (RNN), with notable examples including deep belief networks and hybrid models that use transfer learning. The paper also covers work that explores CNN architectures such as ResNet50 and Xception for emotion recognition in driver behavior monitoring.
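As an illustration of one of the hand-crafted features mentioned above, the following is a minimal sketch of the basic 3x3 local binary pattern (LBP) descriptor in plain Python. It covers only the LBP step. The fusion with facial landmarks and the SVM classifier are not shown, and the function names `lbp_code` and `lbp_histogram` are hypothetical.

```python
def lbp_code(patch):
    """8-bit local binary pattern of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:  # neighbour at least as bright as the centre
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of a grey image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


flat = [[7] * 4 for _ in range(4)]   # toy 4x4 image with constant intensity
print(lbp_histogram(flat)[255])      # all four interior pixels share one code
```

In a typical pipeline, histograms like this are computed per facial region and concatenated into a feature vector before classification.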

For action-unit based methods, the paper discusses the role of AUs in analyzing facial expressions and stresses the need for guidelines that connect AUs to specific emotions. Prior studies have contributed datasets and techniques such as longest common subsequence metrics for AU sequence matching, multilabel classification, and hybrid architectures that integrate AUs with facial expression recognition. Despite this progress, recognizing subtle expressions in complex environments remains difficult. To address this, the authors propose an emotion recognition system that combines ResNet18 for visual feature extraction with a bidirectional LSTM for temporal analysis, using AU information for a more nuanced reading of human emotions.
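The longest common subsequence idea mentioned above can be sketched as follows. This is a generic dynamic-programming LCS applied to sequences of AU labels. The normalisation by the longer sequence length and the names `lcs_length` and `lcs_similarity` are assumptions for illustration, and the cited studies may define their metric differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length divided by the longer sequence length (assumed normalisation)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


observed = ["AU4", "AU5", "AU7", "AU23"]
template = ["AU4", "AU7", "AU23"]
print(lcs_similarity(observed, template))  # 3 shared labels in order, out of 4
```

Because a subsequence need not be contiguous, the score tolerates an extra or missing AU in the observed sequence while still respecting the order in which AUs appear.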

The methodology section describes the use of publicly available datasets and the ethical considerations, noting that no personally identifiable information or facial imagery was collected from participants. Participants gave informed verbal consent, the study adhered to data privacy principles, and the experimental protocol was approved by the institutional human ethics committee.

Results

The Results section evaluates the proposed approach against several state-of-the-art methods, focusing on how well it categorizes Action Units (AUs). The proposed method is reported to outperform existing techniques on the key metrics, with recognition accuracies of 96.38% on SAMM and 95.96% on KMU-FED. Graphical representations illustrate the categorization trends and patterns observed in the data and give insight into the dynamics of AU classification. These figures help show where the proposed approach is strong and how it compares with other methods.
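For readers unfamiliar with how such figures are usually derived, here is a small sketch that computes overall accuracy and per-class recall from a confusion matrix. The labels and predictions are toy values invented for this example, not data from the paper, and the helper names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """Correct predictions for each class divided by that class's sample count."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy data only: six made-up samples over three emotion classes.
labels = ["anger", "fear", "happiness"]
y_true = ["anger", "anger", "fear", "fear", "happiness", "happiness"]
y_pred = ["anger", "fear", "fear", "fear", "happiness", "anger"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm, accuracy(cm), per_class_recall(cm))
```

Per-class recall is worth reporting alongside accuracy because micro-expression datasets are often imbalanced across emotion classes.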

Discussion

The discussion section stresses that emotional states strongly affect driving performance: strong emotions such as anger, fear, and sadness can raise cognitive load and impair decision-making while driving. The authors note that traditional methods for assessing driver emotional states are often invasive and time-consuming, whereas micro-expression recognition based on facial action units (AUs) offers a non-intrusive, real-time alternative. By applying machine learning to involuntary facial movements, the approach aims to improve the accuracy of emotion recognition and, in turn, road safety.

The paper reviews existing methods for emotion detection and identifies gaps in prior studies, such as limited dataset diversity and a lack of real-world validation. The authors use two datasets, Spontaneous Actions and Micro-Movements (SAMM) and Keimyung University Facial Expression of Drivers (KMU-FED), which cover a range of emotional expressions relevant to driving. SAMM is noted in particular for its ethnic diversity and age distribution, which matter for accurate micro-expression recognition. The proposed framework is a hybrid architecture that combines a ResNet-based CNN for feature extraction with a bidirectional LSTM for temporal processing, with the aim of classifying emotions reliably across varying driving conditions and emotional states.