Toward real-time emotion recognition in fog computing-based systems: leveraging interpretable PCA_CNN, YOLO with self-attention mechanism

Journal: Frontiers in Computer Science, Volume: 7
DOI: https://doi.org/10.3389/fcomp.2025.1714394
Publication Date: 2026-01-14
Author(s): Nora EL Rashidy et al.
Primary Topic: Emotion and Mood Recognition

Overview

The paper presents a real-time emotion recognition system built on fog computing that tracks human emotional states through facial expression analysis. The study addresses the challenge of low-quality images captured at a distance, which typically degrade emotion classification performance. The proposed system combines Principal Component Analysis (PCA) for feature reduction with a YOLOv8 model enhanced by spatial attention mechanisms to improve real-time recognition.

Performance evaluations indicate that the PCA_CNN model, which combines PCA for dimensionality reduction with a Convolutional Neural Network (CNN) for classification, achieves an accuracy of 0.936, precision of 0.971, recall of 0.843, and Area Under the Curve (AUC) of 0.943. By comparison, the YOLOv8 model with attention reaches an accuracy of 0.986 and an AUC of 0.952 while also running faster, completing processing in about 610 seconds. The models are further validated on additional datasets, which points to their potential for applications in affective computing.

Introduction

The introduction emphasizes the significance of emotion recognition (ER) for understanding human mental states and its applications in mental health monitoring and human-robot interaction. The study aims to advance ER by mapping facial expressions to emotions through a two-step process: feature extraction followed by emotion detection. Key techniques include data preprocessing (normalization, resizing, denoising) and Principal Component Analysis (PCA) for feature extraction, which reduces data dimensionality while retaining high-variance features. A deep convolutional neural network (CNN) is then used for image classification, drawing on the strengths of several CNN architectures to address challenges in ER such as the need for high-resolution images and the complexity of facial emotion expression.
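The PCA step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 95% variance threshold, the 48x48 image size, and the synthetic data standing in for preprocessed face crops are all assumptions made for illustration.

```python
import numpy as np

def pca_fit(X, var_keep=0.95):
    """Fit PCA on the rows of X and keep the fewest components
    whose cumulative explained variance reaches var_keep."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_keep) + 1)
    k = min(k, len(s))
    return mean, Vt[:k], explained[:k]

def pca_transform(X, mean, components):
    """Project rows of X onto the retained principal directions."""
    return (X - mean) @ components.T

# Synthetic stand-in for 200 flattened 48x48 face crops (assumed size).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 48 * 48))
images = latent @ mixing + 0.01 * rng.normal(size=(200, 48 * 48))

# Normalization step: rescale pixel values to [0, 1].
images = (images - images.min()) / (images.max() - images.min())

mean, comps, ratio = pca_fit(images, var_keep=0.95)
features = pca_transform(images, mean, comps)
print(features.shape, float(ratio.sum()))
```

The reduced `features` array is what a downstream classifier would consume in place of the raw pixels; how the paper reshapes or feeds these features into its CNN is not stated in this summary.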

The proposed Real-Time Emotion Recognition System based on Fog Computing Technique (RERS-FoG) aims to enhance patient outcomes by providing timely notifications and real-time monitoring. The study introduces two models: PCA_CNN, which utilizes PCA for dimensionality reduction followed by a DCNN for classification, and YOLOv8 integrated with a self-attention mechanism. The findings indicate that the RERS-FoG system outperforms traditional methods, achieving high accuracy, precision, and recall across diverse datasets, including those specific to Autism Spectrum Disorder (ASD). The paper outlines its structure, with subsequent sections detailing research directions, methodology, system design, experimental results, and conclusions.

Methods

In this section, the authors evaluate several pre-trained architectures on the original and dimension-reduced datasets, comparing accuracy, precision, recall, and computational time. The results indicate that VGG19 outperforms VGG16, achieving higher accuracy (0.884) and precision (0.913), which the authors attribute to its greater depth and complexity. ResNet169 also improves on ResNet121, with higher precision (0.859) and recall (0.853), although it incurs the highest computational time because of its large parameter count. By contrast, MobileNet and MobileNet V2 reach competitive accuracy with far fewer parameters; MobileNet V2 achieves the highest accuracy (0.889) among all models, which highlights its efficiency in resource-constrained environments.
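For readers who want to see how the metrics compared above are typically derived, the following sketch computes accuracy and macro-averaged precision and recall from a multi-class confusion matrix. The matrix values are invented for illustration, and macro averaging is an assumption; the summary does not say which averaging scheme the paper uses.

```python
import numpy as np

def macro_metrics(cm):
    """Accuracy plus macro-averaged precision and recall.
    cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: samples predicted as each class
    actual = cm.sum(axis=1)     # row sums: samples truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
    }

# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 2, 8]]
metrics = macro_metrics(cm)
print(metrics)
```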

When assessing the impact of dimension reduction, the results reveal that all models benefit from enhanced predictive capabilities, with VGG19 and ResNet169 maintaining strong performance metrics. Notably, MobileNet achieves satisfactory accuracy (0.851) with significantly reduced computational time, making it suitable for real-time applications. The analysis underscores that dimensionality reduction effectively concentrates informative features, leading to improved model performance while also enhancing computational efficiency, particularly for MobileNet architectures. Overall, the findings suggest that MobileNet V2 is the most advantageous model due to its balance of accuracy and computational demands, making it ideal for latency-sensitive tasks.

Results

The results section evaluates the proposed CNN model and YOLOv8 with a self-attention mechanism. Applied to the original data, the CNN model achieved an accuracy of 0.916, with precision, recall, F-measure, and AUC values of 0.923, 0.910, 0.913, and 0.911, respectively. Applying dimensionality reduction improved these metrics, yielding an accuracy of 0.937 and precision of 0.971, and cut computational time from 1,132.32 seconds to 711.45 seconds. These findings indicate that dimensionality reduction retains the significant features, improving classification performance and computational efficiency together.
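The figures in this paragraph can be sanity-checked with a few lines of arithmetic. The sketch below computes the relative reduction in computational time and the harmonic mean of the reported precision and recall. A reported F-measure need not match that harmonic mean exactly if it was averaged per class, so a small gap is not necessarily an error.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Values quoted in the summary for the CNN model.
time_reduction = relative_reduction(1132.32, 711.45)
f_original = f_measure(0.923, 0.910)

print(f"time reduction: {time_reduction:.1%}")
print(f"F1 from reported precision/recall: {f_original:.4f}")
```

Under these numbers the time saving is roughly 37%, and the harmonic mean of 0.923 and 0.910 is close to the reported F-measure of 0.913.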

In the evaluation of YOLOv8, the model achieved an accuracy of 0.926 on the original data, with precision, recall, F-measure, and AUC values of 0.893, 0.910, 0.874, and 0.855, respectively. Incorporating self-attention is reported to improve performance further and to reduce processing time to 610.45 seconds; note, however, that the accuracy stated here for this configuration, 0.89669, is lower than the 0.926 baseline and differs from the 0.986 quoted in the overview. The findings emphasize the advantages of both attention mechanisms and dimensionality reduction in improving model performance and computational efficiency, which makes these models suitable for real-time applications. Overall, the results show gains in classification accuracy and resource utilization for both proposed models.
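To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention applied across the spatial positions of a feature map. It is a generic illustration, not the module the authors added to YOLOv8: the feature map size, the projection matrices `w_q`, `w_k`, `w_v`, and their random initialization are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feature_map, w_q, w_k, w_v):
    """Scaled dot-product self-attention over spatial positions.
    feature_map has shape (H, W, C); each position is treated as a token."""
    h, w, c = feature_map.shape
    tokens = feature_map.reshape(h * w, c)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    out = weights @ v
    return out.reshape(h, w, -1), weights

rng = np.random.default_rng(0)
fmap = rng.normal(size=(4, 4, 8))  # hypothetical 4x4 map with 8 channels
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))

attended, weights = self_attention(fmap, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

Each row of `weights` shows how strongly one spatial position draws on every other position, which is the mechanism that lets a model emphasize salient facial regions.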

Discussion

The discussion section highlights the ongoing challenges in detecting emotions from facial expressions, particularly for individuals with autism, who often struggle with emotion recognition because of their neurobehavioral condition. Traditional methods for facial emotion recognition have relied on statistical approaches and local feature extraction techniques, such as Gabor wavelets and the Scale-Invariant Feature Transform (SIFT), with varying degrees of success. These methods often require complex parameter tuning and prior knowledge for effective feature extraction. In contrast, deep learning (DL) algorithms, particularly Convolutional Neural Networks (CNNs), have emerged as more efficient alternatives that learn image features automatically without extensive preprocessing. Recent advances in CNN architectures, including attention mechanisms, have further improved facial emotion recognition systems by allowing models to focus on salient features within images.

The paper also emphasizes the importance of multimodal approaches in emotion recognition, as relying solely on unimodal data can limit accuracy and robustness. The proposed real-time emotion recognition system based on fog computing aims to address these challenges by integrating various data sources, including facial expressions and physiological indicators, to improve the understanding of emotional well-being. The architecture of this system comprises multiple layers, including a sensing layer for data acquisition, a fog computing layer for feature reduction and real-time decision-making, and a notification system for timely interventions. This innovative approach not only enhances the accuracy and efficiency of emotion recognition but also holds promise for applications in mental health and human-computer interaction.
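The layered flow described above (sensing, fog-side feature reduction and decision-making, then notification) can be sketched as a small Python pipeline. Everything here is hypothetical scaffolding: the `Reading` and `Alert` types, the `fog_node` function, the toy reducer and classifier, the emotion labels, and the 0.8 confidence threshold are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

@dataclass
class Reading:
    """Output of the sensing layer (hypothetical structure)."""
    patient_id: str
    features: List[float]

@dataclass
class Alert:
    """Message handed to the notification system (hypothetical structure)."""
    patient_id: str
    emotion: str
    confidence: float

def fog_node(
    reading: Reading,
    reduce: Callable[[List[float]], List[float]],
    classify: Callable[[List[float]], Tuple[str, float]],
    alert_on: Set[str],
    threshold: float = 0.8,  # assumed confidence cut-off
) -> Optional[Alert]:
    """Fog layer: reduce features, classify, and decide whether to notify."""
    reduced = reduce(reading.features)
    emotion, confidence = classify(reduced)
    if emotion in alert_on and confidence >= threshold:
        return Alert(reading.patient_id, emotion, confidence)
    return None

# Toy stand-ins for the feature reducer and the trained classifier.
def toy_reduce(features: List[float]) -> List[float]:
    return features[:2]

def toy_classify(features: List[float]) -> Tuple[str, float]:
    score = sum(features) / len(features)
    return ("distress", score) if score > 0.5 else ("neutral", 1 - score)

readings = [
    Reading("p1", [0.9, 0.95, 0.1]),
    Reading("p2", [0.2, 0.1, 0.9]),
    Reading("p3", [0.6, 0.7, 0.0]),
]
alerts = [fog_node(r, toy_reduce, toy_classify, {"distress"}) for r in readings]
print(alerts)
```

Keeping the reduction and decision steps on the fog node means only compact alerts, rather than raw images, need to leave the local network, which is the usual motivation for this kind of architecture.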