Enhanced Real-Time Facial Expression Recognition Using Deep Learning

Journal: Acadlore Transactions on AI and Machine Learning, Volume: 3, Issue: 1
DOI: https://doi.org/10.56578/ataiml030103
Publication Date: 2024-01-25
Author(s): Hafiz Burhan Ul Haq et al.
Primary Topic: Emotion and Mood Recognition

Overview

This paper applies convolutional neural networks (CNNs) to facial expression recognition (FER), targeting seven universal emotional states: surprise, disgust, fear, happiness, neutrality, anger, and contempt. It notes that existing models perform well in controlled environments but struggle on uncontrolled, real-time data because of image quality degradation, occlusion, variable lighting, and head pose variation. The authors propose a deep learning approach intended to improve recognition accuracy in real-world scenarios and evaluate it against manual classification and existing models.

According to the authors, the proposed emotion detection system distinguishes the seven emotional categories with an accuracy of 97.7% while maintaining high processing speed. The experimental evaluation used two datasets, FER2013 and one referred to as Random, and compared the model’s output with subjective (manual) evaluation. The paper closes with suggestions for future work: additional emotion classes, further accuracy improvements, real-time recognition systems, and a user-friendly interface for practical applications.

Introduction

The introduction surveys advances and challenges in AI-based facial expression analysis, a task complicated by individual differences such as ethnicity, age, and gender and by contextual factors such as lighting and posture. It cites foundational studies, including Charles Darwin’s work and Ekman and Friesen’s identification of six universal emotions, to emphasize the role of non-verbal communication in human interaction. The growing relevance of emotions in human-robot interaction (HRI) calls for a multidisciplinary approach that integrates AI, robotics, and the social sciences while addressing the limitations of current facial recognition technologies.

The study proposes a deep learning-based framework for real-time facial emotion recognition that extends the set of detectable emotions from six to seven by adding neutrality. The claimed improvements are a streamlined architecture for efficient processing, higher accuracy, and evaluation against predefined datasets. The methodology covers data collection, model training, and real-time application, with potential uses in human-computer interaction, market research, and mental health assessment.

Methods

The proposed emotion recognition model has four main components: data collection, data preprocessing, emotion prediction, and performance evaluation. First, a diverse dataset of images covering a range of emotions is compiled. The images are then categorized into seven emotional states: anger, happiness, fear, disgust, neutrality, sadness, and surprise. A deep learning model predicts the emotion in each image, and a performance evaluation assesses the accuracy of these predictions.
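The preprocessing and prediction steps can be illustrated with a minimal Python sketch. The summary does not give the authors’ actual preprocessing, input size, or label order, so the function names, the 48x48 grayscale input, and the index order of `EMOTIONS` below are all assumptions made for illustration.

```python
import numpy as np

# Class names taken from the Methods section; the index order is an assumption.
EMOTIONS = ["anger", "happiness", "fear", "disgust",
            "neutrality", "sadness", "surprise"]


def preprocess(images: np.ndarray) -> np.ndarray:
    """Scale uint8 grayscale images of shape (N, H, W) to float32 in [0, 1]
    and append a channel axis, giving shape (N, H, W, 1)."""
    scaled = images.astype(np.float32) / 255.0
    return scaled[..., np.newaxis]


def predict_labels(scores: np.ndarray) -> list:
    """Map per-class scores of shape (N, 7) to emotion names via argmax."""
    return [EMOTIONS[i] for i in scores.argmax(axis=1)]


# Hypothetical usage with placeholder data instead of a trained model.
batch = preprocess(np.zeros((2, 48, 48), dtype=np.uint8))
fake_scores = np.eye(len(EMOTIONS))[[1, 6]]  # one-hot rows standing in for model output
labels = predict_labels(fake_scores)
```

In a real system the `fake_scores` array would be replaced by the output of the trained network on `batch`.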

The experiments ran on a computer with an Intel Core i5-6200U CPU and 8 GB of RAM, using Python. The model’s accuracy was evaluated against a test dataset with established target features, using a confusion matrix to visualize and quantify performance. Accuracy, precision, and recall were computed from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}
\]
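As a worked illustration of these formulas, the sketch below derives TP, FP, FN, and TN for one class from a multi-class confusion matrix using a one-vs-rest reading. The summary does not say how the authors aggregated the metrics across classes, so the one-vs-rest treatment, the function name, and the toy counts are assumptions, not results from the paper.

```python
import numpy as np


def per_class_metrics(cm: np.ndarray, k: int) -> dict:
    """One-vs-rest metrics for class k.

    cm[i, j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp   # predicted k, but true class differs
    fn = cm[k, :].sum() - tp   # true class k, but predicted otherwise
    tn = cm.sum() - tp - fp - fn
    return {
        "accuracy": float((tp + tn) / (tp + tn + fp + fn)),
        "precision": float(tp / (tp + fp)) if (tp + fp) else 0.0,
        "recall": float(tp / (tp + fn)) if (tp + fn) else 0.0,
    }


# Toy 3-class confusion matrix with made-up counts (not data from the paper).
toy_cm = np.array([[8, 1, 1],
                   [2, 7, 1],
                   [0, 2, 8]])
metrics_class0 = per_class_metrics(toy_cm, 0)
```

For class 0 of the toy matrix, TP = 8, FP = 2, FN = 2, and TN = 18, so precision and recall are both 0.8 and one-vs-rest accuracy is 26/30.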

The evaluation compared the model’s predictions with manually classified images and showed that similar emotional states are hard to tell apart. For instance, images depicting fear were sometimes misclassified as surprise, and images representing disgust were incorrectly identified as sadness. These errors reflect the visual similarity between certain emotions, as illustrated in the paper’s figures.
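One way to surface such confusions systematically is to rank the off-diagonal cells of the confusion matrix. The following is a small sketch of that idea; the helper name `most_confused` and the counts in the example matrix are hypothetical and chosen only to mirror the fear/surprise and disgust/sadness pattern described above.

```python
import numpy as np


def most_confused(cm, labels, top=3):
    """Return up to `top` (true_label, predicted_label, count) tuples,
    sorted by how often the true class was predicted as the other class."""
    off_diag = np.array(cm, dtype=int)
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    order = np.argsort(off_diag, axis=None)[::-1][:top]
    pairs = []
    for flat_index in order:
        i, j = np.unravel_index(flat_index, off_diag.shape)
        if off_diag[i, j] > 0:
            pairs.append((labels[i], labels[j], int(off_diag[i, j])))
    return pairs


# Made-up counts for illustration only (rows: true class, columns: predicted).
example_labels = ["fear", "surprise", "disgust", "sadness"]
example_cm = [[5, 4, 0, 0],
              [1, 8, 0, 0],
              [0, 0, 6, 3],
              [0, 0, 0, 9]]
top_pairs = most_confused(example_cm, example_labels, top=2)
```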

Results

The proposed MobileNet-V1 model is reported to reach 100% accuracy during validation and 97.9% during training, which the authors present as evidence of effective generalization. However, when applied to real-world images depicting a range of emotional states (happiness, sadness, anger, fear, surprise, disgust, and neutrality), the model encountered significant challenges. Manual assessment also struggled to classify emotions accurately, particularly when facial expressions were ambiguous, which led to discrepancies between the model’s predictions and human judgments. For instance, images that appeared fearful were sometimes classified as surprise, highlighting the subjective nature of emotion recognition.

Despite its high performance in controlled settings, the model had difficulty distinguishing subtle emotional nuances in real-world scenarios, particularly when facial expressions are closely related. Overall, the MobileNet-V1 model shows higher accuracy than the other models in the comparative study, but the results also point to unresolved limitations of emotion recognition across diverse photographic contexts.

Discussion

The discussion section reviews advances and challenges in facial expression recognition (FER), focusing on deep learning models. It covers several methodologies, including Deep Neural Networks with Relativity Learning (DNNRL), proposed by Guo et al., which minimizes a triplet loss to improve expression recognition by adjusting distances in the embedding space. The section also divides feature extraction methods into geometric and appearance-based approaches and stresses the importance of high-quality features for accurate emotion classification. Other contributions, such as Li et al.’s use of k-nearest neighbors with a locality-preserving loss and Cai et al.’s extension of center loss with island loss, illustrate ongoing efforts to improve intra-class compactness and inter-class separation in the feature representation.
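To make the role of the triplet loss concrete, here is a minimal NumPy sketch of the standard formulation, max(0, ||a - p||² - ||a - n||² + margin), where a is an anchor embedding, p an embedding of the same class, and n an embedding of a different class. This is the generic textbook form and is not claimed to match the exact loss used by Guo et al.; the margin value and function name are assumptions.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch of embeddings of shape (N, D).

    Pulls each anchor toward its positive (same class) and pushes it
    away from its negative (different class) by at least `margin`.
    """
    anchor, positive, negative = map(np.asarray, (anchor, positive, negative))
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance to negative
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())


# A well-separated triplet contributes zero loss; a violated one does not.
easy = triplet_loss([[0.0, 0.0]], [[1.0, 0.0]], [[3.0, 0.0]])
hard = triplet_loss([[0.0, 0.0]], [[2.0, 0.0]], [[1.0, 0.0]])
```

Minimizing this quantity shrinks distances within a class and enlarges distances between classes, which is the same compactness and separation goal that center loss and island loss pursue by other means.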

The paper emphasizes the role of deep learning in advancing FER, with various architectures tested across multiple datasets. In particular, it highlights MobileNet for real-time emotion detection because of its processing speed relative to other models. The proposed model is reported to achieve 97.7% accuracy while maintaining high processing speed, which the authors consider suitable for real-world applications. The discussion concludes with a call for future research on additional emotion classes and improved real-time recognition, together with a user-friendly interface for practical deployment.
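The summary does not describe why MobileNet is fast, but the MobileNet-V1 architecture in general is built on depthwise separable convolutions, which replace one standard convolution with a per-channel spatial filter followed by a 1x1 pointwise convolution. The sketch below compares weight counts for the two layer types (biases ignored); the layer sizes are arbitrary examples, not values from the paper.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise (kernel x kernel per input channel)
    plus pointwise (1x1) convolution pair."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Example layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
reduction = standard / separable
```

For this example layer the standard convolution has 73,728 weights and the separable pair has 8,768, roughly an 8.4-fold reduction, which is the kind of saving that makes the architecture attractive on modest hardware such as the CPU-only setup described in the Methods section.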