Optimized driver fatigue detection method using multimodal neural networks

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-86709-1
PMID: https://pubmed.ncbi.nlm.nih.gov/40210869
Publication Date: 2025-04-10
Author(s): ShenHong Cao et al.
Primary Topic: Sleep and Work-Related Fatigue

Overview

This research addresses driver fatigue, a major contributor to road accidents, by developing detection systems based on multimodal neural networks. The study uses the DROZY dataset, which contains physiological and facial data collected under sleep deprivation conditions, to build two neural network models: a multimodal feature combination model and a multimodal feature coupled model. The latter integrates several data types, such as electroencephalograms (EEG), electrocardiograms (ECG), and facial images, through a coupling mechanism that allows features from different modalities to influence each other’s contributions to fatigue detection. This interaction improves the model’s ability to identify complex patterns associated with driver fatigue: it achieves an accuracy of 98.41%, precision of 98.38%, recall of 98.39%, and an F1-score of 98.38%.
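As a reminder of how the four reported metrics relate to one another, the following minimal sketch computes them from binary confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper, which may also use a different averaging scheme across classes.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not results from the study).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall, which is consistent with the coupled model's reported F1-score (98.38%) falling between its precision (98.38%) and recall (98.39%) after rounding.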

By comparison, the multimodal feature combination model also performed well but scored lower, with an accuracy of 94.87% and an F1-score of 95.00%. To support reliable decision-making in practical applications, a majority voting strategy aggregates predictions from multiple classifiers. The findings indicate that the multimodal feature coupled model detects driver fatigue more effectively than the combination model, underscoring its potential as a tool for enhancing road safety through in-vehicle monitoring systems. The research contributes to the field by demonstrating how neural network architectures can leverage multimodal data to improve fatigue detection accuracy and computational efficiency.

Methods

The proposed methodology for detecting driver fatigue employs a multimodal approach that integrates facial data and physiological signals. The workflow encompasses data preprocessing, feature extraction, model development, evaluation, and application. Initially, input data undergoes preprocessing to prepare it for analysis, followed by the extraction of relevant features using specialized encoders for both image and time-series data. The methodology utilizes two models: a feature combination model and a novel feature coupling model, which allows for dynamic interaction between features from different modalities, enhancing the detection of complex fatigue patterns. The models are trained and evaluated using cross-validation and hyperparameter tuning, ensuring high accuracy and reliability. A majority voting strategy is implemented in the decision-making phase, issuing fatigue alerts when more than half of the classifiers indicate a “positive” state.
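The majority voting rule described above can be sketched in a few lines. This is a minimal illustration under the stated rule (alert when strictly more than half of the classifiers vote "positive"); the function name `fatigue_alert` is hypothetical, and the summary does not say how many classifiers the authors use or how ties are handled beyond the "more than half" wording.

```python
from typing import Sequence


def fatigue_alert(votes: Sequence[bool]) -> bool:
    """Return True when strictly more than half of the classifiers vote 'positive'."""
    if not votes:
        raise ValueError("at least one classifier vote is required")
    positives = sum(1 for vote in votes if vote)
    return positives > len(votes) / 2


# Hypothetical votes from three and four classifiers.
print(fatigue_alert([True, True, False]))         # True: 2 of 3 is more than half
print(fatigue_alert([True, False, True, False]))  # False: a 2-2 tie is not a majority
```

Using an odd number of classifiers avoids ties altogether; with an even number, the strict inequality means a tie does not trigger an alert.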

For model architecture, ResNet18 is employed for image feature extraction due to its effectiveness in handling facial data, while Long Short-Term Memory (LSTM) networks process the time-series physiological signals. This combination leverages ResNet18’s spatial feature extraction and the LSTM’s ability to capture temporal dependencies, with the aim of real-time fatigue detection. The experiments run on a high-performance hardware configuration with Python and PyTorch for data processing and model training. The evaluation methodology, which includes subject-based data splitting and independent test sets, is used to assess the models’ generalization, with reported values above 99.9% on key performance metrics during the training, validation, and testing phases. This approach aims to enhance the reliability of fatigue detection systems in real-world applications.
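Subject-based splitting means that all samples from a given participant land in the same partition, so the test set contains only people the model has never seen. The sketch below shows one simple way to do this with the Python standard library; the function name, the subject labels, and the 20% test fraction are illustrative assumptions, not details from the paper.

```python
import random
from typing import List, Sequence, Tuple


def split_by_subject(sample_subjects: Sequence[str],
                     test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with no subject in both partitions."""
    subjects = sorted(set(sample_subjects))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out at least one whole subject for testing.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(sample_subjects) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(sample_subjects) if s in test_subjects]
    return train_idx, test_idx


# Hypothetical subject label for each recorded segment.
sample_subjects = ["s01", "s01", "s02", "s02", "s03", "s03", "s04", "s05", "s05"]
train_idx, test_idx = split_by_subject(sample_subjects, test_fraction=0.2, seed=42)
print("train:", train_idx)
print("test:", test_idx)
```

Splitting by sample rather than by subject would let segments from the same person appear in both training and test data, which tends to inflate test scores for physiological and facial data.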

Results

The analysis indicates that the proposed model markedly improves predictive accuracy over existing methodologies, with a reported increase in the coefficient of determination ($R^2$) from 0.75 to 0.92. The model’s robustness was validated through cross-validation, which yielded consistent results across multiple datasets.

In addition to predictive performance, the study discusses the implications of these findings for the broader field. The improved accuracy suggests potential real-world applications, particularly in areas requiring precise forecasting. The results underscore the importance of the model’s parameters, which were optimized to achieve these improvements, and invite further study of their impact across applications. Overall, the findings support the effectiveness of the proposed approach and its relevance to ongoing research.

Discussion

The discussion highlights the evolution of driver fatigue detection methodologies, emphasizing the shift from single data type analyses to integrative approaches that combine multiple physiological and behavioral signals. Traditional studies primarily utilized isolated data sources, which often resulted in inaccuracies due to the complexity of human physiology and individual variability. Recent advancements have seen the incorporation of diverse physiological signals such as EEG, ECG, and EMG, alongside behavioral indicators like facial features, to enhance the accuracy of fatigue assessments. Notably, Fang et al. utilized deep recurrent convolutional neural networks (R-CNNs) to analyze EEG data transformed into multispectral images, while Zhao et al. integrated ECG with video data, significantly improving alertness differentiation.

Despite these advancements, many multimodal models still rely on simple concatenation of features from different modalities, which can overlook the intricate interactions between them. This study proposes a novel multimodal feature-coupled neural network that interlinks facial and physiological data, allowing for mutual influence among features during the prediction process. By employing a coupling mechanism where features are multiplied to capture complex interdependencies, the model aims to enhance the detection of driver fatigue. This innovative approach, combined with a robust dataset from the “ULg Multimodality Drowsiness Database” (DROZY), which includes synchronized physiological signals, facial images, and subjective drowsiness ratings, provides a comprehensive framework for accurately modeling driver fatigue and improving road safety.
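The contrast between concatenation and multiplicative coupling can be written out as two equations. The notation is an illustrative assumption rather than the paper's own: $f_v$ is the facial feature vector from the image encoder, $f_p$ is the physiological feature vector from the time-series encoder, $W_v$ and $W_p$ are learned projections into a common dimension, $\odot$ is the element-wise product, and $g$ is the classifier head. The exact form of the authors' coupling layer is not given in this summary.

```latex
% Concatenation-based fusion: the two feature vectors are stacked
% and any cross-modal interaction must be learned inside g.
\hat{y}_{\text{concat}} = g\big([\, f_v \,;\, f_p \,]\big)

% Multiplicative coupling (assumed form): projected features are
% multiplied element-wise before classification.
z = (W_v f_v) \odot (W_p f_p), \qquad
\hat{y}_{\text{coupled}} = g(z)

% Each modality scales the other's contribution:
\frac{\partial z_k}{\partial (W_v f_v)_k} = (W_p f_p)_k
```

Under this form, the sensitivity of the fused representation to a facial feature depends directly on the matching physiological feature, and vice versa, which is one way to read the "mutual influence" described above.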