Integrated deep learning framework for driver distraction detection and real-time road object recognition in advanced driver assistance systems

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-08475-4
PMID: https://pubmed.ncbi.nlm.nih.gov/40645997
Publication Date: 2025-07-11
Author(s): S. Rakesh et al.
Primary Topic: Human-Automation Interaction and Safety

Overview

The paper presents an approach to improving road safety that integrates deep learning for driver distraction detection with real-time road object recognition. Convolutional Neural Networks (CNNs) with transfer learning classify driver distractions into physical, visual, and cognitive categories, aiming for high accuracy at low computational cost. The YOLO (You Only Look Once) framework detects critical road elements in real time, including vehicles, pedestrians, lane markers, and traffic signals. Because the system both detects distractions and recognizes the driving scenario, a decision-making module can assess the danger level and issue timely warnings or corrective actions.

The system is validated on the State Farm Distracted Driver Dataset, KITTI, and MS COCO, where the authors report improved accuracy and efficiency. Combining CNNs and YOLO yields a scalable, practical Advanced Driver Assistance System (ADAS) intended to operate in challenging conditions such as rain, fog, and low light. Deployed on an NVIDIA Jetson Xavier NX, the system runs at 25 frames per second with reduced latency, which indicates that it is feasible for resource-constrained applications. The paper stresses the importance of connecting autonomous driving technologies with human factors, and it proposes a domain-specific driver monitoring module and a knowledge-based road hazard recognition model as future directions.
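The 25 frames per second figure implies a per-frame time budget of 1000 / 25 = 40 ms for the whole pipeline. The sketch below shows that arithmetic and a simple way to measure throughput. The function names `frame_budget_ms` and `measure_fps` are hypothetical and are not taken from the paper; `process_frame` stands in for whatever inference call is being timed.

```python
import time


def frame_budget_ms(target_fps: float) -> float:
    """Return the per-frame time budget in milliseconds for a target frame rate."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def measure_fps(process_frame, frames, clock=time.perf_counter) -> float:
    """Return the average frames per second achieved by `process_frame`.

    `process_frame` is a placeholder for the inference call under test;
    `clock` is injectable so the function can be checked deterministically.
    """
    start = clock()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = clock() - start
    if count == 0 or elapsed <= 0:
        raise ValueError("need at least one frame and a positive elapsed time")
    return count / elapsed


if __name__ == "__main__":
    # At 25 FPS, distraction detection, object detection, and fusion
    # together must finish within this budget for every frame.
    print(frame_budget_ms(25))  # 40.0
```

Measuring over many frames rather than a single one smooths out warm-up effects and scheduler jitter, which matter on embedded boards.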

Methods

The authors describe an integrated deep learning system for detecting driver distraction and recognizing road objects. The methodology covers data requirements, model configurations, integration strategies, and deployment plans.

The experiments ran on an NVIDIA Tesla V100 GPU with 16 GB of memory and a 2 TB SSD for data storage, under Ubuntu 20.04. The software stack comprises TensorFlow 2.0 and PyTorch for model implementation, OpenCV for image preprocessing and annotation, and TensorRT for deployment optimization.

Results

The results section evaluates the integrated driver distraction detection and road object recognition system. Confusion matrices show the model’s classification performance across the distraction categories, and accuracy graphs show its improvement over training epochs. The system detected the different types of driver distraction with high accuracy, which supports the deep learning framework employed. The results are consistent with previous studies indicating that ensemble-based methods improve the detection of nuanced driver behaviors and help the model generalize across diverse scenarios.
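As a reminder of how a confusion matrix turns into the usual classification metrics, here is a minimal sketch. The three class names come from the distraction categories named in the overview; the counts are invented purely for illustration and are not results from the paper.

```python
def per_class_metrics(confusion, labels):
    """Compute precision, recall, and F1 for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(labels)
    metrics = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(confusion[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    labels = ["physical", "visual", "cognitive"]
    # Illustrative counts only (rows: true class, columns: predicted class).
    confusion = [
        [8, 1, 1],
        [2, 7, 1],
        [0, 2, 8],
    ]
    for name, m in per_class_metrics(confusion, labels).items():
        print(name, m)
    print("accuracy", accuracy(confusion))
```

Per-class precision is the metric most directly affected by false positives, so it is the natural number to watch when false alarms are a concern.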

However, the study acknowledges limitations, including false positives and the effect of environmental variation on detection accuracy. These challenges point to the need for more sophisticated models that reduce misclassification and adapt to changing conditions. Fusing the Convolutional Neural Network (CNN) and You Only Look Once (YOLO) frameworks maintained high precision and recall for both driver distractions and road hazards. After optimization, inference latency and model size dropped significantly, which makes real-time deployment on edge devices more feasible. Overall, although challenges remain, the system’s adaptability and scalability make it a promising approach to improving road safety in autonomous driving contexts.

Discussion

The paper presents an integrated system that combines driver distraction detection with road object recognition, addressing a gap in current Advanced Driver Assistance Systems (ADAS). By applying transfer learning to pre-trained models such as ResNet and YOLOv4, the system achieves high accuracy at reduced computational cost, which improves situational awareness and may reduce road accidents. The integration allows driver behavior and environmental hazards to be monitored simultaneously, which is essential for real-time decision-making in both assistive and fully autonomous driving contexts.

The literature review highlights a limitation of existing systems: they typically address either driver distraction or road object detection in isolation. Driver distraction detection has advanced with convolutional neural networks (CNNs) but often generalizes poorly across varying conditions. Road object detection has advanced significantly through deep learning models but struggles in adverse weather. The proposed system addresses both limitations by merging the outputs of the two detection modules and computing a real-time risk level as a weighted sum of the distraction and hazard scores. This combined view improves the accuracy of risk assessment and supports timely alerts that can help prevent accidents.
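A weighted sum of two scores can be sketched in a few lines. In the sketch below, the weights, the thresholds, the [0, 1] score range, and the names `RiskConfig`, `risk_level`, and `decide` are all assumptions made for illustration; the summary does not state the values or the decision rule the authors use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskConfig:
    # Hypothetical values; the actual weights and thresholds are not given here.
    w_distraction: float = 0.5
    w_hazard: float = 0.5
    warn_threshold: float = 0.4
    act_threshold: float = 0.7


def risk_level(distraction: float, hazard: float, cfg: RiskConfig = RiskConfig()) -> float:
    """Weighted sum of a distraction score and a hazard score, each assumed in [0, 1]."""
    for name, value in (("distraction", distraction), ("hazard", hazard)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
    return cfg.w_distraction * distraction + cfg.w_hazard * hazard


def decide(risk: float, cfg: RiskConfig = RiskConfig()) -> str:
    """Map a risk level to an illustrative response."""
    if risk >= cfg.act_threshold:
        return "corrective_action"
    if risk >= cfg.warn_threshold:
        return "warning"
    return "none"


if __name__ == "__main__":
    # A moderately distracted driver approaching a moderate hazard.
    r = risk_level(distraction=0.5, hazard=0.5)
    print(r, decide(r))
```

Keeping the weights in a small immutable config makes it easy to tune them per vehicle or per deployment without touching the fusion logic.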

Furthermore, the integration uses a lightweight attention mechanism that dynamically adjusts the relative importance of driver and environmental risks based on contextual factors, improving the system’s adaptability to diverse driving scenarios. The decision-making module uses ensemble machine learning classifiers and achieves an F1-score of 92.4%. Overall, the research indicates that the integrated system improves situational awareness and safety in driving environments and provides a basis for future work on autonomous vehicle technologies.
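One common way to make fusion weights depend on context is to score each risk source against a context vector and normalize the scores with a softmax. The sketch below illustrates that general idea only; the summary does not describe the actual attention architecture, and the context features, parameter vectors, and function names here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a short list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(
    context: Sequence[float],
    driver_params: Sequence[float],
    env_params: Sequence[float],
) -> Tuple[float, float]:
    """Score each risk source against the context, then normalize to weights.

    The parameter vectors would normally be learned; here they are
    hand-picked placeholders.
    """
    driver_logit = sum(c * p for c, p in zip(context, driver_params))
    env_logit = sum(c * p for c, p in zip(context, env_params))
    w_driver, w_env = softmax([driver_logit, env_logit])
    return w_driver, w_env


def fused_risk(distraction: float, hazard: float, weights: Tuple[float, float]) -> float:
    """Combine the two scores using context-dependent weights."""
    w_driver, w_env = weights
    return w_driver * distraction + w_env * hazard


if __name__ == "__main__":
    # Hypothetical context features: [low_visibility, high_speed].
    context = [1.0, 0.0]
    weights = attention_weights(context, driver_params=[0.0, 1.0], env_params=[2.0, 0.0])
    # With low visibility flagged, the environment score gets the larger weight.
    print(weights, fused_risk(0.8, 0.2, weights))
```

Because the softmax weights always sum to one, the fused value stays in the same range as the input scores, so downstream thresholds do not need to change when the context shifts.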

Limitations

The authors discuss the constraints of the key techniques used in their research. Although these methods provide valuable insights, they have shortcomings. Some mathematical models may oversimplify complex phenomena and so produce inaccurate predictions, and reliance on specific assumptions can limit how well the findings generalize across contexts or populations.

The authors also acknowledge that the computational complexity of some techniques may hinder their practical application in real-world scenarios. Further refinement and validation are needed to improve the robustness and applicability of these methods, and the limitations should be kept in mind when interpreting the results or applying them to broader contexts.