Lightweight construction safety behavior detection model based on improved YOLOv8

Journal: Discover Applied Sciences, Volume: 7, Issue: 4
DOI: https://doi.org/10.1007/s42452-025-06766-z
Publication Date: 2025-04-10
Author(s): Kan Huang et al.
Primary Topic: Occupational Health and Safety Research

Overview

This paper presents a lightweight model for detecting safety behavior on construction sites, built on an improved YOLOv8 and aimed at raising detection accuracy and efficiency in complex construction environments. The authors target two limitations of conventional YOLO models: missed detections and inadequate feature processing on large-scale datasets. They replace the original CSPDarknet53 backbone with MobileNetV3, which reduces computational load and increases processing speed. The model also adds a Receptive Field Block (RFB) module to expand the receptive field and a Global Attention Mechanism (GAM) to improve local feature recognition.

Experiments show that the improved YOLOv8 model detects five common unsafe behaviors among construction workers with a mean Average Precision (mAP) of 0.86, precision of 0.84, recall of 0.87, F1 score of 0.85, and Intersection over Union (IoU) of 0.8, significantly outperforming traditional methods. Although the model balances a lightweight design with higher detection accuracy, the authors acknowledge limited robustness and generalization under varying environmental conditions. They suggest that future work further optimize computational efficiency and extend the model to a broader range of safety behaviors, making it applicable across more diverse construction settings.
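As a quick consistency check on the reported figures, the sketch below recomputes the F1 score from the stated precision and recall and shows how IoU is obtained for a pair of boxes. The function names and the example boxes are illustrative assumptions, not taken from the paper, which does not describe its evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (empty if the boxes are disjoint)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Precision 0.84 and recall 0.87 are the values reported in the summary.
print(round(f1_score(0.84, 0.87), 2))           # 0.85
# Hypothetical boxes: a prediction covering 80% of a ground-truth box.
print(box_iou((0, 0, 10, 10), (0, 0, 10, 8)))   # 0.8
```

The recomputed F1 of 0.85 agrees with the reported value, so the precision, recall, and F1 figures are mutually consistent.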

Introduction

The introduction addresses the critical safety challenges of transportation engineering construction, particularly at sites characterized by complex processes, numerous personnel, and heavy machinery. The authors note that even minor errors can lead to severe accidents, which makes stronger safety measures urgent in large-scale infrastructure projects. Existing behavior detection models can identify unsafe actions but perform poorly in high-density environments, where they often miss detections and struggle to distinguish similar behaviors. Traditional models also face a trade-off between detection accuracy and real-time performance because of their computational demands.

To address these challenges, the paper proposes an improved YOLOv8 algorithm tailored to construction safety behavior detection. The model uses a lightweight MobileNetV3 backbone to improve computational efficiency, introduces a Receptive Field Block (RFB) module to capture multi-scale features, and integrates the Global Attention Mechanism (GAM) to refine local feature learning. Experimental results show that the proposed model outperforms traditional methods and meets the safety supervision requirements of large construction sites by improving detection accuracy while maintaining real-time performance. The main contributions are the optimization of the YOLOv8 algorithm, the introduction of the RFB module, and the integration of the GAM attention mechanism.

Methods

The construction site Safety Behavior Detection (SBD) model was developed on a high-performance computing setup with an Intel Core i9-11900K processor and 32 GB of RAM. The experimental dataset comprised 2,300 images extracted from surveillance videos of 12 intelligent transportation construction sites and covered five unsafe behaviors: not wearing a helmet, not using a safety rope, not wearing a reflective vest, entering dangerous areas, and climbing unsafely. Data augmentation (flipping, rotation, and brightness enhancement) was applied to increase the size of the dataset.
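The three augmentations named above can be sketched on a toy grayscale image stored as a list of rows. This is only an illustration: the rotation angle (90 degrees), the brightness factor (1.5), and all function names are assumptions, since the summary does not give the parameters the authors used. For detection data the bounding boxes must be transformed together with the image, which `hflip_box` shows for the horizontal flip.

```python
def hflip(image):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def brighten(image, factor):
    """Scale pixel intensities and clip them to the 8-bit range 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def hflip_box(box, width):
    """Mirror an (x1, y1, x2, y2) box inside an image of the given width."""
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)


image = [[10, 20, 30],
         [40, 50, 60]]
print(hflip(image))                       # [[30, 20, 10], [60, 50, 40]]
print(rotate90(image))                    # [[40, 10], [50, 20], [60, 30]]
print(brighten(image, 1.5))               # [[15, 30, 45], [60, 75, 90]]
print(hflip_box((0, 0, 1, 2), width=3))   # (2, 0, 3, 2)
```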

Training used early stopping to limit the number of iterations. The loss dropped sharply during the first 50 iterations and stabilized after 100. Parameter tuning showed that the best performance was obtained with a learning rate of 0.001, a batch size of 64, and a weight decay of 0.0005, giving a minimum loss value of 0.64. The improved YOLOv8 model achieved its highest detection accuracy for the category of entering dangerous areas, with an F1 score of 0.91, while keeping scores above 0.8 in all categories. Clothing-related behaviors were harder to detect, which the authors attribute to variability in individual movements and attire. Overall, the findings indicate that the proposed model improves safety behavior detection in construction environments.
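A minimal sketch of an early-stopping rule of the kind described above follows. The learning rate, batch size, and weight decay are the values reported in the summary; the `patience` and `min_delta` settings, the function name, and the loss values are hypothetical, because the summary does not state the stopping criterion the authors used.

```python
# Hyperparameters reported in the summary
HYPERPARAMS = {"learning_rate": 0.001, "batch_size": 64, "weight_decay": 0.0005}


def early_stopping_step(losses, patience=3, min_delta=1e-3):
    """Return the index at which training would stop, or None if it never does.

    Training stops once the loss has failed to improve on the best value
    seen so far by more than `min_delta` for `patience` consecutive steps.
    """
    best = float("inf")
    stale = 0  # consecutive steps without a meaningful improvement
    for i, loss in enumerate(losses):
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Hypothetical loss curve that improves and then plateaus
print(early_stopping_step([1.0, 0.9, 0.8, 0.8, 0.8, 0.8]))  # 5
# A curve that keeps improving never triggers the rule
print(early_stopping_step([1.0, 0.5, 0.25]))                # None
```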

Results

The results section highlights MobileNetV3’s use of depth-wise separable convolution, which reduces computational complexity and the number of parameters. The technique decomposes a standard convolution into two steps: a depth-wise convolution, which operates on each input channel independently, and a point-wise convolution, which combines information across channels. This decomposition lowers computational cost and keeps the model lightweight, which increases inference speed and reduces memory usage.
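The saving from this decomposition can be made concrete by counting weights. The sketch below compares a standard k x k convolution with its depth-wise separable counterpart, ignoring biases. The layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are arbitrary example values, not figures from the paper.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a point-wise 1 x 1 conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(sep / std, 3))  # 73728 8768 0.119
```

The ratio equals 1/c_out + 1/k^2, so for 3 x 3 kernels the separable form needs roughly one eighth to one ninth of the weights once the number of output channels is large.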

In addition, the Squeeze-and-Excitation (SE) modules and Non-Linear (NL) modules strengthen the model’s feature extraction and its ability to interpret complex scenes. They optimize the relationships between feature channels and allow long-range dependencies to be captured, which improves model performance. As a result, MobileNetV3 achieves faster inference and lower resource consumption while maintaining high accuracy, making it well suited to mobile devices and embedded systems. Its lightweight design eases deployment in resource-constrained environments, broadens the applicability of computer vision technologies, and offers useful guidance for developing other lightweight deep learning models.
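To illustrate how an SE module re-weights feature channels, here is a minimal pure-Python sketch of the generic squeeze, excitation, and scale steps. It is not the authors' implementation: the weight matrices are hypothetical inputs, biases are omitted, and a plain sigmoid gate is assumed for simplicity.

```python
import math


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Apply a generic SE block to a feature map.

    feature_map: list of C channels, each an H x W list of rows.
    w_reduce:    R x C weight matrix (bottleneck layer, no bias).
    w_expand:    C x R weight matrix (restores C channels, no bias).
    """
    # Squeeze: global average pooling gives one scalar per channel
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: reduce -> ReLU -> expand -> sigmoid gate per channel
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [1 / (1 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every pixel of a channel by that channel's gate
    return [[[g * p for p in row] for row in ch]
            for ch, g in zip(feature_map, gates)]


# Two 1 x 1 channels with all-zero weights: every gate is sigmoid(0) = 0.5
fmap = [[[2.0]], [[4.0]]]
print(squeeze_excite(fmap, [[0.0, 0.0]], [[0.0], [0.0]]))  # [[[1.0]], [[2.0]]]
```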

Discussion

The discussion reviews progress and open challenges in construction safety behavior detection (SBD) using deep learning (DL) and computer vision. It notes that, despite significant progress with the YOLO (You Only Look Once) series of algorithms, existing methods remain limited by small and non-diverse datasets, overfitting, and difficulties in complex construction environments. The paper stresses the importance of real-time monitoring for improving worker safety and reducing accidents, and it advocates advanced algorithms such as YOLOv8, optimized here to detect unsafe behaviors on construction sites.

The authors propose an improved, lightweight YOLOv8 SBD model that uses MobileNetV3 as the backbone network together with a Receptive Field Block (RFB) module and a Global Attention Mechanism (GAM). These changes aim to raise detection accuracy while reducing computational complexity, making the model better suited to real-time use in resource-constrained environments. The results indicate that the improved model significantly outperforms its predecessor on several metrics, including mean Average Precision (mAP), precision, recall, and Intersection over Union (IoU). The authors acknowledge that the model’s robustness and generalization still need improvement, particularly under the dynamic and variable lighting conditions typical of construction sites. Future work will focus on optimizing computational efficiency and extending the model to a broader range of safety behaviors.