A comprehensive survey of deep learning-based lightweight object detection models for edge devices

Journal: Artificial Intelligence Review, Volume: 57, Issue: 9
DOI: https://doi.org/10.1007/s10462-024-10877-1
Publication Date: 2024-08-10
Author: Payal Mittal
Primary Topic: Advanced Neural Network Applications

Overview

This study focuses on the development of deep learning-based lightweight object detection models optimized for edge devices, addressing the increasing demand for models that are accurate, fast, and low-latency. It provides a comprehensive overview of recent lightweight object detection methods, detailing the backbone architectures commonly employed, such as ShuffleNet and MobileNetV2. The paper discusses the training and inference processes relevant to deep learning applications on edge devices and highlights various applications of these lightweight detectors.
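ShuffleNet, one of the backbones named above, pairs cheap grouped convolutions with a channel shuffle so that information can flow between channel groups. The sketch below is a minimal, framework-free illustration of that shuffle. It assumes channels are represented as a flat Python list rather than a tensor; in a real network the same permutation is applied to the channel axis of a feature map. The helper name `channel_shuffle` is illustrative and is not taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Permute channels the way ShuffleNet's channel shuffle does.

    Equivalent to reshaping to (groups, channels_per_group), transposing to
    (channels_per_group, groups), and flattening back into one list.
    """
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by a positive group count")
    per_group = len(channels) // groups
    # channels[g * per_group + k] is channel k of input group g; emitting them
    # k-major interleaves the groups.
    return [channels[g * per_group + k] for k in range(per_group) for g in range(groups)]
```

With three groups of two channels, `channel_shuffle(['a0', 'a1', 'b0', 'b1', 'c0', 'c1'], 3)` returns `['a0', 'b0', 'c0', 'a1', 'b1', 'c1']`, so the next grouped convolution sees one channel from every original group.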

The findings indicate that lightweight object detection models are promising for improving the hardware efficiency of neural network architectures. The study evaluates these models on well-known datasets such as MS-COCO and PASCAL-VOC and compares them by COCO-style mean Average Precision (mAP). Although the models show significant potential, they still fall short of optimal results, and the paper suggests improving performance through techniques such as multi-scale and multi-branch Feature Pyramid Networks (FPNs), federated learning, pruning, and knowledge distillation. It concludes that further advances in lightweight object detection are needed to meet the demands of diverse applications.
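The comparison above relies on COCO-style mAP. As a rough illustration of what that number measures, the sketch below computes it for a single object class: detections are matched greedily to ground-truth boxes by Intersection over Union (IoU), precision is sampled at 101 recall levels, and the result is averaged over the ten IoU thresholds 0.50, 0.55, ..., 0.95. This is a simplified sketch, not the official `pycocotools` evaluator: it ignores crowd annotations, object-size ranges, and the per-image detection cap, and all function names are illustrative. mAP would then be the mean of this value over all classes.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold):
    """101-point interpolated AP for one class at one IoU threshold.

    detections: list of (image_id, score, box); ground_truths: {image_id: [box, ...]}.
    """
    num_gt = sum(len(boxes) for boxes in ground_truths.values())
    if num_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    tp_flags = []
    # Greedy matching: higher-scoring detections claim ground-truth boxes first.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = iou_threshold, -1
        for idx, gt_box in enumerate(ground_truths.get(img, [])):
            overlap = iou(box, gt_box)
            if not matched[img][idx] and overlap >= best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0:
            matched[img][best_idx] = True
        tp_flags.append(best_idx >= 0)

    # Cumulative precision and recall down the ranked list.
    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        if is_tp:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)

    # Best precision achievable at recall >= r, for r = 0.00, 0.01, ..., 1.00.
    total = 0.0
    for i in range(101):
        r = i / 100
        candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
        total += max(candidates) if candidates else 0.0
    return total / 101


def coco_style_ap(detections, ground_truths):
    """Average AP over IoU thresholds 0.50:0.05:0.95 (the COCO headline 'AP')."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(detections, ground_truths, t) for t in thresholds) / len(thresholds)
```

For example, a single detection `(0, 0, 10, 7.2)` against a ground truth `(0, 0, 10, 10)` has IoU 0.72, so it counts as a hit at the five thresholds up to 0.70 and a miss above them, giving a COCO-style AP of 0.5. This is why COCO AP rewards tight localization more than the single-threshold AP@0.5 used by PASCAL-VOC.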

Introduction

The introduction discusses the significant impact of Internet of Things (IoT) technologies on the development of deep learning-based object detectors. Although many existing models achieve high accuracy and real-time inference, they often require substantial Central Processing Unit (CPU) resources, which makes them unsuitable for deployment on edge devices. The paper highlights strategies for adapting deep learning applications to edge environments, such as partitioning processing tasks and standardizing intermediate resources to improve data management and reduce latency.
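One way to make partitioning of processing tasks concrete is to split a network between the edge device and a server: early layers run locally, an intermediate activation is uploaded, and the remaining layers run remotely. The sketch below tries every split point and keeps the one with the lowest estimated latency. It is an illustrative formulation, not a method described in the paper; `best_split` is a hypothetical helper, and the per-layer latencies, activation sizes, and bandwidth are made-up inputs that would normally come from profiling.

```python
from typing import List, Tuple


def best_split(edge_ms: List[float], cloud_ms: List[float],
               transfer_bytes: List[int], bandwidth_bytes_per_ms: float) -> Tuple[int, float]:
    """Pick the layer index at which to hand a model off from edge to cloud.

    Split k runs layers [0, k) on the edge, uploads transfer_bytes[k], and runs
    layers [k, n) in the cloud. transfer_bytes has n + 1 entries: index 0 is the
    raw input and index n is the final output. Returns (k, estimated_latency_ms).
    """
    n = len(edge_ms)
    if len(cloud_ms) != n or len(transfer_bytes) != n + 1:
        raise ValueError("expected n per-layer latencies and n + 1 transfer sizes")
    best = None
    for k in range(n + 1):
        latency = (sum(edge_ms[:k])                               # local compute
                   + transfer_bytes[k] / bandwidth_bytes_per_ms   # upload
                   + sum(cloud_ms[k:]))                           # remote compute
        if best is None or latency < best[1]:
            best = (k, latency)
    return best


# Hypothetical profile of a 4-layer model over a link of about 1 MB/s.
print(best_split([4, 6, 8, 30], [1, 1, 1, 2],
                 [600_000, 400_000, 50_000, 20_000, 1_000], 1_000))  # -> (3, 40.0)
```

With these numbers, uploading the raw input costs about 605 ms, running everything on the device costs 49 ms, and running three layers locally before uploading a 20 KB activation costs 40 ms. The best split depends heavily on bandwidth, so a real system would re-evaluate it as network conditions change.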

Lightweight object detectors emerged to provide compact, efficient networks for the resource-constrained settings typical of IoT deployments. Despite this progress, achieving sufficient detection accuracy on edge devices remains difficult, particularly for high-performance applications. The paper emphasizes the role of edge computing in addressing these challenges: processing data locally minimizes bandwidth usage and supports real-time applications across sectors including healthcare, retail, and transportation. The author advocates further development of lightweight detection models to broaden their applicability across edge computing scenarios, ultimately aiming for enhanced performance without compromising accuracy.

Methods

The methodology for lightweight object detection models on edge devices is structured into four primary components: input, backbone, neck, and detector head. The input can be an image, patch, or pyramid, which is processed by a lightweight backbone architecture such as CSPDarkNet, ShuffleNet, MobileNet, or PeleeNet to generate feature maps. The backbone is responsible for feature extraction, converting the input into feature vectors, while the neck connects the backbone to the detector head, transforming feature maps to address various object detection challenges. The lightweight detector head, akin to a deep neural network, focuses on extracting Regions of Interest (RoIs) and utilizes pooling layers to finalize the features of detected objects, which are then classified and regressed to obtain bounding box coordinates.
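The head's final step, regressing bounding box coordinates, is commonly parameterized relative to an anchor or proposal box, as in R-CNN and Faster R-CNN: the network predicts center offsets scaled by the box size and log-scale width and height ratios. The sketch below shows that encoding and its inverse for a single box. It assumes this standard parameterization (individual detectors vary; YOLO-style heads, for instance, predict offsets within grid cells), omits the per-coordinate weights some implementations apply, and borrows the clamp value log(1000/16) used by torchvision's box coder. The function names are illustrative.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)

# Upper bound on dw/dh so exp() cannot explode on a bad prediction.
MAX_LOG_SCALE = math.log(1000.0 / 16)


def encode_box(anchor: Box, target: Box) -> Deltas:
    """Regression targets that move `anchor` onto `target` (used during training)."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    dx = ((target[0] + 0.5 * tw) - (anchor[0] + 0.5 * aw)) / aw
    dy = ((target[1] + 0.5 * th) - (anchor[1] + 0.5 * ah)) / ah
    return dx, dy, math.log(tw / aw), math.log(th / ah)


def decode_box(anchor: Box, deltas: Deltas) -> Box:
    """Apply predicted offsets to `anchor` to get box coordinates (used at inference)."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    dx, dy, dw, dh = deltas
    cx = anchor[0] + 0.5 * aw + dx * aw  # shift the center by a fraction of the size
    cy = anchor[1] + 0.5 * ah + dy * ah
    w = aw * math.exp(min(dw, MAX_LOG_SCALE))  # rescale width and height
    h = ah * math.exp(min(dh, MAX_LOG_SCALE))
    return cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h
```

Encoding `(2, 3, 14, 9)` against the anchor `(0, 0, 10, 10)` gives `dx = 0.3`, `dy = 0.1`, `dw = log 1.2`, and `dh = log 0.6`; decoding those deltas recovers the original box up to floating-point rounding. Predicting size changes in log space keeps the regression targets roughly scale-invariant across small and large objects.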

The training of these models involves collaboration between edge devices and cloud servers, necessitating efficient communication to manage the transfer of model parameters and data. Techniques such as Edge Stochastic Gradient Descent (eSGD) are employed to minimize communication costs by transmitting only significant gradients. Additionally, novel architectures like Agile Condor facilitate real-time computer vision tasks at the network edge, while methods like Precog enhance mobile application performance by prefetching and caching classification requests. The ECHO framework further supports distributed data analytics in hybrid Edge-Fog-Cloud environments, providing essential services for resource management and application monitoring.
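The idea behind eSGD, sending only the gradient entries that matter most, can be illustrated with top-k gradient sparsification plus residual accumulation: entries that are not sent are carried over and added to the next step's gradient, so small updates are delayed rather than discarded. This is a simplified stand-in, not the exact eSGD selection rule, which this summary does not specify. The function names, the fixed `k`, and the plain-list representation of gradients and parameters are assumptions for illustration.

```python
from typing import List, Tuple


def sparsify_gradient(grad: List[float], residual: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Select the k largest-magnitude accumulated gradient entries for transmission.

    `residual` is updated in place: sent entries are reset to 0, unsent entries
    keep their accumulated value so they are added to the next step's gradient.
    """
    if len(grad) != len(residual):
        raise ValueError("gradient and residual must have the same length")
    accumulated = [g + r for g, r in zip(grad, residual)]
    top = sorted(range(len(accumulated)), key=lambda i: abs(accumulated[i]), reverse=True)[:k]
    top_set = set(top)
    for i, value in enumerate(accumulated):
        residual[i] = 0.0 if i in top_set else value
    top.sort()
    return top, [accumulated[i] for i in top]


def apply_sparse_update(params: List[float], indices: List[int], values: List[float], lr: float) -> None:
    """Server-side SGD step that touches only the transmitted coordinates."""
    for i, v in zip(indices, values):
        params[i] -= lr * v
```

With `k = 2`, the gradient `[0.1, -0.5, 0.05, 0.3]` is transmitted as indices `[1, 3]` with values `[-0.5, 0.3]`, while `0.1` and `0.05` stay in the residual for the next round. Only half of the coordinates (plus their indices) cross the network in this example, which is the communication saving the paragraph describes.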

Discussion

The discussion emphasizes the critical role of object detection in various applications, particularly those that run on edge devices, such as face detection and video surveillance. It highlights how deep learning has significantly improved object detection performance while also introducing challenges related to computational complexity. In particular, deep learning-based object detectors often require substantially more Floating Point Operations (FLOPs) than classification networks, which complicates their deployment on resource-constrained edge devices. The paper identifies a gap in the literature on lightweight object detection models tailored to edge applications and calls for a comprehensive assessment of existing research to guide future developments in this area.
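A quick calculation shows one reason for the FLOPs gap. A dense k × k convolution performs roughly H_out × W_out × C_in × C_out × k² multiply-accumulates (MACs), so its cost grows with the area of the feature map, and detectors typically operate on larger inputs than image classifiers. The sketch compares the same 3 × 3, 64-to-64-channel layer at 224 × 224 (a common classification input size) and at 640 × 640, an illustrative detection input size chosen here as an assumption. FLOPs are often reported as about 2 × MACs, although conventions differ between papers.

```python
def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int, kernel: int) -> int:
    """Multiply-accumulate count of a dense kernel x kernel convolution, ignoring bias."""
    return h_out * w_out * c_in * c_out * kernel * kernel


# Same layer, stride 1 with "same" padding, so the output size equals the input size.
cls_macs = conv2d_macs(224, 224, 64, 64, 3)  # 1_849_688_064 MACs
det_macs = conv2d_macs(640, 640, 64, 64, 3)  # 15_099_494_400 MACs
print(det_macs / cls_macs)  # (640 / 224) ** 2 = 400 / 49, about 8.16
```

The roughly 8× increase comes from resolution alone; detection necks and heads add further layers on top of the backbone, widening the gap.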

The section divides deep learning-based object detectors into two main types: two-stage and one-stage models. Two-stage models, such as Faster R-CNN, excel in accuracy but are slower because of their region proposal step. One-stage models, such as YOLO and SSD, instead prioritize speed and efficiency by treating object detection as a regression problem, which eliminates the region proposal stage. The paper also discusses advanced-stage detectors that go further by removing anchors and adopting novel detection strategies. Lightweight models designed for limited-resource environments are gaining increasing attention, with architectures such as MobileNet and ShuffleNet highlighted for their efficiency. The author concludes by underscoring the need for continued research into lightweight object detection models, particularly for edge devices, to balance performance with computational efficiency.
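MobileNet's efficiency comes largely from depthwise separable convolutions, which replace one dense k × k convolution with a per-channel k × k (depthwise) convolution followed by a 1 × 1 (pointwise) convolution. The original MobileNet paper shows that this reduces the cost by a factor of 1/C_out + 1/k². The sketch below checks that ratio numerically for an illustrative layer (56 × 56 feature map, 128 input and output channels, 3 × 3 kernel); the layer shape is an assumption, not a configuration taken from the survey.

```python
def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a dense k x k convolution producing an h x w output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k  # one k x k filter per input channel
    pointwise = h * w * c_in * c_out  # 1 x 1 convolution mixes channels
    return depthwise + pointwise


dense = standard_conv_macs(56, 56, 128, 128, 3)            # 462_422_016
separable = depthwise_separable_macs(56, 56, 128, 128, 3)  # 54_992_896
print(separable / dense)  # about 0.119, i.e. 1/128 + 1/9
```

For this layer the separable version needs about 8.4× fewer multiply-accumulates, which is consistent with the efficiency the survey attributes to MobileNet-style backbones.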