FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Volume: 39, Issue: 8
DOI: https://doi.org/10.1609/aaai.v39i8.32937
Publication Date: 2025-04-11
Author(s): Yao Xiao et al.
Primary Topic: Satellite Image Processing and Photogrammetry

Overview

This section gives an overview of the research on FBRT-YOLO, a new family of real-time detectors designed for aerial image detection, with a particular focus on small targets. The paper identifies existing difficulties in jointly optimizing detection accuracy and efficiency, which hinder progress in real-time applications. To address them, FBRT-YOLO incorporates two lightweight modules: the Feature Complementary Mapping Module (FCM) and the Multi-Kernel Perception Unit (MKP). FCM strengthens the integration of spatial positional information with semantic information, which improves the localization of small targets. MKP applies convolutions with different kernel sizes to better capture relationships among targets of different scales.

Extensive experiments on three prominent aerial image datasets (VisDrone, UAVDT, and AI-TOD) show that FBRT-YOLO significantly outperforms existing real-time detectors in both detection performance and speed. The proposed method improves the detection of small targets and also removes redundancy typical of conventional detectors, which speeds up the network overall. The authors have released the FBRT-YOLO code for further research and application.

Introduction

The introduction discusses the challenges that deep neural networks face when detecting objects in high-resolution aerial images, particularly when deployed on resource-constrained flight equipment. Key issues include detecting small or occluded objects and achieving real-time performance without compromising accuracy. Traditional methods often increase the input image resolution to improve detection, but this comes at the cost of computational efficiency. The authors highlight the mismatch between the low-resolution semantic information produced by deep layers and the high-resolution spatial information produced by shallow layers, which complicates the localization of small objects.
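The resolution mismatch described above can be made concrete with a small sketch. It shows a generic top-down fusion step (nearest-neighbor upsampling of a coarse map followed by element-wise addition), not the paper's FCM. The function names `upsample_nearest` and `fuse_add`, the single-channel nested-list representation, and the factor of 2 are all assumptions made for illustration.

```python
def upsample_nearest(feature, factor=2):
    """Repeat every value `factor` times along both axes (nearest-neighbor upsampling)."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def fuse_add(shallow, deep, factor=2):
    """Bring the coarse `deep` map to the resolution of `shallow`, then add element-wise."""
    up = upsample_nearest(deep, factor)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("shapes do not match after upsampling")
    return [[s + d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(shallow, up)]


# Hypothetical toy maps: a 4x4 shallow (high-resolution) map and a 2x2 deep map.
shallow_map = [[1.0] * 4 for _ in range(4)]
deep_map = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse_add(shallow_map, deep_map)
```

Each deep value is spread over a 2x2 block of the fused map, so the semantic signal carries no position information finer than that block. This is the kind of coarse alignment that makes small objects hard to localize.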

To address these challenges, the authors propose a novel network architecture named FBRT-YOLO, which integrates two lightweight modules: the Feature Complementary Mapping Module (FCM) and the Multi-Kernel Perception Unit (MKP). The FCM enhances the integration of spatial and semantic information by encoding spatial location into high-dimensional vectors, thereby improving feature alignment and localization of small objects. The MKP, designed to replace the final downsampling layer, employs convolutional kernels of varying sizes to enhance multi-scale feature representation. Extensive experiments on aerial image benchmarks demonstrate that FBRT-YOLO significantly outperforms existing YOLO models, achieving a favorable balance between computational load and detection accuracy. The contributions of this work include the introduction of a new family of real-time detectors, the FCM for improved feature matching, and the MKP for enhanced multi-scale perception.
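The idea of applying convolutions with several kernel sizes to the same feature map can be sketched in plain Python. This is an illustration only, assuming a single-channel map, zero padding, uniform averaging kernels, and fusion of the branches by summation. The kernel sizes (3, 5, 7) and the names `conv2d_same`, `box_kernel`, and `multi_kernel_perception` are hypothetical and are not taken from the paper's MKP definition.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with a square, odd-sized kernel using zero padding."""
    k = len(kernel)
    r = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:  # outside the map counts as zero
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def box_kernel(size):
    """Uniform averaging kernel; a stand-in for learned weights."""
    value = 1.0 / (size * size)
    return [[value] * size for _ in range(size)]


def multi_kernel_perception(image, sizes=(3, 5, 7)):
    """Run one branch per kernel size and sum the branch outputs."""
    branches = [conv2d_same(image, box_kernel(s)) for s in sizes]
    h, w = len(image), len(image[0])
    return [[sum(b[y][x] for b in branches) for x in range(w)] for y in range(h)]


# A single bright pixel in the middle of a 7x7 map.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
response = multi_kernel_perception(impulse)
```

In this toy case the corner of the output is reached only by the 7x7 branch, while the center receives a contribution from every branch. Larger kernels therefore contribute wider context and smaller ones keep local detail.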

Results

The results section presents a comprehensive evaluation of the FBRT-YOLO models across multiple datasets and reports better results than existing real-time object detectors. On the VisDrone dataset, the FBRT-YOLO models show significant reductions in parameter count and GFLOPs while achieving higher average precision (AP) across model scales. Specifically, the small models (FBRT-YOLO-N/S) reduce parameters by 72% and 74%, respectively, with AP improvements of 0.6% and 2.3%. The medium model (FBRT-YOLO-M) shows 26% and 23% fewer GFLOPs than YOLOv8-M and YOLOv9-M, respectively, alongside AP improvements of 1.3% and 1.2%. Among the large models, FBRT-YOLO-X has 66% and 23% fewer parameters than YOLOv8-X and YOLOv10-X, with AP improvements of 1.2% and 1.4%. In addition, FBRT-YOLO-M/L outperforms RT-DETR-R34/R50 in parameters, GFLOPs, detection speed, and detection performance.
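Figures such as "72% fewer parameters" are relative reductions with respect to a baseline, whereas an AP improvement can be read either as an absolute difference in points or as a relative change. The helper below is a minimal sketch of both calculations. The function names and the input numbers are hypothetical and are not values reported in the paper.

```python
def relative_reduction(baseline, proposed):
    """Reduction of `proposed` relative to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


def absolute_gain(baseline_ap, proposed_ap):
    """Difference between two AP values, in the same units as the inputs."""
    return proposed_ap - baseline_ap


# Hypothetical numbers chosen only to illustrate the arithmetic.
param_cut = relative_reduction(baseline=10.0, proposed=2.8)   # millions of parameters
ap_gain = absolute_gain(baseline_ap=20.0, proposed_ap=20.6)   # AP in percent
print(f"parameters reduced by {param_cut:.0f}%, AP changed by {ap_gain:+.1f} points")
```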

On the UAVDT dataset, FBRT-YOLO achieves an AP of 18.4%, surpassing methods such as GLSAN and CEASC and further validating the proposed detection framework. The ablation study highlights the contributions of the individual components, including the RR module, which is aimed at reducing redundancy and improves detection performance in complex backgrounds. Finally, evaluations on the AI-TOD dataset show that FBRT-YOLO reduces the parameter count by 74% and GFLOPs by 20% while achieving a 2.2% increase in AP50 and a 1.1% increase in AP, underscoring its effectiveness in detecting small objects. Overall, these results support FBRT-YOLO's position as a leading real-time aerial image detector.
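AP50 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below computes IoU for axis-aligned boxes and shows why small objects are harder under this criterion: the same pixel offset costs a small box far more overlap than a large one. The box format `(x1, y1, x2, y2)`, the function name `iou`, and the example boxes are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The same 4-pixel horizontal shift applied to a small and a large box.
small_iou = iou((0, 0, 8, 8), (4, 0, 12, 8))      # 8x8 object
large_iou = iou((0, 0, 64, 64), (4, 0, 68, 64))   # 64x64 object
print(f"small: {small_iou:.3f}, large: {large_iou:.3f}")
```

With these hypothetical boxes the shifted 8x8 prediction falls below the 0.5 threshold, while the shifted 64x64 prediction stays well above it.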

Discussion

In the discussion, the authors highlight the limitations of existing real-time object detectors, such as YOLO and FCOS, particularly for high-resolution aerial images. They present FBRT-YOLO as a specialized detector that significantly outperforms current models at detecting small objects in aerial imagery. The paper emphasizes the challenges of small object detection and reviews various approaches, including ClusDet and DM-Net, which are effective but often sacrifice inference speed and efficiency. The authors propose improvements through two core modules: the Feature Complementary Mapping Module (FCM) and the Multi-Kernel Perception Unit (MKP). FCM integrates spatial positional information with semantic features to improve the representation of small objects, while MKP uses convolutional kernels of multiple sizes to capture contextual information across different object sizes.

The authors also address the redundancy in traditional detection models, which are not well suited to high-resolution images. They propose decoupling spatial downsampling from channel expansion, which leads to a more efficient network design. Experimental results show that FBRT-YOLO achieves a significant reduction in parameters and computational load while maintaining competitive accuracy. The paper concludes that FBRT-YOLO is a substantial advance in real-time aerial image detection that balances accuracy and efficiency effectively, as shown by extensive testing on multiple datasets, including VisDrone, UAVDT, and AI-TOD.
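A rough parameter count illustrates why separating spatial downsampling from channel expansion can save parameters. The sketch assumes one common way of decoupling: a 1x1 pointwise convolution for channel expansion followed by a depthwise 3x3 strided convolution for downsampling. The summary does not say whether the paper uses this exact layout. The function names and the channel sizes are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Number of learnable parameters in a 2D convolution with a k x k kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def coupled_downsample_params(c_in, c_out, k=3):
    """One strided k x k convolution that downsamples and expands channels at once."""
    return conv_params(c_in, c_out, k)


def decoupled_downsample_params(c_in, c_out, k=3):
    """Pointwise conv for channel expansion, then depthwise strided conv for downsampling."""
    pointwise = conv_params(c_in, c_out, 1)
    depthwise = conv_params(c_out, c_out, k, groups=c_out)
    return pointwise + depthwise


# Hypothetical stage going from 256 to 512 channels.
coupled = coupled_downsample_params(256, 512)
decoupled = decoupled_downsample_params(256, 512)
print(coupled, decoupled)
```

The stride changes the output resolution and the compute cost but not the parameter count, which is why it does not appear in `conv_params`.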