Low-latency automotive vision with event cameras

Journal: Nature, Volume: 629, Issue: 8014
DOI: https://doi.org/10.1038/s41586-024-07409-w
PMID: https://pubmed.ncbi.nlm.nih.gov/38811712
Publication Date: 2024-05-29
Author(s): Daniel Gehrig et al.
Primary Topic: Advanced Memory and Neural Computing

Methods

In this section, the authors outline their methodology, focusing on a hybrid neural network architecture designed for high-rate object detection. They first give a general overview of the architecture and its processing model, then examine in detail the asynchronous graph neural network (GNN) and the network blocks that improve both its performance and its efficiency. The model operates in an asynchronous, event-based processing mode, which is central to how it functions.

The authors group existing methods into four categories: dense recurrent methods (e.g., RED and ASTM-Net), dense feedforward methods (e.g., Events + RRC and YOLOv3), spiking methods (e.g., Spiking DenseNet), and asynchronous methods (e.g., AEGNN and AsyNet). They also introduce a hybrid method that combines event and image data, termed Events + YOLOX, which generates detections from concatenated images and event histograms. In addition, they discuss the bandwidth-latency trade-off inherent in traditional automotive cameras and contrast it with the Prophesee Gen3 camera, which can achieve lower bandwidth and lower latency because it operates asynchronously. Finally, the authors highlight the limitations of current event-based detectors, particularly with respect to texture information, and propose an approach that processes events and frames sparsely, which they present as an advance in asynchronous processing within hybrid networks.
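The idea of feeding a detector with concatenated images and event histograms can be illustrated with a minimal sketch. It assumes each event is an `(x, y, t, polarity)` tuple and that the histogram has one channel per polarity. The function names `event_histogram` and `concat_channels`, the tuple layout, and the toy sizes are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# An event is (x, y, t, polarity); this tuple layout is an assumption of the sketch.
Event = Tuple[int, int, float, int]
Grid = List[List[float]]  # one channel, indexed as grid[y][x]


def event_histogram(events: List[Event], width: int, height: int) -> List[Grid]:
    """Count events per pixel, one channel per polarity (0 = negative, 1 = positive)."""
    hist = [[[0.0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _t, p in events:
        if 0 <= x < width and 0 <= y < height:  # ignore events outside the sensor
            channel = 1 if p > 0 else 0
            hist[channel][y][x] += 1.0
    return hist


def concat_channels(image: List[Grid], hist: List[Grid]) -> List[Grid]:
    """Stack image channels and histogram channels along the channel axis."""
    return image + hist


# Toy example: a 4x3 RGB image (all zeros) and three events.
W, H = 4, 3
image = [[[0.0] * W for _ in range(H)] for _ in range(3)]
events = [(1, 2, 0.001, 1), (1, 2, 0.002, 1), (3, 0, 0.003, -1)]
stacked = concat_channels(image, event_histogram(events, W, H))
print(len(stacked))  # number of channels: R, G, B, negative events, positive events
```

The stacked tensor is dense: every pixel is stored whether or not an event occurred there, which is the kind of representation the sparse, asynchronous approach is contrasted with.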

Discussion

This section discusses the architecture and performance of the proposed deep asynchronous GNN (DAGr) for object detection, which combines a convolutional neural network (CNN) with an asynchronous graph neural network (GNN). DAGr processes both images and events to achieve high temporal resolution and low latency in object detection. The CNN extracts features from incoming images, and the GNN then uses these features to improve detection performance, particularly in scenarios where few events are triggered. The GNN constructs spatio-temporal graphs from events and processes them alongside the image features through a series of convolution and pooling layers. To improve efficiency, it uses graph residual layers and a specialized voxel grid max pooling layer.
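The two graph operations mentioned above, building a spatio-temporal graph from events and max pooling over a voxel grid, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not DAGr's actual layers: events are `(x, y, t)` tuples sorted by time, edges run from older to newer events within a fixed radius, and pooling keeps only the per-channel maximum feature in each voxel. The names `build_graph`, `voxel_max_pool`, `radius`, `time_scale`, `cell_xy`, and `cell_t` are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, float, float]  # (x, y, t); layout assumed for this sketch


def build_graph(events: List[Event], radius: float, time_scale: float) -> List[Tuple[int, int]]:
    """Connect each event to earlier events within `radius` in scaled (x, y, t) space.

    `events` must be sorted by time. Edges point from the older event to the
    newer one, so a newly arrived event only adds edges and leaves old ones intact.
    """
    edges = []
    for j, (xj, yj, tj) in enumerate(events):
        for i in range(j):
            xi, yi, ti = events[i]
            dt = (tj - ti) * time_scale  # bring time to the same scale as pixels
            if (xj - xi) ** 2 + (yj - yi) ** 2 + dt ** 2 <= radius ** 2:
                edges.append((i, j))
    return edges


def voxel_max_pool(
    events: List[Event], features: List[List[float]], cell_xy: float, cell_t: float
) -> Dict[Tuple[int, int, int], List[float]]:
    """Group nodes into space-time voxels and keep the per-channel max feature."""
    pooled: Dict[Tuple[int, int, int], List[float]] = {}
    for (x, y, t), feat in zip(events, features):
        key = (int(x // cell_xy), int(y // cell_xy), int(t // cell_t))
        if key in pooled:
            pooled[key] = [max(a, b) for a, b in zip(pooled[key], feat)]
        else:
            pooled[key] = list(feat)
    return pooled


# Toy example: two nearby events and one far away.
events = [(0.0, 0.0, 0.000), (1.0, 0.0, 0.001), (10.0, 10.0, 0.002)]
features = [[1.0, 5.0], [3.0, 2.0], [0.5, 0.5]]
edges = build_graph(events, radius=3.0, time_scale=1000.0)
pooled = voxel_max_pool(events, features, cell_xy=4.0, cell_t=0.01)
print(edges)
print(pooled)
```

The quadratic neighbor search is kept for readability; a practical implementation would use a spatial index and would also have to decide how node positions and edges are merged after pooling, which this sketch leaves out.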

The results show that DAGr outperforms state-of-the-art event-based and frame-based detectors in both efficiency and accuracy. In particular, it excels at detecting objects during the blind time between frames, using events to maintain high detection performance. The method offers significant computational advantages: it requires only a fraction of the operations that existing methods need while achieving competitive mean average precision (mAP) scores. Furthermore, the architecture's ability to process events and images adaptively allows it to handle complex motion scenarios effectively, underscoring its potential for real-time applications in safety-critical environments such as automotive systems. Future work may explore integrating additional sensor modalities, such as LiDAR, to further enhance detection capabilities and reduce computational complexity.
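To make the notion of blind time concrete, the following sketch computes the gap between consecutive frames for a given frame rate and the distance an object covers during that gap. The frame rates and the 50 km/h speed are illustrative assumptions chosen for the arithmetic, and the function names are hypothetical.

```python
def blind_time_s(frame_rate_hz: float) -> float:
    """Time between consecutive frames, during which a frame-only detector gets no new data."""
    return 1.0 / frame_rate_hz


def distance_during_blind_time_m(speed_kmh: float, frame_rate_hz: float) -> float:
    """Distance covered by an object moving at `speed_kmh` during one inter-frame gap."""
    return (speed_kmh / 3.6) * blind_time_s(frame_rate_hz)  # km/h -> m/s, then times gap


# Illustrative values only; they are assumptions of this example.
for fps in (20.0, 100.0):
    gap_ms = blind_time_s(fps) * 1000.0
    dist_m = distance_during_blind_time_m(50.0, fps)
    print(f"{fps:.0f} fps: gap {gap_ms:.1f} ms, {dist_m:.2f} m travelled at 50 km/h")
```

Raising the frame rate shortens the gap but increases the amount of data to transmit and process, which is the bandwidth-latency trade-off that events are meant to sidestep.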

Keywords: Exploit, Telecommunications, Photography, Real-time computing, Artificial intelligence, Automobiles, Humans, Event (particle physics), Image sensor, Algorithms, Computer vision, Low latency (capital markets), Latency (audio), Computer science, Time factors, Automobile driving, Smart camera, Image processing, computer-assisted, Frame rate, RGB color model