Assessing the capability of YOLO- and transformer-based object detectors for real-time weed detection

Journal: Precision Agriculture, Volume: 26, Issue: 3
DOI: https://doi.org/10.1007/s11119-025-10246-0
Publication Date: 2025-05-21
Author(s): Alicia Allmendinger et al.
Primary Topic: Smart Agriculture and AI

Overview

The study evaluates state-of-the-art object detection models (YOLOv8, YOLOv9, YOLOv10, and RT-DETR) for distinguishing crops from weeds in precision agriculture, using a dataset of 5611 images covering 16 plant species. Two dataset configurations were used: in dataset 1, models were trained on individual species, while in dataset 2 the weeds were grouped into monocotyledonous and dicotyledonous categories alongside three selected crops. YOLOv9 and its variants achieve strong recall and mean Average Precision (mAP), particularly on dataset 2, while RT-DETR-l excels in precision, making it suitable for applications where minimizing false positives is crucial. Inference times show that the smaller YOLO models are the fastest, highlighting a trade-off between model size, accuracy, and computational demands.
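
The grouping used for dataset 2 can be pictured as a relabeling step. The following is a minimal sketch, not the authors' pipeline: it assumes annotations are stored as YOLO-style text lines (`class_id x_center y_center width height`), and the weed names, class order, and helper `remap_label_line` are hypothetical placeholders, since only the three crops are named in the text.

```python
# Hypothetical class lists: only the three crops come from the text.
# The weed names and all class indices are illustrative placeholders.
SPECIES_CLASSES = [
    "sunflower", "winter_wheat", "maize",
    "grass_weed_a", "grass_weed_b", "broadleaf_weed_a",
]
GROUP_CLASSES = ["sunflower", "winter_wheat", "maize", "monocot_weed", "dicot_weed"]

# Crops keep their own class; weeds collapse into two botanical groups.
SPECIES_TO_GROUP = {
    "sunflower": "sunflower",
    "winter_wheat": "winter_wheat",
    "maize": "maize",
    "grass_weed_a": "monocot_weed",
    "grass_weed_b": "monocot_weed",
    "broadleaf_weed_a": "dicot_weed",
}


def remap_label_line(line: str) -> str:
    """Rewrite one label line, replacing the species id with its group id."""
    parts = line.split()
    species = SPECIES_CLASSES[int(parts[0])]
    group_id = GROUP_CLASSES.index(SPECIES_TO_GROUP[species])
    # Box coordinates are passed through unchanged.
    return " ".join([str(group_id)] + parts[1:])


def remap_label_text(text: str) -> str:
    """Remap every non-empty line of a label file's contents."""
    return "\n".join(remap_label_line(l) for l in text.splitlines() if l.strip())
```

Only the class index changes, so the same images and bounding boxes can serve both dataset configurations.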

The conclusions emphasize that while the latest models are promising for real-time applications, challenges remain in achieving consistent accuracy across diverse agricultural settings. Factors such as lighting variations, occlusions, and soil backgrounds significantly affect model performance. Future research should focus on making models more robust through diverse training datasets and adaptive learning techniques, so that they generalize better across varying conditions. Model selection should follow operational needs: RT-DETR-l is preferable when high precision matters most, while smaller YOLO models are better suited to real-time applications in dynamic environments. Additionally, integrating synthetic training images may broaden applicability without extensive data collection, although real-world validation remains essential for assessing performance under variable conditions.

Introduction

The introduction of this research paper highlights the increasing reliance on herbicides for weed control in agriculture, while also addressing the associated environmental and health concerns. It emphasizes the inefficiencies of uniform herbicide application due to the uneven distribution of weeds and the varying efficacy of herbicides across different species. Consequently, there is a shift towards site-specific weed management (SSWM), which necessitates precise knowledge of weed species distribution and growth patterns. The paper discusses the advancements in image processing and deep learning, particularly the effectiveness of convolutional neural networks (CNNs) and object detection methods, in enhancing weed classification for real-time applications.

The study aims to compare the performance of state-of-the-art models, specifically CNN-based YOLO and transformer-based RT-DETR, in weed classification under real field conditions, which is a departure from previous research that often relied on controlled environments. It also seeks to evaluate the inference times of these models across various hardware devices to assess their practicality for precision agriculture. By focusing on these aspects, the research intends to contribute to more efficient and sustainable weed management practices through improved object detection methodologies.
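
Comparing inference times across devices needs a consistent timing procedure. The sketch below shows one common pattern (untimed warm-up calls, then the median over repeated runs). It is an assumption about how such a measurement could be set up, not the study's protocol; `benchmark` and `dummy_model` are hypothetical names, and the dummy workload stands in for a real detector call.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], warmup: int = 5, runs: int = 30) -> Dict[str, float]:
    """Time a zero-argument inference callable; latencies are in milliseconds."""
    for _ in range(warmup):
        infer()  # warm-up calls are not timed (caches, lazy initialization)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    median_ms = statistics.median(samples)
    return {
        "median_ms": median_ms,
        "mean_ms": statistics.fmean(samples),
        # Throughput implied by the median latency, one image per call.
        "fps": 1000.0 / median_ms if median_ms > 0 else float("inf"),
    }


def dummy_model() -> int:
    # Stand-in workload; replace with a real single-image detector call.
    return sum(i * i for i in range(10_000))
```

Calling `benchmark(dummy_model)` returns a dictionary with `median_ms`, `mean_ms`, and `fps`. On a GPU, work is typically launched asynchronously, so a real measurement would also need to wait for the device to finish before stopping the clock.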

Methods

The paper describes how the experimental dataset was built and the subsequent workflow for model training and evaluation. The dataset comprises 5611 images captured with a Sony Alpha 7R Mark4 camera at the Heidfeldhof Research Station in Germany, focusing on three crop species (sunflower, winter wheat, and maize) and twelve weed species. Images were taken over 45 days under varying environmental conditions, which adds variability that helps make trained models more robust and less prone to overfitting. This variability, stemming from differences in lighting, soil backgrounds, and plant growth stages, is critical for ensuring that models trained on the data generalize effectively to real-world scenarios.

The experimental workflow consists of six stages: Data Collection, Data Annotation, Data Preparation, Data Analysis, Model Training, and Performance Evaluation. Data analysis included assessing the class distribution to identify potential imbalances, which were addressed using the Ultralytics framework's focal loss and data augmentation strategies. The models were trained on two different hardware setups, using YOLO and RT-DETR architectures with hyperparameters optimized in preliminary studies. Evaluation metrics and inference times were assessed across various GPU and CPU devices, and the trained models were converted for efficient real-time deployment using the TensorRT, OpenVINO, and ONNX formats.
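
To illustrate why focal loss helps with class imbalance, here is a minimal sketch of the standard binary focal loss for a single prediction, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not the Ultralytics implementation, and the defaults `gamma=2.0` and `alpha=0.25` are the values commonly cited from the original focal loss paper, not settings reported for this study.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one predicted probability p and binary label y (0 or 1)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss of examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy. Larger `gamma` values shift the training signal toward hard, often rare, examples.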

Results

In this section, the performance of four weed detection models (YOLOv8l, YOLOv9c, YOLOv10l, and RT-DETR-l) was analyzed using dataset 1. The models were selected for their comparable sizes, and their effectiveness was evaluated using precision, recall, mean Average Precision at IoU 0.50 (mAP50), and mean Average Precision across IoU thresholds from 0.50 to 0.95 (mAP50-95). The results underscore the distinct strengths and weaknesses of each model in accurately detecting various weed species. Detailed results for these models, along with additional models, are available in the Supplementary Material.
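
The metrics above all rest on matching predicted boxes to ground-truth boxes by Intersection over Union (IoU). The sketch below is a simplified single-class, single-image illustration, not the evaluation code used in the study. It assumes boxes in `(x1, y1, x2, y2)` corner format and uses greedy matching by confidence; the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]], truths: List[Box], iou_thr: float = 0.5
) -> Tuple[float, float]:
    """Greedy matching: higher-confidence predictions claim ground truths first."""
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(truths):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_thr:
            matched.add(best_idx)
            tp += 1
    fp = len(preds) - tp      # predictions without a matching ground truth
    fn = len(truths) - tp     # ground truths that no prediction matched
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# The ten IoU thresholds averaged in mAP50-95: 0.50, 0.55, ..., 0.95
MAP_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]
```

A full mAP computation additionally traces precision against recall over all confidence levels for each class and averages the resulting per-class average precision. mAP50 does this at an IoU threshold of 0.50, and mAP50-95 repeats it for each value in `MAP_THRESHOLDS` and averages the results.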

Discussion

In the discussion section, the paper reviews the advancements in object detection methodologies, particularly focusing on one-stage and transformer-based detectors in agricultural applications. It highlights a comparative analysis of various models, including YOLO and Detection Transformers, emphasizing that two-stage detectors, such as Faster R-CNN, generally outperform one-stage detectors like YOLOv4 in terms of mean Average Precision (mAP). The study builds on previous research by utilizing a larger dataset of 16 species, demonstrating the effectiveness of advanced training techniques and the latest YOLO versions (up to YOLOv10), which incorporate improved feature extraction and processing mechanisms. The findings indicate that while YOLO models are efficient for real-time applications, transformer-based models, particularly RT-DETR, show promise in enhancing object detection accuracy, albeit with higher computational demands.

The discussion also addresses the relevance of traditional computer vision techniques in weed detection, noting their continued application in robotic systems despite the rise of deep learning methods. The paper underscores the importance of dataset quality and diversity in training models, employing data augmentation techniques to mitigate class imbalances. Furthermore, it details the systematic approach to hyperparameter tuning and model evaluation, employing metrics such as precision, recall, and mAP to assess performance across different architectures. The analysis concludes that while YOLO models have dominated the field, the integration of transformer-based approaches could significantly enhance real-time agricultural monitoring and decision-making capabilities.