Comparing YOLOv8 and Mask R-CNN for instance segmentation in complex orchard environments

Journal: Artificial Intelligence in Agriculture, Volume: 13
DOI: https://doi.org/10.1016/j.aiia.2024.07.001
Publication Date: 2024-07-17
Author(s): Ranjan Sapkota et al.
Primary Topic: Smart Agriculture and AI

Overview

This study evaluates two deep learning models, YOLOv8 and Mask R-CNN, for instance segmentation in agricultural applications, focusing on apple orchards. It uses two distinct datasets: Dataset 1, featuring dormant apple trees, and Dataset 2, showing apple tree canopies with immature green apples. YOLOv8 outperforms Mask R-CNN in both precision and recall on both datasets, achieving a precision of 0.90 and a recall of 0.95 on Dataset 1, and a precision of 0.93 and a recall of 0.97 on Dataset 2. Mask R-CNN scored lower, with precision and recall values of 0.81 for Dataset 1 and 0.85 for Dataset 2.
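To compare precision and recall pairs on a single scale, they can be folded into an F1 score (the harmonic mean of the two). The sketch below does this for the YOLOv8 figures quoted above. The F1 values are derived here for illustration and are not figures reported in this summary; the dictionary name and labels are hypothetical.

```python
# Precision/recall pairs for YOLOv8 as quoted in the overview above.
# F1 is derived here for illustration; it is not a figure taken from the study.
YOLOV8_SCORES = {
    "Dataset 1 (dormant trees)": (0.90, 0.95),
    "Dataset 2 (immature green apples)": (0.93, 0.97),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for name, (p, r) in YOLOV8_SCORES.items():
        print(f"{name}: P={p:.2f} R={r:.2f} F1={f1_score(p, r):.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a model cannot score well on F1 by trading away one metric for the other.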

YOLOv8 also has shorter inference times, processing multi-class segmentation in 10.9 ms and single-class segmentation in 7.8 ms, compared to Mask R-CNN’s 15.6 ms and 12.8 ms, respectively. These results indicate that YOLOv8 is both more accurate and faster, making it a suitable choice for real-time applications in automated orchard operations, such as robotic harvesting and fruit thinning. The study underscores the growing importance of advanced machine learning techniques in enhancing agricultural productivity and efficiency.
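For real-time use, per-image latency is often easier to judge as throughput. The sketch below converts the inference times quoted above into frames per second and a relative speedup. It assumes one image per inference call and ignores image loading, pre-processing, and batching, none of which are described in this summary; the function and variable names are illustrative.

```python
# Per-image inference times in milliseconds, as quoted in the overview above.
INFERENCE_MS = {
    "multi-class": {"YOLOv8": 10.9, "Mask R-CNN": 15.6},
    "single-class": {"YOLOv8": 7.8, "Mask R-CNN": 12.8},
}


def frames_per_second(latency_ms: float) -> float:
    """Throughput implied by a per-image latency, ignoring I/O and batching."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def speedup(slower_ms: float, faster_ms: float) -> float:
    """Ratio of the slower latency to the faster one."""
    if faster_ms <= 0:
        raise ValueError("latency must be positive")
    return slower_ms / faster_ms


if __name__ == "__main__":
    for task, times in INFERENCE_MS.items():
        yolo, mask = times["YOLOv8"], times["Mask R-CNN"]
        print(
            f"{task}: YOLOv8 {frames_per_second(yolo):.1f} fps, "
            f"Mask R-CNN {frames_per_second(mask):.1f} fps, "
            f"speedup {speedup(mask, yolo):.2f}x"
        )
```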

Introduction

The paper’s introduction discusses the significance of instance segmentation in agricultural applications, emphasizing that it combines the advantages of object detection and semantic segmentation. This technique is crucial for accurately quantifying plant structures, which aids in plant growth assessment, disease identification, and yield estimation. Traditional segmentation methods, which rely on hand-crafted features and classical image processing techniques, are described as time-consuming and less adaptable because of their static nature and extensive manual tuning requirements. In contrast, the transition to deep learning-based approaches, particularly convolutional neural networks (CNNs), marks a significant advancement. These models automatically learn and extract features from large datasets, improving accuracy and efficiency in diverse agricultural scenarios.

The paper highlights the application of deep learning architectures, such as U-Net, Mask R-CNN, and YOLO, which facilitate end-to-end learning and improve adaptability to new conditions. These models have been successfully employed in various agricultural tasks, including disease management, yield estimation, and robotic applications. The introduction underscores the effectiveness of instance segmentation in early disease detection and precise yield estimation, which are vital for optimizing crop management strategies. It also notes the growing focus on YOLO and Mask R-CNN for tasks like crop detection and pest management, showcasing their roles in advancing precision agriculture. Recent studies demonstrate the strong performance of these models, particularly YOLOv8, in segmenting tree trunks, indicating their potential for enhancing automated agricultural practices.

Methods

The methodology involved four key steps, as illustrated in Figure 3a. The first step was the acquisition of RGB images from commercial orchards during two distinct seasons: the dormant season (Figure 3b) and the early growing season (Figure 3c). These images were captured under diverse environmental conditions, including both sunny and cloudy days. Following image acquisition, the images were manually annotated to develop training and testing datasets.

The annotated training dataset was used to train the two instance segmentation models, YOLOv8 and Mask R-CNN. Their performance was then assessed on the test dataset to evaluate how accurately they segment instances within the images. This structured approach allowed the models’ capabilities to be analyzed across varying seasonal and environmental contexts.
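The evaluation step can be illustrated with a minimal sketch of how precision and recall are commonly computed for instance segmentation: each predicted mask is matched to at most one annotated mask by intersection over union (IoU). This is not the authors' evaluation code. The 0.5 IoU threshold, the greedy matching order, the representation of masks as sets of pixel coordinates, and the `rectangle` helper are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # a mask is the set of (row, col) pixels it covers


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def precision_recall(
    predicted: List[Mask], truth: List[Mask], iou_threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedily match predictions (in the given order) to unmatched truth masks.

    A prediction is a true positive if its best remaining match has
    IoU >= iou_threshold; each truth mask can be matched only once.
    """
    unmatched = list(range(len(truth)))
    true_positives = 0
    for pred in predicted:
        best_idx, best_iou = None, iou_threshold
        for idx in unmatched:
            iou = mask_iou(pred, truth[idx])
            if iou >= best_iou:
                best_idx, best_iou = idx, iou
        if best_idx is not None:
            unmatched.remove(best_idx)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def rectangle(top: int, left: int, height: int, width: int) -> Mask:
    """Hypothetical helper: a filled rectangular mask for toy examples."""
    return {(r, c) for r in range(top, top + height) for c in range(left, left + width)}


if __name__ == "__main__":
    truth = [rectangle(0, 0, 4, 4), rectangle(10, 10, 4, 4)]
    predicted = [rectangle(0, 1, 4, 4), rectangle(20, 20, 3, 3), rectangle(10, 10, 4, 4)]
    p, r = precision_recall(predicted, truth)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In practice predictions are usually sorted by confidence before matching, so that the most confident prediction claims a ground-truth instance first.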

Results

YOLOv8 outperformed Mask R-CNN on every metric reported in this summary. On Dataset 1 (dormant apple trees), YOLOv8 achieved a precision of 0.90 and a recall of 0.95, against values of 0.81 for Mask R-CNN. On Dataset 2 (canopies with immature green apples), YOLOv8 achieved a precision of 0.93 and a recall of 0.97, against values of 0.85 for Mask R-CNN, and a mAP of 0.939 against 0.902.

YOLOv8 was also faster at inference, taking 10.9 ms for multi-class segmentation and 7.8 ms for single-class segmentation, compared with 15.6 ms and 12.8 ms for Mask R-CNN. The authors relate these results to the existing literature and note that further work is needed, including comparisons with other state-of-the-art models.

Discussion

The study systematically compared YOLOv8 and Mask R-CNN for instance segmentation in commercial apple orchards, covering single-class segmentation of immature green apples and multi-class segmentation of tree trunks and branches. YOLOv8 outperformed Mask R-CNN in mean Average Precision (mAP), achieving 0.939 compared to Mask R-CNN’s 0.902 for green fruit segmentation. This advantage is attributed to YOLOv8’s one-stage architecture, which allows faster processing and fewer false positives by directly predicting object classes and bounding boxes without region proposals, unlike the two-stage approach of Mask R-CNN.
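As a reminder of what the mAP figures measure, the sketch below computes Average Precision (AP) for one class as the area under its precision-recall curve, then averages AP across classes. It uses all-point interpolation, which is one common convention. This summary does not state which IoU threshold or interpolation scheme the authors used, so those details and the example values are assumptions.

```python
from typing import Dict, List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_ground_truth: int) -> float:
    """AP for one class from (confidence, is_true_positive) pairs.

    Detections are ranked by confidence, precision is made non-increasing
    from right to left, and the area under the curve is summed over recall steps.
    """
    if num_ground_truth <= 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Interpolate: each precision becomes the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """Unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap) if per_class_ap else 0.0


if __name__ == "__main__":
    # Hypothetical detections for one class with two annotated instances.
    toy = [(0.9, True), (0.8, False), (0.7, True)]
    print(f"AP = {average_precision(toy, num_ground_truth=2):.3f}")
    # Hypothetical per-class AP values, not results from the study.
    print(f"mAP = {mean_average_precision({'trunk': 0.8, 'branch': 0.6}):.3f}")
```

For single-class tasks such as green fruit segmentation, mAP reduces to the AP of that one class.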

The findings also indicate that, although YOLOv8 achieved higher precision and recall, Mask R-CNN may still be advantageous in scenarios requiring highly precise segmentation, particularly in complex environments where objects are densely packed or partially occluded. The study underscores the relevance of both models in agricultural applications and the need for a nuanced understanding of their respective strengths and limitations in real-world settings. The results offer insight into the ongoing evolution of deep learning models for agricultural automation and point to future research on further enhancements and comparisons with other state-of-the-art models.