Garlic-YOLO-DD: a lightweight object detection algorithm for garlic damage detection

Journal: Frontiers in Plant Science, Volume: 16
DOI: https://doi.org/10.3389/fpls.2025.1702045
PMID: https://pubmed.ncbi.nlm.nih.gov/41567398
Publication Date: 2026-01-06
Author(s): Yun Gao et al.
Primary Topic: Smart Agriculture and AI

Overview

This research introduces Garlic-YOLO-DD, a lightweight single-stage object detection algorithm designed for garlic damage detection in resource-constrained environments. Building on the YOLOv11n framework, the model addresses high computational complexity and excessive parameters by incorporating the ADown downsampling module, which significantly reduces both the parameter count and computational load. Additionally, the integration of the SimAM parameter-free attention mechanism enhances the model’s ability to localize and extract features from subtle lesions, while the efficient BiFPN architecture optimizes multi-scale feature integration, improving both speed and effectiveness.

Experimental results on a self-constructed garlic damage dataset show that Garlic-YOLO-DD achieves a mean Average Precision (mAP) of 92.58% at an Intersection over Union (IoU) threshold of 50%, with only 1.497 million parameters and an inference speed of 167 frames per second. Compared with YOLOv11n, this corresponds to a 42.04% reduction in parameters (the model keeps 57.96% of the baseline parameter count), a 20.63% decrease in computational load, and a 15.97% increase in inference speed. Overall, Garlic-YOLO-DD demonstrates significant advantages over existing lightweight detection models, offering a viable solution for rapid, non-destructive garlic damage detection in intelligent agricultural systems.
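The relative changes quoted here can be sanity-checked by back-computing the YOLOv11n baseline they imply. The short sketch below does only that arithmetic. It assumes the parameter reduction is 42.04% (the figure given in the Discussion, which leaves 57.96% of the baseline parameters) and takes the 5.0 GFLOPs figure from the Results section; the helper name `implied_baseline` is illustrative and does not come from the paper.

```python
def implied_baseline(value, relative_change):
    """Return the baseline a value came from, given a signed relative change.

    Use a negative change for a reduction (-0.4204 for "42.04% fewer")
    and a positive change for an increase (+0.1597 for "15.97% faster").
    """
    return value / (1.0 + relative_change)


# Figures reported for Garlic-YOLO-DD in this summary.
params_m = 1.497   # million parameters
gflops = 5.0       # GFLOPs per forward pass
fps = 167.0        # frames per second

baseline_params_m = implied_baseline(params_m, -0.4204)
baseline_gflops = implied_baseline(gflops, -0.2063)
baseline_fps = implied_baseline(fps, +0.1597)

print(f"implied YOLOv11n parameters: {baseline_params_m:.3f} M")
print(f"implied YOLOv11n cost:       {baseline_gflops:.2f} GFLOPs")
print(f"implied YOLOv11n speed:      {baseline_fps:.1f} FPS")
```

Under these assumptions the implied baseline is roughly 2.58 million parameters, 6.3 GFLOPs, and 144 FPS, so the three percentages are mutually consistent with the absolute figures.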

Introduction

The introduction of the paper emphasizes the significance of garlic as a vital cash crop, highlighting that its quality directly influences agricultural profitability and market value. Post-harvest handling poses risks of mechanical and environmental damage to garlic bulbs, which not only reduces their commercial value but also accelerates spoilage, raising food safety concerns. Consequently, the rapid and accurate detection of garlic damage is essential for maintaining product quality and enhancing automated processing. Recent advancements in deep learning-based object detection techniques show promise for agricultural visual inspection; however, existing high-performance models often entail high computational complexity and require expensive GPU resources, limiting their applicability in resource-constrained environments.

To address these challenges, the authors propose Garlic-YOLO-DD, an algorithm based on the YOLOv11n architecture and designed for efficient garlic damage detection in practical agricultural contexts. The model incorporates a lightweight ADown downsampling module to reduce computational demands and model size, while the SimAM attention module enhances the detection of subtle damage without adding parameters. Additionally, refining the feature fusion network with the BiFPN paradigm improves cross-scale connections, thereby accelerating inference and enhancing detection performance. This work aims to bridge the gap between high accuracy and low resource consumption in intelligent agricultural detection technologies.
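To make "attention without adding parameters" concrete, here is a minimal pure-Python sketch of a SimAM-style gate applied to a single feature-map channel. It follows the commonly published SimAM formulation rather than the authors' code, which this summary does not show. The function name `simam`, the list-of-rows representation, and the regularizer value `e_lambda = 1e-4` are assumptions made for illustration.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Apply a SimAM-style gate to one channel given as a list of rows.

    Each activation is scaled by sigmoid(d / (4 * (var + e_lambda)) + 0.5),
    where d is its squared deviation from the channel mean. There are no
    learned weights; e_lambda is a fixed hyperparameter (assumed value).
    The channel must contain at least two values.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1                       # number of "other" neurons
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-inv_energy))   # sigmoid
            out_row.append(v * gate)
        out.append(out_row)
    return out


# Toy channel: one activation stands out from a flat background.
toy = [[0.1, 0.1, 0.1],
       [0.1, 2.0, 0.1],
       [0.1, 0.1, 0.1]]
gated = simam(toy)
outlier_gate = gated[1][1] / toy[1][1]
background_gate = gated[0][0] / toy[0][0]
```

In this toy case the outlying activation receives a larger gate than the background, which is the behaviour one would want for small, locally distinctive lesions. A real implementation would operate on batched tensors per channel.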

Methods

In this study, experiments were conducted on a workstation featuring an Intel(R) Xeon(R) W-2245 CPU and an NVIDIA Quadro RTX 5000 GPU, utilizing a software environment that included Anaconda3, PyCharm, and PyTorch. The model’s performance was assessed using standard evaluation metrics such as recall ($R$), precision ($P$), F1 score ($F1$), average precision ($AP$), and mean average precision ($mAP$). The definitions for true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) were established to facilitate the calculation of these metrics. Specifically, precision is calculated as $P = \frac{TP}{TP + FP}$, recall as $R = \frac{TP}{TP + FN}$, F1 score as $F1 = \frac{2 \cdot R \cdot P}{R + P}$, and $mAP$ as the average of $AP$ across categories.
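The formulas above translate directly into code. The following sketch implements them as plain functions; the counts and per-class AP values are made-up numbers for illustration only and are not results from the paper.

```python
def precision(tp, fp):
    """P = TP / (TP + FP); returns 0.0 when there are no predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN); returns 0.0 when there are no ground-truth objects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """F1 = 2 * R * P / (R + P); returns 0.0 when both are zero."""
    return 2 * r * p / (r + p) if (r + p) else 0.0


def mean_average_precision(ap_per_class):
    """mAP is the arithmetic mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical detection counts for a single class.
tp, fp, fn = 90, 10, 20
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)

# Hypothetical AP values for three classes.
m_ap = mean_average_precision([0.95, 0.90, 0.92])

print(f"P={p:.4f} R={r:.4f} F1={f1:.4f} mAP={m_ap:.4f}")
```

Note that TN does not appear in any of these formulas, which is typical for object detection, where "true negative" background regions are not counted.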

Additionally, the study used lightweight evaluation metrics to assess the computational efficiency of the models. These metrics included the number of parameters, floating-point operations (FLOPs), model size, and frames per second (FPS). The number of parameters reflects memory consumption and computational load, while FLOPs estimate the computational workload per forward pass, which in turn affects energy consumption. Model size is the storage space required for the weights, and FPS indicates the model’s inference throughput, which is crucial for real-time applications. All results from the ablation and comparative experiments were averaged over five independent assessments to ensure reliability.
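The summary does not say how FPS was timed, so the following is only a sketch of one common approach: warm up, time a fixed number of forward passes, and average the throughput over five independent runs. The names `measure_fps`, `averaged_fps`, and `dummy_infer` are hypothetical, and `dummy_infer` is a CPU stand-in for a real model call.

```python
import statistics
import time


def measure_fps(infer, n_frames=200, warmup=20):
    """Time n_frames calls to infer() after a warm-up and return frames/second."""
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_frames):
        infer()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


def averaged_fps(infer, repeats=5, **kwargs):
    """Repeat the measurement and return (mean FPS, list of per-run FPS)."""
    runs = [measure_fps(infer, **kwargs) for _ in range(repeats)]
    return statistics.mean(runs), runs


def dummy_infer():
    """Stand-in workload; replace with a real forward pass."""
    sum(i * i for i in range(1000))


mean_fps, runs = averaged_fps(dummy_infer, repeats=5, n_frames=50, warmup=5)
print(f"mean FPS over {len(runs)} runs: {mean_fps:.1f}")
```

When timing a PyTorch model on a GPU, the device should be synchronized (for example with `torch.cuda.synchronize()`) before reading the clock, because CUDA kernels run asynchronously.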

Results

The results of the ablation experiments demonstrate the effectiveness of the three proposed modules in enhancing the performance of the Garlic-YOLO-DD model. The baseline YOLOv11n achieved a mean Average Precision (mAP) of 64.94% at 50% IoU, which improved significantly to 92.58% with the integration of the ADown, BiFPN, and SimAM modules. Notably, the ADown module not only improved mAP@50% to 87.55% but also reduced the model’s parameters to 2.104 million and increased inference speed. The SimAM module further enhanced recognition performance by 11 percentage points, demonstrating the synergistic effects of combining these modules for efficient feature extraction and fusion in garlic damage detection.

Comparative experiments with nine classic object detection algorithms revealed that Garlic-YOLO-DD outperformed all competitors, achieving the highest mAP@50% of 92.58%, surpassing the second-best model, YOLOv10s, by 3.52 percentage points. The model maintained a lightweight architecture with only 1.497 million parameters and a computational cost of 5.0 GFLOPs, while achieving an impressive inference speed of 167 frames per second. Additionally, the study identified an optimal input resolution of 640×640 pixels for maximizing detection performance, highlighting the model’s sensitivity to input scale. The embedding of the SimAM attention mechanism at deeper layers of the backbone network significantly improved performance, with the best results achieved when placed at the bottom layer, confirming the importance of attention mechanisms in enhancing feature discrimination for garlic damage detection.

Discussion

In this study, the authors developed the Garlic-YOLO-DD model for detecting garlic damage, supported by a controlled data collection and preprocessing procedure. High-quality images of garlic samples were captured using the Honor Magic6 smartphone in a controlled darkroom environment, minimizing external light interference. The dataset comprised 462 high-resolution images categorized into three classes: normal garlic, partially damaged garlic, and garlic with root damage. The images were resized to 640×640 pixels, and random rotation augmentation was applied to enhance model generalization. The baseline architecture, YOLOv11n, was modified by integrating three key modules: the ADown downsampling module, the SimAM attention mechanism, and the BiFPN feature fusion architecture, which collectively improved detection accuracy while maintaining a lightweight design.
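As a rough illustration of the BiFPN feature fusion mentioned above, the sketch below implements the "fast normalized fusion" rule from the original BiFPN design: each input gets a non-negative weight, and the weighted sum is divided by the sum of the weights plus a small epsilon. Whether Garlic-YOLO-DD uses exactly this rule is not stated in this summary, so treat it as an assumption. The function name and the flat-list feature representation are also illustrative.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with normalized non-negative weights.

    out[i] = sum_k(w_k * features[k][i]) / (eps + sum_k(w_k)), w_k = max(0, weight_k)
    """
    clipped = [max(0.0, w) for w in weights]   # ReLU keeps weights non-negative
    total = sum(clipped) + eps
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(clipped, features)) / total
        for i in range(length)
    ]


# Two hypothetical feature vectors arriving at the same pyramid level,
# already resized to a common shape.
p_top_down = [1.0, 2.0, 3.0]
p_lateral = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([p_top_down, p_lateral], [1.0, 1.0])
print(fused)
```

With equal weights the result is close to the element-wise average. In a trained network the weights are learned, so the model can favour whichever scale carries more information for a given damage size.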

The Garlic-YOLO-DD model achieved a mean Average Precision (mAP@50%) of 92.58% and an F1 score of 85.84%, outperforming existing lightweight models while using 42.04% fewer parameters than YOLOv11n. The ADown module effectively preserved feature information during downsampling, while SimAM enhanced focus on damaged areas without adding parameters. BiFPN improved multi-scale feature representation, enhancing detection across varying damage sizes. Despite these promising results, the study acknowledges limitations, including the need for validation under diverse conditions and the potential to expand the range of recognized damage types. Future research will focus on real-time implementation in a conveyor system for automated sorting, to demonstrate the model’s practical application in post-harvest processing.