ViT-SENet-Tom: machine learning-based novel hybrid squeeze–excitation network and vision transformer framework for tomato fruits classification

Journal: Neural Computing and Applications, Volume: 37, Issue: 9
DOI: https://doi.org/10.1007/s00521-025-10973-5
Publication Date: 2025-01-20
Author(s): S M Masfequier Rahman Swapno et al.
Primary Topic: Spectroscopy and Chemometric Analyses

Overview

The paper presents ViT-SENet-Tom, a machine learning framework for fast, accurate classification of tomatoes into three categories: ripe, unripe, and reject. The framework combines a vision transformer (ViT) with a squeeze-and-excitation block of the kind introduced in SENet. Because the original dataset was small, the authors applied augmentation techniques; the resulting model reached a training accuracy of 99.87% and a validation accuracy of 93.87%. In fivefold cross-validation the best fold reached 99.90% accuracy, which the authors present as evidence of the model’s suitability for real-world applications in food security and safety.

In the conclusion, the authors highlight the potential of machine learning to improve tomato classification, with plans for future research aimed at expanding the dataset to include a wider variety of tomato cultivars. They also propose the incorporation of generative adversarial networks (GANs) to generate synthetic images, thereby enhancing the dataset for further experiments. This work emphasizes the importance of advanced classification systems in promoting sustainable food security and safety.

Introduction

The introduction highlights the significance of tomatoes as a vital agricultural crop and the potential for automation to raise yield and productivity through smart agricultural practices and artificial intelligence (AI). Robotic harvesting, which pairs robotic arms with computer vision systems, depends on accurately identifying ripe tomatoes so that only consumable fruit is picked while unripe fruit is left to mature. The paper notes that traditional detection methods rely on visual characteristics such as color, texture, and shape. Color is the critical indicator of ripeness, since tomatoes pass through five distinct color phases from green to crimson.

The introduction also stresses the importance of separating fresh tomatoes from inferior ones to maximize market value, given the logistical and storage factors that affect quality. Recent advances in machine learning and pattern recognition have been pivotal in improving agricultural processing and sorting systems. Deploying these techniques in agricultural settings, however, requires flexibility and timely execution, so a robust system must deliver high performance in accuracy, timeliness, and scalability.

Results

The results section evaluates the proposed hybrid ViT-SENet model for tomato fruit classification. After 50 epochs the model reached a training accuracy of 99.87% and a validation accuracy of 93.87%, with a training loss of 0.001 and a validation loss of 0.03; the authors interpret this as strong performance without overfitting. Precision, recall, and F1 score were computed to assess how well the model handles imbalanced data across the three classes: ripe, unripe, and reject. The confusion matrix shows per-class prediction accuracies of 95% for ripe, 99% for unripe, and 97% for reject, further supporting the model’s robustness.
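As a rough illustration of how per-class precision, recall, and F1 follow from a three-class confusion matrix, here is a minimal Python sketch. The function names and the toy counts are assumptions made for this example; they are not the paper's code or its reported confusion matrix.

```python
from typing import Dict, List

CLASSES = ["ripe", "unripe", "reject"]


def per_class_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 per class.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(labels)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of per-class F1, so minority classes count equally."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


# Toy counts for illustration only; NOT the paper's confusion matrix.
toy_cm = [
    [8, 1, 1],  # true ripe
    [0, 9, 1],  # true unripe
    [2, 0, 8],  # true reject
]
toy_metrics = per_class_metrics(toy_cm, CLASSES)
toy_macro_f1 = macro_f1(toy_metrics)
```

Recall for a class is the diagonal entry divided by its row sum, which is the usual reading of a "per-class accuracy" taken from a row-normalized confusion matrix. Macro-averaging is one common choice for imbalanced data because it weights every class equally regardless of its size.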

Additionally, the study employed fivefold cross-validation, yielding a peak testing accuracy of 99.90% on the fifth fold. ROC curve analysis showed strong class discrimination, with AUC values of 0.98 for ripe and 0.99 for both unripe and reject. The authors take these findings as evidence that the model classifies tomato fruits accurately enough to support food safety and security applications. ROC curves and confusion matrices are provided to illustrate the model’s performance.
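The fivefold procedure can be sketched as follows. This is a minimal illustration that assumes the folds are drawn by shuffling over the 2400 images mentioned in the methodology; the summary does not say whether the authors split before or after augmentation, or whether they stratified by class. The helper names are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def summarize_folds(fold_accuracies: List[float]) -> Dict[str, float]:
    """Report mean, best, and worst fold accuracy."""
    return {
        "mean": sum(fold_accuracies) / len(fold_accuracies),
        "max": max(fold_accuracies),
        "min": min(fold_accuracies),
    }


# With 2400 samples and k=5, each fold tests on 480 and trains on 1920.
folds = list(k_fold_indices(2400, k=5))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={len(train_idx)} test={len(test_idx)}")
```

A peak fold accuracy such as the 99.90% quoted above corresponds to the "max" entry of `summarize_folds`; the mean and the spread across folds give a fuller picture of how stable the model is.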

Discussion

The paper focuses on improving tomato fruit classification through a machine learning framework, the ViT-SENet model. The model aims to address the limitations of traditional classification methods, which are often labor-intensive and subjective, by providing a more accurate and efficient way to sort tomatoes into the ripe, unripe, and reject categories. The work is motivated by the critical role tomatoes play in global food security and safety, and by the potential to reduce food waste and optimize agricultural practices. According to the authors, the ViT-SENet model outperforms existing classification methods in both speed and accuracy, which they describe as setting a new benchmark in the field.

The paper is structured into several sections, beginning with a review of related work on automated food classification using AI techniques. The methodology section describes the dataset, which consists of 2400 images of tomatoes, and outlines the preprocessing techniques applied to it. The proposed ViT-SENet framework combines Vision Transformers (ViT) and Squeeze-and-Excitation Networks (SENet) to improve feature representation and classification accuracy. The architecture is designed to capture both global and local dependencies within images. The findings indicate that the ViT-SENet model not only achieves high accuracy but also significantly reduces the time required for classification, offering a useful tool for the agricultural sector and contributing to broader sustainability goals.
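To make the squeeze-and-excitation idea concrete, the sketch below applies a generic SE recalibration to a small set of token embeddings in plain Python. It is an assumption-laden illustration: the summary does not state where the SE block sits relative to the transformer layers, so pooling over ViT tokens, the reduction ratio of 4, and the random weights are all hypothetical choices rather than the authors' architecture.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def se_block(tokens: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Generic squeeze-and-excitation over a (n_tokens x channels) matrix."""
    n_channels = len(tokens[0])
    # Squeeze: average each channel over all tokens.
    squeezed = [sum(tok[c] for tok in tokens) / len(tokens) for c in range(n_channels)]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid gates.
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w2, hidden)]
    # Scale: reweight every token's channels by the shared gates.
    return [[v * g for v, g in zip(tok, gates)] for tok in tokens]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes chosen only to keep the example small.
rng = random.Random(0)
channels, reduction, n_tokens = 8, 4, 5
w1 = random_matrix(channels // reduction, channels, rng)  # channels -> channels/reduction
w2 = random_matrix(channels, channels // reduction, rng)  # channels/reduction -> channels
tokens = random_matrix(n_tokens, channels, rng)
recalibrated = se_block(tokens, w1, w2)
```

The output has the same shape as the input; each channel is scaled by a gate between 0 and 1 computed from a global summary of all tokens. This gives a channel-wise reweighting that complements the token-to-token attention of a ViT.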