Evaluation of deep learning models using explainable AI with qualitative and quantitative analysis for rice leaf disease detection

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-14306-3
PMID: https://pubmed.ncbi.nlm.nih.gov/40883348
Publication Date: 2025-08-29
Author(s): Hari Kishan Kondaveeti et al.
Primary Topic: Smart Agriculture and AI

Overview

The paper evaluates deep learning models for rice leaf disease detection and argues that explainable AI (XAI) is needed to assess model performance beyond traditional classification metrics. The study introduces a three-stage methodology. In the first stage, eight pre-trained models were fine-tuned with transfer learning, and ResNet50 was identified as the best-performing model on conventional metrics. In the second stage, the LIME method was used to visualize feature importance; ResNet50 excelled on Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and Pixel-wise Accuracy (PWA), while InceptionV3, despite high classification accuracy, performed poorly in feature selection. The final stage quantified overfitting, underscoring the need for both qualitative and quantitative analysis to identify reliable models.

The findings highlight the critical role of XAI in developing robust models for real-time applications in agriculture and medicine. Future research directions include utilizing larger and more diverse datasets, employing advanced data augmentation techniques like Generative Adversarial Networks (GANs), and exploring automatic segmentation methods to streamline ground truth generation. Additionally, the study suggests investigating alternative visualization techniques for model interpretability, refining models to mitigate overfitting, and enhancing XAI methods to improve transparency and performance. Future work may also incorporate statistical significance testing and evaluate computational efficiency to ensure feasibility in resource-constrained environments.

Methods

The proposed methodology for evaluating rice leaf disease classification consists of three stages. Initially, eight pre-trained deep learning models were fine-tuned on a pre-processed dataset using transfer learning. Their classification performance was assessed through conventional metrics, including accuracy, precision, recall, F1 score, and specificity, derived from confusion matrices to identify the best-performing model. In the second stage, the interpretability of these models was analyzed using the Local Interpretable Model-agnostic Explanations (LIME) technique, which generated saliency maps that were compared to expert-annotated ground truth masks. This comparison employed various metrics such as Intersection over Union (IoU), Dice Similarity Coefficient (DSC), Sensitivity, Specificity, Matthews Correlation Coefficient (MCC), Pixel-wise Accuracy (PWA), and Mean Absolute Error (MAE) to evaluate the models’ focus on disease-relevant features.
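The second-stage comparison can be illustrated with a small sketch. It assumes the LIME saliency map has already been thresholded into a binary mask of the same size as the expert mask; the function names and the toy 4x4 masks are hypothetical, and Python is used here only for illustration (the study itself used MATLAB). The formulas are the standard pixel-count definitions of each metric, not code taken from the paper.

```python
from math import sqrt


def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN between two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def mask_metrics(pred, truth):
    """Standard overlap metrics between a predicted and a reference mask."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "iou": safe_div(tp, tp + fp + fn),
        "dsc": safe_div(2 * tp, 2 * tp + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
        "pwa": safe_div(tp + tn, total),
        # For binary masks the mean absolute error is the share of
        # mismatched pixels, i.e. 1 - PWA.
        "mae": safe_div(fp + fn, total),
    }


# Toy masks (made up): 1 marks a highlighted / diseased pixel.
saliency = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
expert = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
scores = mask_metrics(saliency, expert)
print({name: round(value, 3) for name, value in scores.items()})
```

If MAE is computed on continuous saliency values rather than thresholded masks, it no longer reduces to 1 - PWA; this summary does not say which variant the authors used.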

The third stage involved a comparative analysis of the results from the first two stages to identify models that not only achieved high classification performance but also relied on disease-relevant features. A novel metric, the Overfitting Ratio, was introduced to quantify reliance on irrelevant features by comparing predicted saliency maps with the ground truth, thereby detecting overfitting in visual explanations. This structured approach supports the selection of models that are both accurate and interpretable, improving their applicability in real-world agricultural scenarios. The experiments were conducted using MATLAB R2023a on a high-performance workstation equipped with an NVIDIA GeForce GTX 1080 Ti GPU, 64 GB of RAM, and an Intel(R) Xeon(R) W-2125 processor.
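This summary does not give the formula for the Overfitting Ratio. As a rough illustration only, the sketch below assumes one plausible reading: the fraction of highlighted saliency pixels that fall outside the expert-annotated region. The function name `irrelevant_highlight_ratio` and the toy masks are hypothetical, and the authors' actual definition may differ.

```python
def irrelevant_highlight_ratio(saliency_mask, truth_mask):
    """Share of highlighted pixels lying outside the ground-truth region.

    ASSUMPTION: this is an illustrative stand-in for the paper's
    Overfitting Ratio, whose exact formula is not stated in the summary.
    """
    highlighted = 0
    outside = 0
    for saliency_row, truth_row in zip(saliency_mask, truth_mask):
        for s, t in zip(saliency_row, truth_row):
            if s:
                highlighted += 1
                if not t:
                    outside += 1
    # A model that highlights nothing has no irrelevant highlights.
    return outside / highlighted if highlighted else 0.0


# Toy masks (made up): half of the highlighted pixels miss the lesion.
saliency = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
expert = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(irrelevant_highlight_ratio(saliency, expert))
```

Under this reading, a lower value means the explanation concentrates on annotated disease regions, so two models with similar classification accuracy could still be separated by how much of their attention falls on background.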

Discussion

In recent years, significant advances have been made in the automatic detection and classification of plant leaf diseases, with Explainable Artificial Intelligence (XAI) techniques increasingly integrated to enhance model interpretability. Various studies have reported high accuracy; for example, Deng et al. achieved 91% with an ensemble method, and Bijoy et al. reached 99.81% with a lightweight deep convolutional neural network. Techniques such as GradCAM, LIME, and SHAP have been used across these studies to provide visual explanations of model predictions, although many did not quantitatively assess how effectively these methods capture the relevant features. The current study aims to fill this gap by incorporating multiple quantitative measures for a comprehensive evaluation of model performance.

The research utilizes eight pre-trained models, including ResNet50 and EfficientNetB0, initialized with ImageNet weights for transfer learning. A consistent experimental setup was maintained across models, ensuring that differences in performance could be attributed to architectural variations rather than experimental inconsistencies. The dataset, comprising images of four rice leaf diseases, was augmented to address class imbalance, resulting in a total of 4,016 images. The study employs conventional metrics such as classification accuracy, precision, recall, and F1-score to evaluate model performance, with ResNet50 emerging as the top performer, achieving an accuracy of 99.13%. Furthermore, the study introduces a novel overfitting ratio metric to quantify the relevance of features identified by the models, providing deeper insights into model reliability and generalizability in real-world applications.
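The conventional metrics mentioned above all derive from a multi-class confusion matrix. The sketch below shows the standard one-vs-rest calculation for a four-class problem; the matrix values are invented for illustration and are not results from the paper, and the function names are hypothetical.

```python
def accuracy(cm):
    """Overall accuracy: correctly classified samples over all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_class_metrics(cm):
    """One-vs-rest metrics; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        })
    return results


# Invented confusion matrix for four disease classes (rows = true class).
cm = [
    [48, 1, 1, 0],
    [2, 47, 0, 1],
    [0, 0, 50, 0],
    [1, 0, 0, 49],
]
print(round(accuracy(cm), 4))
for class_index, metrics in enumerate(per_class_metrics(cm)):
    print(class_index, {k: round(v, 4) for k, v in metrics.items()})
```

Averaging the per-class values gives macro-averaged scores; the summary does not say which averaging scheme the authors reported.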