A proposed framework for crop yield prediction using hybrid feature selection approach and optimized machine learning

Journal: Neural Computing and Applications, Volume: 36, Issue: 33
DOI: https://doi.org/10.1007/s00521-024-10226-x
Publication Date: 2024-08-16
Author(s): Mahmoud Abdel-Salam et al.
Primary Topic: Smart Agriculture and AI

Overview

The research paper presents a novel framework aimed at enhancing the accuracy of crop yield predictions by addressing the complexities of environmental interactions and optimizing the Support Vector Regression (SVR) model. The framework consists of three phases: preprocessing, hybrid feature selection, and prediction. In the preprocessing phase, data normalization is performed, followed by K-means clustering and a correlation-based filter (CFS) to create a reduced dataset. The hybrid feature selection phase introduces a new approach, FMIG-RFE, which combines filter techniques with recursive feature elimination (RFE) to identify the most significant features affecting crop yield. The prediction phase employs an improved Crayfish Optimization Algorithm (ICOA) to optimize the SVR hyperparameters, resulting in enhanced prediction accuracy and computational efficiency.
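To make the preprocessing idea concrete, the following is a minimal sketch of normalization followed by a correlation-based redundancy filter. It is an illustration under stated assumptions, not the paper's procedure: the K-means step is omitted, the greedy filter is a simplified stand-in for CFS, and the feature names, toy values, and the `redundancy_limit` threshold are all hypothetical.

```python
import math


def min_max_normalize(column):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map everything to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_filter(features, target, redundancy_limit=0.9):
    """Greedy filter (simplified stand-in for CFS, an assumption).

    Features are visited in order of |correlation with the target|; a feature
    is kept only if it is not strongly correlated with one already kept.
    `features` maps a feature name to its list of values.
    """
    ranked = sorted(
        features, key=lambda f: abs(pearson(features[f], target)), reverse=True
    )
    kept = []
    for name in ranked:
        if all(
            abs(pearson(features[name], features[k])) < redundancy_limit
            for k in kept
        ):
            kept.append(name)
    return kept


# Hypothetical toy data, not taken from the paper.
raw = {
    "rainfall_mm": [100.0, 150.0, 200.0, 250.0, 300.0],
    "rainfall_cm": [10.0, 15.0, 20.0, 25.0, 30.0],  # same signal, other unit
    "soil_ph": [6.5, 6.6, 5.5, 7.0, 6.0],
}
yield_t = [2.0, 2.6, 3.1, 3.4, 4.1]

normalized = {name: min_max_normalize(col) for name, col in raw.items()}
selected = correlation_filter(normalized, yield_t)
print(selected)  # ['rainfall_mm', 'soil_ph']
```

After normalization the two rainfall columns become identical, so the filter keeps only the first of them and drops the other as redundant.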

The findings indicate that the proposed framework outperforms existing state-of-the-art methods in crop yield prediction, demonstrating the effectiveness of the ICOA in hyperparameter optimization. The paper emphasizes the importance of feature selection in machine learning models for agriculture, particularly in predicting paddy crop production based on various environmental parameters. Future research directions include exploring advanced fuzzy-based clustering algorithms and incorporating additional features to further improve prediction accuracy.

Introduction

The introduction of this research paper emphasizes the critical role of agriculture in addressing global food security, particularly in light of ongoing hunger issues exacerbated by population growth. The United Nations has set a target for 2030 to enhance food security and reduce hunger, necessitating accurate agricultural yield predictions for effective policymaking and operational planning by farmers. However, predicting crop yields is complex due to various influencing factors, including weather, soil conditions, and pest presence. The paper highlights the potential of machine learning (ML) methods to improve yield predictions, noting that these models can be categorized into supervised and unsupervised learning, with a focus on supervised learning techniques such as Support Vector Regression (SVR).

The authors propose a novel framework that integrates a hybrid feature selection approach with an optimized SVR model to enhance prediction accuracy. This framework consists of three phases: preprocessing, feature selection, and prediction. The preprocessing phase employs K-means clustering and the CFS ranking method to address high dimensionality and redundancy in agricultural datasets. The feature selection phase combines filter and wrapper methods to identify the most relevant features, while the prediction phase introduces an improved algorithm, ICOA, which optimizes the SVR hyperparameters. The paper asserts that this approach effectively mitigates the challenges associated with feature selection and hyperparameter tuning, ultimately improving crop yield predictions, as demonstrated through experiments with a paddy crop dataset.
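The combination of a filter stage with a wrapper stage can be sketched as follows. This is a hedged illustration rather than FMIG-RFE itself: the summary does not spell out the filter scores, so absolute Pearson correlation is assumed here as the filter criterion, and the wrapper is recursive feature elimination driven by a linear model trained with gradient descent instead of an SVM. The feature names and the two-level coded toy data are hypothetical.

```python
import math


def standardize(col):
    """Shift to zero mean and scale to unit variance."""
    n = len(col)
    mean = sum(col) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
    return [(v - mean) / std for v in col]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def fit_linear(columns, y, epochs=300, lr=0.05):
    """Least-squares linear model fitted by batch gradient descent.

    Returns one weight per column (the bias is fitted but not returned).
    """
    n, d = len(y), len(columns)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        preds = [b + sum(w[j] * columns[j][i] for j in range(d)) for i in range(n)]
        errs = [p - t for p, t in zip(preds, y)]
        b -= lr * sum(errs) / n
        for j in range(d):
            w[j] -= lr * sum(e * columns[j][i] for i, e in enumerate(errs)) / n
    return w


def hybrid_select(features, y, filter_keep, final_keep):
    """Filter stage (top-k by |correlation|), then recursive elimination."""
    cols = {name: standardize(col) for name, col in features.items()}
    ranked = sorted(cols, key=lambda f: abs(pearson(cols[f], y)), reverse=True)
    survivors = ranked[:filter_keep]
    while len(survivors) > final_keep:
        w = fit_linear([cols[f] for f in survivors], y)
        weakest = min(range(len(survivors)), key=lambda j: abs(w[j]))
        survivors.pop(weakest)  # drop the feature with the smallest |weight|
    return survivors


# Hypothetical coded design: each factor is set to a low (-1) or high (+1) level.
data = {
    "rainfall":    [1, 1, 1, 1, -1, -1, -1, -1],
    "temperature": [1, 1, -1, -1, 1, 1, -1, -1],
    "humidity":    [1, -1, 1, -1, 1, -1, 1, -1],
    "wind":        [1, -1, -1, 1, -1, 1, 1, -1],
}
# Synthetic target: strong rainfall and temperature effects, weak others.
target = [
    10 + 3 * r + 2 * t + 0.5 * h + 0.1 * w
    for r, t, h, w in zip(*data.values())
]

print(hybrid_select(data, target, filter_keep=3, final_keep=2))
# ['rainfall', 'temperature']
```

The filter stage discards `wind` cheaply, and the wrapper stage then refits the model on the survivors and removes `humidity`, which carries the smallest weight.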

Methods

In this section, the authors detail the experimental methodology used to assess the efficacy of their proposed framework, which integrates hybrid feature selection and prediction phases. The experiments demonstrate that the feature selection approach significantly enhances the accuracy of various machine learning (ML) models by identifying the most relevant features from the dataset. The prediction phase is further refined by the improved Crayfish Optimization Algorithm (ICOA), which tunes the hyperparameters of the SVR model.
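The general shape of a metaheuristic hyperparameter search can be sketched as below. This is explicitly not ICOA: it is a generic best-guided population search used only to illustrate the loop, and the search bounds, the parameter names `C`, `epsilon`, and `gamma`, and the `validation_error` objective are assumptions. In a real setup the objective would be the cross-validated error of an SVR trained with the candidate hyperparameters; here it is a toy function with a known minimum so the sketch runs on its own.

```python
import random

# Assumed search bounds, expressed in log10 space.
BOUNDS = {"C": (-2.0, 3.0), "epsilon": (-4.0, 0.0), "gamma": (-4.0, 1.0)}


def validation_error(params):
    """Placeholder objective (hypothetical): a bowl with a known minimum.

    Replace with the cross-validated error of a model trained on `params`.
    """
    optimum = {"C": 1.0, "epsilon": -2.0, "gamma": -1.0}
    return sum((params[k] - optimum[k]) ** 2 for k in BOUNDS)


def clip(value, low, high):
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def population_search(objective, pop_size=20, iterations=60, seed=0):
    """Generic best-guided population search (a stand-in, not ICOA)."""
    rng = random.Random(seed)
    pop = [{k: rng.uniform(*BOUNDS[k]) for k in BOUNDS} for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    best_idx = min(range(pop_size), key=scores.__getitem__)
    best, best_score = dict(pop[best_idx]), scores[best_idx]
    history = [best_score]
    for t in range(iterations):
        spread = 1.0 - t / iterations  # exploration shrinks over time
        for i, cand in enumerate(pop):
            trial = {}
            for k, (low, high) in BOUNDS.items():
                pull = rng.random() * (best[k] - cand[k])  # move toward best
                noise = rng.gauss(0.0, 0.3 * spread)       # random exploration
                trial[k] = clip(cand[k] + pull + noise, low, high)
            score = objective(trial)
            if score < scores[i]:  # greedy replacement of the candidate
                pop[i], scores[i] = trial, score
                if score < best_score:
                    best, best_score = dict(trial), score
        history.append(best_score)
    return best, best_score, history


best, best_score, history = population_search(validation_error)
svr_params = {k: 10 ** v for k, v in best.items()}  # back to linear scale
print(svr_params, best_score)
```

Searching in log space is a common choice for scale-type hyperparameters because useful values often span several orders of magnitude.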

The framework’s performance is evaluated across several ML algorithms, including Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, and K-Nearest Neighbors (KNN). The analysis comprises five stages: (1) initial prediction results are validated using statistical assessment measures; (2) feature importance techniques are employed to select essential features; (3) models are constructed using the proposed FMIG-RFE-SVM method, focusing on critical features; (4) the prediction phase is assessed by comparing ICOA with other optimization algorithms; and (5) the overall framework is benchmarked against state-of-the-art approaches. The findings indicate that the proposed hybrid feature selection framework outperforms baseline models, underscoring its potential for improving predictive accuracy in machine learning applications.
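The summary does not name the statistical assessment measures used in stage (1). As an assumption, the sketch below computes three measures commonly reported for regression models: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). The observed and predicted values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted yields.
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]

print(mae(observed, predicted))   # 0.625
print(rmse(observed, predicted))  # 0.75
print(r2(observed, predicted))
```

Lower MAE and RMSE and a higher R² indicate a better fit, which allows the same three numbers to be compared across the candidate models and optimizers.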

Discussion

The discussion section of the paper emphasizes the importance of accurate crop yield estimation in the context of increasing global food demand. It reviews methodologies employed over the past decade, including machine learning (ML), traditional regression techniques, and simulation-based crop models. The paper highlights the limitations of existing ML algorithms in accurately clustering agricultural data and describes the Proximity Likelihood Maximization Data Clustering (PLMDC) method, which improves clustering efficiency by using fewer characteristics from extensive agricultural datasets. This method, together with a linear regression approach and a Genetic Algorithm (GA) for feature selection, is reported to deliver better predictive capability than traditional methods.

Furthermore, the paper critiques the current literature on crop yield prediction, noting the need for more comprehensive techniques that incorporate climatic data and optimize model parameters. It identifies gaps in the application of deep learning (DL) methods for yield prediction, particularly in indoor settings, and suggests that the integration of advanced feature selection and optimization frameworks can significantly improve prediction accuracy. The proposed framework aims to address these challenges by employing a hybrid feature selection approach and a novel algorithm for hyperparameter optimization, ultimately enhancing the predictive performance of crop yield models.