Incorporating Meteorological Data and Pesticide Information to Forecast Crop Yields Using Machine Learning

Journal: IEEE Access, Volume: 12
DOI: https://doi.org/10.1109/access.2024.3383309
Publication Date: 2024-01-01
Author(s): Md Jiabul Hoque et al.
Primary Topic: Smart Agriculture and AI

Overview

The research presents a crop yield prediction system intended to strengthen food security by addressing the agricultural sector’s vulnerability to climate change and pesticide use. Using a dataset that combines meteorological data, pesticide records, and historical crop yields, the study applies machine learning to predict crop yields. Three models (Gradient Boosting, K-Nearest Neighbors, and Multivariate Logistic Regression, the last labeled YMLR, Yield Multiple Linear Regression, in the results) were trained and evaluated, with the Gradient Boosting model achieving a coefficient of determination ($R^2$) of 99.99%, indicating its potential for accurate yield prediction.

The study’s methodology involved acquiring data from reputable sources, followed by Extract, Transform, and Load (ETL) processes to streamline the dataset. Preprocessing techniques, including normalization and feature engineering, were applied to improve data quality for model training. The Gradient Boosting model outperformed the other models, with an $R^2$ value of 99.98% and a mean absolute error (MAE) of 0.0182 t/ha, while the K-Nearest Neighbors and Multivariate Logistic Regression models yielded $R^2$ values of 97.29% and 96.13%, respectively. The research contributes to the digital transformation of agriculture by improving yield predictions and resource management, and it sets the stage for future work integrating the Gradient Boosting model with IoT technologies for real-time monitoring of environmental factors that affect crop yields.
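The summary does not state the formulas behind the reported metrics. As a reference point, and assuming the standard definitions were used, the two metrics can be written as follows, where $y_i$ is the observed yield, $\hat{y}_i$ the predicted yield, $\bar{y}$ the mean observed yield, and $n$ the number of test samples (all notation introduced here, not taken from the paper):

```latex
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\]
```

Under these definitions, an $R^2$ reported as 99.98% corresponds to a value of 0.9998, and the MAE carries the unit of the target variable, here t/ha.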

Introduction

The introduction highlights the close dependence of agriculture on meteorological conditions, focusing on rainfed agriculture, which accounts for about 80% of global cropland. It emphasizes that agricultural productivity is heavily influenced by precipitation and other climatic factors, and that climate change poses significant risks such as food insecurity and famine. The paper discusses the adverse effects of extreme weather, including droughts and excessive rainfall, on crop yields, particularly in India, where the monsoon season is vital for irrigation. The study also cites evidence that temperature increases can reduce yields, especially during sensitive growth phases, and outlines the difficulty of predicting crop yields with traditional statistical models.

To address these challenges, the authors propose using machine learning techniques to improve crop yield predictions for six major crops in India. They outline a methodology that covers data collection, feature engineering, model training, and performance assessment across several machine learning algorithms. The study reports high coefficients of determination ($R^2$) for the models employed, indicating strong predictive capability. The authors aim to develop a decision support tool that integrates meteorological variables to help farmers and decision-makers adapt to climate change and ensure food security. The paper is structured to provide a review of related literature, detailed methodology, experimental results, and a discussion of the implications of the findings.

Methods

The study follows a quantitative research paradigm to develop a robust machine learning model for forecasting crop yields. The methodology centers on the systematic analysis of numerical data, in line with the objective of uncovering complex relationships and patterns within large datasets. These datasets include meteorological records, pesticide application data, and historical yield information. Through this data-driven approach, the research aims to advance sustainable agricultural practices by improving predictive capability.
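To make the data-combination step concrete, the following is a minimal sketch of joining the three kinds of source data and applying min-max normalization to one feature. The summary does not describe the actual schema or pipeline, so the join key (year), the column names (`rain_mm`, `temp_c`, `pesticide_t`, `yield_t_ha`), and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: weather and pesticide records keyed by year,
# yield records keyed by (crop, year). None of this comes from the paper.
Row = Dict[str, object]


def join_sources(
    weather: Dict[int, Dict[str, float]],
    pesticides: Dict[int, float],
    yields: Dict[Tuple[str, int], float],
) -> List[Row]:
    """Inner-join the three sources on year; incomplete years are dropped."""
    rows: List[Row] = []
    for (crop, year), yield_t_ha in sorted(yields.items()):
        if year not in weather or year not in pesticides:
            continue  # skip records that lack weather or pesticide data
        row: Row = {"crop": crop, "year": year, "yield_t_ha": yield_t_ha}
        row.update(weather[year])
        row["pesticide_t"] = pesticides[year]
        rows.append(row)
    return rows


def min_max_normalize(rows: List[Row], column: str) -> List[Row]:
    """Return copies of the rows with one numeric column rescaled to [0, 1]."""
    values = [float(r[column]) for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        # A constant column has no spread, so it is mapped to 0.0.
        dict(r, **{column: (float(r[column]) - lo) / span if span else 0.0})
        for r in rows
    ]
```

An inner join is used here so that every training row has all three kinds of information; whether the authors dropped or imputed incomplete records is not stated in the summary.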

Results

The results focus on three optimized models: YMLR (Yield Multiple Linear Regression), YK-NN (Yield K-Nearest Neighbors), and YGBR (Yield Gradient Boosting Regression). These models use historical data sourced from the Food and Agriculture Organization of the United Nations (FAO) and the World Bank to assess the agricultural output of key crops in India.
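The summary does not give the configuration of the YGBR model. For readers unfamiliar with the technique, the following is a minimal sketch of gradient boosting for regression under squared-error loss, using one-split decision stumps as weak learners. The function names, the choice of stumps, and the `n_rounds` and `learning_rate` values are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, value if feature <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]
Model = Tuple[float, List[Stump], float]


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Find the single split that minimizes squared error on the residuals."""
    mean_all = sum(residuals) / len(residuals)
    # Fallback when no split helps: predict the mean for every row.
    best: Stump = (0, float("inf"), mean_all, mean_all)
    best_sse = sum((r - mean_all) ** 2 for r in residuals)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighboring feature values
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = (j, t, lm, rm), sse
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbr(X, y, n_rounds: int = 50, learning_rate: float = 0.1) -> Model:
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict_gbr(model: Model, row: Sequence[float]) -> float:
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

Each round corrects part of the error left by the previous rounds, and the learning rate controls how much of each correction is applied. Production libraries add deeper trees, subsampling, and regularization on top of this basic loop.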

After training on historical data, the models were evaluated on test samples, that is, data instances they had not encountered during training. The effectiveness of each model was measured by calculating the loss, which reflects the difference between the predicted and actual values for these test samples. This evaluation is critical for determining the models’ predictive accuracy and their potential application in agricultural forecasting.
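The held-out evaluation described above can be sketched as follows. This is an illustrative example only: the 80/20 split, the random seed, the value of `k`, and the use of a plain K-Nearest Neighbors regressor with Euclidean distance are assumptions, since the summary does not report the split or the hyperparameters used for YK-NN.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]  # (feature vector, observed yield)


def train_test_split(rows: List[Sample], test_fraction: float = 0.2, seed: int = 0):
    """Shuffle reproducibly and hold out a fraction of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> float:
    """Average the targets of the k training samples closest to x."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the unit of the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination; undefined for constant targets."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


def evaluate(train: List[Sample], test: List[Sample], k: int = 3) -> Dict[str, float]:
    """Score predictions on samples the model never saw during training."""
    actual = [target for _, target in test]
    predicted = [knn_predict(train, features, k) for features, _ in test]
    return {"mae": mae(actual, predicted), "r2": r2(actual, predicted)}
```

Keeping the test rows out of `train` is what makes the resulting scores an estimate of performance on unseen data rather than a measure of how well the model memorized its inputs.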

Discussion

The discussion highlights the significant role of machine learning (ML) in enhancing the accuracy of crop yield predictions, particularly in the context of agricultural practices influenced by climate change. Various ML techniques, such as Ridge Regression, Support Vector Machines, and Convolutional Neural Networks, have been employed to analyze extensive datasets, revealing intricate relationships between climatic variables and agricultural outputs. The integration of big data has further amplified the potential of these methodologies, enabling more precise forecasting of crop yields by addressing challenges posed by fluctuating weather conditions and other environmental factors.

Despite the promising advancements, the application of deep learning techniques faces limitations, including the need for large labeled datasets and the inherent opacity of “black box” models, which complicates interpretability for agricultural decision-making. The research emphasizes the necessity for more localized studies, particularly in regions like South Asia, where traditional statistical models have dominated yield forecasting. The authors propose a novel ML-based crop yield forecasting model that consolidates multiple reliable data sources, aiming to produce predictions that align closely with empirical observations. This approach seeks to address the data scarcity issues prevalent in agricultural datasets and enhance the overall reliability of yield predictions, thereby contributing to food security and sustainable agricultural practices.