Enhanced wheat yield prediction through integrated climate and satellite data using advanced AI techniques

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-02700-w
PMID: https://pubmed.ncbi.nlm.nih.gov/40413272
Publication Date: 2025-05-24
Author(s): Zhenyun Du et al.
Primary Topic: Climate change impacts on agriculture

Overview

This paper investigates the forecasting of winter wheat yields in south Punjab, Pakistan, using a multi-phase approach that divides the crop cycle into four stages. By integrating satellite imagery, seasonal weather data, and soil information through the Google Earth Engine platform, the study applies several machine learning (ML) and deep learning (DL) models, including Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Network (CNN). The results indicate that these techniques improve prediction accuracy, achieving R² values between 0.4 and 0.88. Notably, the RF model showed the strongest generalization ability, while the CNN model predicted yields about 15% higher than the other models.
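The four-stage segmentation described above can be illustrated with a minimal sketch that averages a dated series within each stage window. The stage names, the date boundaries, and the NDVI samples below are hypothetical placeholders, not values from the paper, and the helper `stage_means` is an illustrative name rather than the authors' code.

```python
from datetime import date
from statistics import mean

# Hypothetical stage windows for one winter wheat season (assumed, not from the paper).
STAGES = [
    ("stage_1", date(2021, 11, 1), date(2021, 12, 15)),
    ("stage_2", date(2021, 12, 16), date(2022, 1, 31)),
    ("stage_3", date(2022, 2, 1), date(2022, 3, 15)),
    ("stage_4", date(2022, 3, 16), date(2022, 4, 30)),
]

def stage_means(observations, stages=STAGES):
    """Average (date, value) observations inside each stage window.

    Returns a dict mapping stage name to the mean value, or None when
    no observation falls inside that stage.
    """
    result = {}
    for name, start, end in stages:
        values = [v for d, v in observations if start <= d <= end]
        result[name] = mean(values) if values else None
    return result

# Made-up NDVI samples, purely for illustration.
ndvi_series = [
    (date(2021, 11, 20), 0.20),
    (date(2021, 12, 10), 0.30),
    (date(2022, 1, 15), 0.50),
    (date(2022, 2, 20), 0.70),
    (date(2022, 3, 5), 0.80),
]
features = stage_means(ndvi_series)
```

Each stage then contributes one feature per variable, so a model can be trained on the stages available at a given point in the season.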

The findings underscore the importance of combining diverse data sources and AI methodologies for improved agricultural forecasting. The study highlights NDVI as a critical predictor of winter wheat output and suggests that the developed framework can be adapted for forecasting other crops in Pakistan. However, limitations include the high computational demands of CNN models and the potential decrease in performance across varied agricultural zones due to differing soil types and farming practices. The research aims to inform policymakers and farmers, enhancing food security and agricultural sustainability through more accurate yield predictions and better resource management strategies. Future work will expand the assessment to additional districts and consider socio-economic factors influencing yield outcomes.

Methods

In this section, the authors outline the methodologies used to estimate crop yield with machine learning and deep learning techniques. For the machine-learning approach, data from the first six years (2017-2022) were used to establish the variables and to serve as training data. Before model development, all dataset variables were normalized using the z-score method. These procedures were implemented with Google Earth Engine (GEE) and ArcMap 10.3, and Figure 3 shows the land-use area for wheat and other crops in the district of Multan.
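As a minimal sketch of z-score normalization, each variable is shifted by its mean and divided by its standard deviation. Two choices here are assumptions, since the summary does not specify them: the population standard deviation is used, and the statistics are fitted on the training seasons only and then reused for later data. The rainfall figures are made up.

```python
from statistics import mean, pstdev

def fit_zscore(values):
    """Return (mean, std) of the training values, using the population std."""
    return mean(values), pstdev(values)

def apply_zscore(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Made-up seasonal rainfall totals (mm) for six training seasons.
train_rain = [120.0, 150.0, 90.0, 130.0, 110.0, 120.0]
mu, sigma = fit_zscore(train_rain)
train_scaled = apply_zscore(train_rain, mu, sigma)

# A later season is scaled with the training statistics, not its own.
new_scaled = apply_zscore([140.0], mu, sigma)
```

After scaling, the training values have zero mean and unit variance, which keeps variables measured in different units (for example millimetres and degrees) on a comparable scale.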

Additionally, the authors indicate the application of three distinct deep-learning techniques for crop yield estimation, although specific details regarding these methods are not elaborated in this section. This dual approach aims to enhance the accuracy and reliability of crop yield predictions through advanced computational techniques.

Results

The results section details the experimental setup and the findings from applying three machine learning (ML) and three deep learning (DL) models to predict wheat crop yield in a specified region. The study emphasizes the importance of integrating remote sensing data with climate data over various timeframes, comparing the effectiveness of using these data types jointly during the growth season against using them in isolation. A comprehensive yield prediction model was developed, incorporating weather, satellite remote sensing, and soil data, aimed at enhancing decision-making in food security management.
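The comparison between using data sources jointly and in isolation can be set up as a small feature-group experiment. The sketch below only assembles the model inputs for each configuration. The column names, group names, and sample values are hypothetical, and the exact configurations tested in the paper are not given in this summary.

```python
# Hypothetical feature groups; the column names are placeholders.
FEATURE_GROUPS = {
    "remote_sensing": ["ndvi_stage_1", "ndvi_stage_2"],
    "climate": ["rain_mm", "temp_mean_c"],
    "soil": ["soil_ph"],
}

# Assumed experiment configurations: each source alone, then all combined.
EXPERIMENTS = {
    "remote_sensing_only": ["remote_sensing"],
    "climate_only": ["climate"],
    "combined": ["remote_sensing", "climate", "soil"],
}

def select_features(record, group_names):
    """Return the values of the requested feature groups in a fixed order."""
    columns = [c for g in group_names for c in FEATURE_GROUPS[g]]
    return [record[c] for c in columns]

# One made-up observation for a single field and season.
record = {
    "ndvi_stage_1": 0.25,
    "ndvi_stage_2": 0.50,
    "rain_mm": 85.0,
    "temp_mean_c": 18.5,
    "soil_ph": 7.8,
}
inputs = {
    name: select_features(record, groups)
    for name, groups in EXPERIMENTS.items()
}
```

Training the same model on each entry of `inputs` and comparing the error metrics is one way to quantify what the combined data adds over any single source.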

The performance of the models was evaluated based on root mean square error (RMSE) values, revealing that Convolutional Neural Networks (CNN), Random Forest (RF), and Support Vector Machines (SVM) outperformed Recurrent Neural Networks (RNN) and Lasso Regression in yield estimation. The results are organized into distinct analyses for each model, including SVM, RF, Lasso, CNN, RNN, and Artificial Neural Networks (ANN), providing a thorough comparison of their predictive capabilities.
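For reference, the two metrics named in this summary, RMSE and R², can be computed as in the sketch below. The formulas are the standard definitions; the yield values are made up for illustration and are not results from the paper.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up observed and predicted yields (t/ha), for illustration only.
observed = [3.0, 3.5, 4.0, 4.5]
predicted = [3.2, 3.4, 4.1, 4.3]

error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

RMSE is expressed in the same units as the yield, so lower is better, while R² measures the share of yield variance the model explains, so higher is better.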

Discussion

The discussion section of the research paper highlights the growing body of literature focused on agricultural yield forecasting, emphasizing various factors influencing crop production. Recent studies have utilized MODIS NDVI data to predict agricultural yields, underscoring the significance of soil conditions, water availability, and climatic factors such as temperature and precipitation. The integration of remote sensing technologies, particularly NDVI, has been pivotal in enhancing the accuracy of yield predictions by providing real-time data on vegetation health. Additionally, machine learning (ML) and deep learning (DL) approaches are increasingly being adopted for yield prediction, with models like Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) demonstrating the ability to analyze complex datasets and improve prediction accuracy.
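NDVI is conventionally defined from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). The sketch below applies that standard formula to made-up reflectance values; it does not reproduce the MODIS processing used in the studies discussed.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance.

    Returns None when both bands are zero, where the index is undefined.
    """
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator

# Made-up reflectance pairs (NIR, red): dense canopy, sparse cover, no signal.
samples = [(0.50, 0.10), (0.30, 0.30), (0.0, 0.0)]
values = [ndvi(nir, red) for nir, red in samples]
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so denser, greener canopies push the index toward 1.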

The paper also critiques existing methodologies, noting that while traditional ML techniques like linear regression and decision trees have been widely used, they often require manual feature extraction and may not effectively capture the complexities of agricultural data. In contrast, deep learning methods can automatically extract relevant features from large datasets, thus improving prediction capabilities. The authors propose a hybrid approach that combines various ML and DL techniques to enhance yield forecasting, particularly in the context of the Multan District in Punjab, Pakistan. They emphasize the need for external validation across diverse agro-climatic zones to ensure the robustness and generalizability of their findings, as localized data may not adequately represent broader agricultural conditions. Overall, the discussion underscores the importance of integrating advanced computational techniques with environmental data to optimize agricultural productivity.
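One common way to approximate the external validation called for above is to hold out one zone at a time and train on the rest. The sketch below generates such leave-one-group-out splits. It is an illustration of the general technique, not a procedure the authors report using, and the zone labels are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group label."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx

# Hypothetical zone label for each sample in a dataset.
zones = ["zone_a", "zone_a", "zone_b", "zone_c", "zone_b"]
splits = list(leave_one_group_out(zones))
```

Because every test fold contains only samples from a zone the model never saw during training, the resulting scores indicate how well it transfers to new agro-climatic conditions.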