Comparative assessment of empirical and hybrid machine learning models for estimating daily reference evapotranspiration in sub-humid and semi-arid climates

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-83859-6
PMID: https://pubmed.ncbi.nlm.nih.gov/39833181
Publication Date: 2025-01-20
Author(s): Siham Acharki et al.
Primary Topic: Plant Water Relations and Carbon Dynamics

Overview

This study aims to improve the accuracy of reference evapotranspiration (RET) estimation, which is crucial for effective water resource management and agricultural planning, particularly under climate change. It evaluates eight empirical models and four machine learning (ML) models, namely Random Forest (RF), M5 Pruned (M5P), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), along with their hybrid combinations, for estimating daily RET in the Gharb and Loukkos irrigated perimeters of Morocco. The models were assessed using six input combinations of meteorological variables and benchmarked against the FAO-56 Penman-Monteith (PM-FAO56) model. Performance metrics included the Kling-Gupta efficiency (KGE), coefficient of determination ($R^2$), root mean square error (RMSE), and relative root squared error (RRSE).
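The four performance metrics can be illustrated with a short, dependency-free sketch. The summary does not give the formulas, so the definitions below are assumptions: KGE follows the commonly used 2009 formulation (correlation, variability ratio, bias ratio), $R^2$ is taken as the squared Pearson correlation, and RRSE is expressed as a percentage relative to a predictor that always returns the observed mean. The sample values are hypothetical and are not taken from the study.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, sim):
    """Root mean square error, in the units of the inputs (mm/day for RET)."""
    return math.sqrt(_mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def rrse(obs, sim):
    """Relative root squared error in percent (assumed definition):
    squared error normalised by that of always predicting the observed mean."""
    o_bar = _mean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - o_bar) ** 2 for o in obs)
    return 100.0 * math.sqrt(num / den)


def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated series."""
    o_bar, s_bar = _mean(obs), _mean(sim)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_s = sum((s - s_bar) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def r2(obs, sim):
    """Coefficient of determination, taken here as the squared correlation."""
    return pearson_r(obs, sim) ** 2


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 form); 1.0 is a perfect match."""
    r = pearson_r(obs, sim)
    o_bar, s_bar = _mean(obs), _mean(sim)
    sd_o = math.sqrt(_mean([(o - o_bar) ** 2 for o in obs]))
    sd_s = math.sqrt(_mean([(s - s_bar) ** 2 for s in sim]))
    alpha = sd_s / sd_o   # variability ratio
    beta = s_bar / o_bar  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical daily RET values in mm/day: reference (obs) vs. model (sim).
obs = [2.0, 3.0, 4.0, 5.0]
sim = [2.1, 2.9, 4.2, 4.8]

for name, fn in [("KGE", kge), ("R2", r2), ("RMSE", rmse), ("RRSE", rrse)]:
    print(name, round(fn(obs, sim), 4))
```

Here `obs` would hold the PM-FAO56 benchmark values and `sim` the estimates from an empirical or ML model. Higher KGE and $R^2$ and lower RMSE and RRSE indicate closer agreement.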

The findings reveal that the Valiantzas 2013 (VAL2013b) model outperformed the other empirical models, achieving high KGE and $R^2$ values (0.95-0.97), low RMSE (0.32-0.35 mm/day), and low RRSE (8.14-10.30%). Among the ML models, the hybrid XGBoost-LightGBM and RF-LightGBM models demonstrated the highest accuracy, with an average RMSE ranging from 0.015 to 0.097 mm/day. This indicates the significant potential of hybrid ML models for RET estimation in sub-humid and semi-arid regions, which can in turn support water resource management and irrigation scheduling. The study also suggests avenues for future research, including the exploration of additional ML algorithms and configurations across a wider range of climatic conditions.

Methods

The study uses daily meteorological records from five weather stations in the Gharb and Loukkos irrigated perimeters of Morocco, covering the years 2011 to 2017. Gaps caused by weather station malfunctions were filled by imputation before analysis. Daily RET computed with the PM-FAO56 model serves as the benchmark.

Eight empirical models, four ML models (RF, M5P, XGBoost, and LightGBM), and hybrid combinations of the ML models were evaluated against this benchmark, using six input combinations of meteorological variables. The dataset was divided randomly for training and testing, and performance was scored with KGE, $R^2$, RMSE, and RRSE.
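For reference, the PM-FAO56 benchmark named in the Overview is the FAO-56 Penman-Monteith equation. This summary does not reproduce it, so the form below is the standard one published in FAO Irrigation and Drainage Paper 56, written with $ET_0$ for what this summary calls RET. How the study computed each input term is not stated here.

$$ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}$$

Here $ET_0$ is in mm/day, $R_n$ is net radiation and $G$ is soil heat flux (both in MJ m$^{-2}$ day$^{-1}$), $T$ is mean daily air temperature at 2 m (°C), $u_2$ is wind speed at 2 m (m/s), $e_s$ and $e_a$ are saturation and actual vapour pressure (kPa), $\Delta$ is the slope of the saturation vapour pressure curve (kPa/°C), and $\gamma$ is the psychrometric constant (kPa/°C). The equation needs temperature, humidity, radiation, and wind data together, which is why simpler empirical models and ML models with reduced input sets are of interest where some of these variables are unavailable.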

Results

Among the empirical models, those incorporating solar radiation significantly outperformed those based on temperature or mass transfer alone. The combination models VAL2013a and VAL2013b gave the best KGE and RMSE across all stations, with VAL2013b reaching KGE and $R^2$ values of 0.95-0.97, RMSE of 0.32-0.35 mm/day, and RRSE of 8.14-10.30%.

Among the ML models, the hybrid XGBoost-LightGBM and RF-LightGBM models were the most accurate, with an average RMSE between 0.015 and 0.097 mm/day.

Discussion

The study estimates reference evapotranspiration (RET) in two major irrigated perimeters in Morocco: Gharb and Loukkos. The study area has a Mediterranean climate and spans 6,007.14 km². Data from five strategically located weather stations cover temperature, humidity, solar radiation, wind speed, and precipitation. Gaps caused by weather station malfunctions were filled using Multivariate Imputation by Chained Equations (MICE), which preserves the continuity of the daily series needed for the analysis.

The study compared eight empirical models and four ML algorithms (Random Forest, M5 Pruned, XGBoost, and LightGBM), along with hybrid models, against the FAO-56 Penman-Monteith model for RET estimation. Among the empirical models, those incorporating solar radiation significantly outperformed those based on temperature or mass transfer alone, and the combination models (VAL2013a and VAL2013b) yielded the best KGE and RMSE across all stations. The findings underscore the importance of selecting appropriate meteorological variables for RET estimation, particularly in semi-arid regions, and highlight the effectiveness of hybrid modeling approaches in enhancing prediction accuracy.

Limitations

The study has several limitations. First, its temporal scope covers only the years 2011 to 2017, which restricts the analysis of ML model performance across varying climatic conditions. A longer record would show how these models behave under a wider range of climatic fluctuations. Second, the investigation is confined to sub-humid and semi-arid climates, so future research should test the models in other climate types to assess how well they generalize.

Moreover, the effectiveness of the ML and hybrid models depends on the availability and quality of meteorological data, which can be a problem in data-scarce regions. Future studies could use reanalysis products such as ERA5, ERA5-Land, and MERRA-2, which offer continuous, high-resolution data across extensive temporal and spatial scales, particularly in areas lacking in situ measurements. Furthermore, because ML models lack explicit physical mechanisms, their behavior is hard to interpret, and building accurate models without knowledge of the underlying functional form is difficult. Overfitting and underfitting during training and testing, which can result from random dataset division, may also compromise model accuracy. Implementing more advanced validation techniques could enhance the reliability and generalizability of these models.
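The summary does not name the advanced validation techniques it has in mind. One common option for daily weather series, shown here purely as an assumption, is blocked cross-validation that holds out whole years instead of random days, so that strongly correlated neighbouring days do not end up on both sides of the split. The helper name `leave_one_year_out` and the synthetic date index are hypothetical.

```python
from datetime import date, timedelta


def leave_one_year_out(dates):
    """Yield (held_out_year, train_idx, test_idx), holding out one full year per fold."""
    years = sorted({d.year for d in dates})
    for year in years:
        test_idx = [i for i, d in enumerate(dates) if d.year == year]
        train_idx = [i for i, d in enumerate(dates) if d.year != year]
        yield year, train_idx, test_idx


# Hypothetical daily index spanning the study period, 2011 to 2017.
start = date(2011, 1, 1)
n_days = (date(2017, 12, 31) - start).days + 1
dates = [start + timedelta(days=k) for k in range(n_days)]

folds = list(leave_one_year_out(dates))
for year, train_idx, test_idx in folds:
    # Each fold trains on six years and tests on the remaining one.
    print(year, len(train_idx), len(test_idx))
```

A model would be fitted on `train_idx` and scored on `test_idx` for each fold, and the spread of scores across held-out years gives a rough indication of how sensitive the model is to interannual variability.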