Machine-learning-based reconstruction of long-term global terrestrial water storage anomalies from observed, satellite and land-surface model data

Journal: Earth system science data, Volume: 17, Issue: 6
DOI: https://doi.org/10.5194/essd-17-2575-2025
Publication Date: 2025-06-13
Author(s): Nehar Mandal et al.
Primary Topic: Geophysics and Gravity Measurements

Overview

This research paper focuses on reconstructing long-term terrestrial water storage anomalies (TWSAs) from January 1960 to December 2022, addressing the limitations posed by the short data duration of the Gravity Recovery and Climate Experiment (GRACE) and its successor, GRACE-FO. The study employs a novel Bayesian network (BN) technique to select optimal predictors from land surface model outputs, meteorological variables, and climate indices, such as the Oceanic Niño Index and Dipole Mode Index, for raster-based TWSA simulations. The performance of various machine learning (ML) algorithms, including convolutional neural networks (CNN), support vector regression (SVR), extra trees regressor (ETR), and stacking ensemble regression (SER), is evaluated, with ETR demonstrating the best performance across most grid cells and river basins.

The findings indicate that the reconstructed TWSA product, referred to as BNML_TWSA, significantly outperforms TWSAs derived from land surface models when compared to GRACE datasets. Notably, the model shows high correlation coefficients in river basins such as the Godavari and Krishna, and accurately reflects historical extreme climate events. The study concludes that the BNML_TWSA dataset, which is publicly available with a spatial resolution of 0.50° × 0.50°, provides a more reliable estimate of TWS compared to existing global hydrological models and is particularly effective in arid regions, as indicated by lower standard error magnitudes. This research underscores the potential of BN and ML approaches in enhancing the understanding of TWS variations and their implications for hydrological extremes and climate change assessments.

Introduction

The introduction of the research paper discusses terrestrial water storage (TWS), which encompasses various hydrological components such as groundwater, soil moisture, snow, ice, and surface water. It highlights the significant biases present in land surface models (LSMs) and global hydrological models (GHMs) due to uncertainties and inadequate representations of physical processes, particularly in snow-dominated basins where LSMs underestimate and GHMs overestimate TWS anomalies. The Gravity Recovery and Climate Experiment (GRACE) and its successor, GRACE Follow-on (GRACE-FO), have provided valuable TWS anomaly (TWSA) measurements since 2002, which are crucial for understanding climate change impacts and managing water resources. However, data gaps between these missions necessitate the reconstruction of long-term TWSA datasets.

The paper reviews various methodologies employed in previous studies to reconstruct TWSAs, including statistical models and machine learning (ML) approaches. It notes that while many studies utilize single ML algorithms, there is a growing trend towards employing multiple algorithms to enhance robustness and accuracy. The authors emphasize the importance of optimal predictor selection for TWSA reconstruction, particularly using Bayesian networks, which is a novel approach in this context. The study aims to select optimal predictors from a diverse set of inputs and identify the best-performing ML model for each grid cell across 11 global river basins with varying hydroclimatic characteristics. The ultimate goal is to simulate GRACE-like TWSAs and create a comprehensive global TWSA dataset spanning from 1960, including the data gaps from the GRACE missions.

Methods

This section outlines the methodology for estimating terrestrial water storage anomalies (TWSAs) from several data sources with machine learning (ML) techniques. TWS from the Catchment Land Surface Model (CLSM) is obtained directly, while TWS from the Noah Land Surface Model is calculated as the sum of snow depth water equivalent (SnWE), soil moisture content (SMC), and canopy water storage (CWS). Anomalies are computed by subtracting the long-term mean of monthly TWS over January 2004 to December 2009 from the monthly TWS values.
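The anomaly calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample numbers are hypothetical, and the baseline is taken as a single time-mean over 2004-2009 rather than a month-by-month climatology, which the summary does not specify.

```python
from statistics import mean

def noah_tws(snwe, smc, cws):
    """Noah TWS as the sum of SnWE, SMC and CWS for each month."""
    return [s + m + c for s, m, c in zip(snwe, smc, cws)]

def anomalies(months, tws, start=(2004, 1), end=(2009, 12)):
    """Subtract the baseline-period mean from every monthly TWS value.

    months: list of (year, month) tuples aligned with tws.
    Assumption: one time-mean over the whole baseline window is removed.
    """
    baseline = [v for ym, v in zip(months, tws) if start <= ym <= end]
    if not baseline:
        raise ValueError("no samples fall inside the baseline period")
    base_mean = mean(baseline)
    return [v - base_mean for v in tws]

# Hypothetical monthly values, for illustration only.
months = [(2003, 12), (2004, 1), (2004, 2), (2010, 1)]
tws = [100.0, 110.0, 90.0, 120.0]
twsa = anomalies(months, tws)
```

Only the two 2004 values fall inside the baseline window here, so their mean (100.0) is removed from every month.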

The regression model for predicting GRACE TWSAs is formulated as $ t = f(X, p) $, where $ t $ denotes the predictand variable, $ X $ represents a set of predictors including current and past precipitation (P, $ P_1 $, $ P_2 $), climate indices (DMI, NAO, ONI), and other relevant features. Four ML algorithms—convolutional neural network (CNN), support vector regression (SVR), extra trees regressor (ETR), and stacking ensemble regression (SER)—are employed to solve this regression problem. The trained models are intended to simulate GRACE-like TWSAs based solely on the input data, with the overall workflow of the study illustrated in a referenced figure. Further details on predictor selection and the ML models are provided in subsequent subsections.
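To make the predictor set concrete, the following sketch assembles rows of $X$ and the target $t$ for one grid cell. It assumes that $P_1$ and $P_2$ are precipitation lagged by one and two months, which the summary implies but does not state outright; the function name and argument layout are hypothetical and not taken from the paper.

```python
def lagged_design_matrix(precip, indices, target, max_lag=2):
    """Build (X, t) rows from current and lagged precipitation plus climate indices.

    precip:  monthly precipitation values.
    indices: one tuple per month, e.g. (DMI, NAO, ONI).
    target:  monthly GRACE TWSA values aligned with precip.
    The first max_lag months are dropped because their lags are undefined.
    """
    if not (len(precip) == len(indices) == len(target)):
        raise ValueError("all inputs must have the same length")
    X, t = [], []
    for i in range(max_lag, len(precip)):
        lags = [precip[i - k] for k in range(max_lag + 1)]  # P, P_1, P_2
        X.append(lags + list(indices[i]))
        t.append(target[i])
    return X, t

# Hypothetical values, for illustration only.
X, t = lagged_design_matrix(
    precip=[1.0, 2.0, 3.0, 4.0],
    indices=[(0.1, 0.2, 0.3)] * 4,
    target=[10.0, 20.0, 30.0, 40.0],
)
```

The resulting rows could then be passed to any regressor, for example an extra-trees implementation, to fit $f$.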

Results

The results section compares the reconstructed product, BNML_TWSA, with GRACE TWSAs and with the TWSAs simulated by the land surface models. Across most grid cells and river basins, ETR is the best-performing of the four ML algorithms. BNML_TWSA agrees with GRACE more closely than the land surface model outputs do, with high correlation coefficients reported in basins such as the Godavari and Krishna.

The reconstruction also reproduces historical extreme climate events, and its standard errors are lower in arid regions. These findings support combining BN-based predictor selection with ML models to extend the TWSA record back to 1960.

Discussion

In this study, the authors analyze terrestrial water storage anomalies (TWSAs) using a dataset that spans 1960 to 2022 and combines GRACE mascon products, GLDAS land surface model (LSM) outputs, climate forcing data (precipitation and temperature), and several climate indices. The GRACE data, specifically the Coastline Resolution Improved version (RL06.1Mv03), provide TWS as anomalies relative to a baseline period (2004-2009) and are complemented by GLDAS-simulated TWS from two models: the Catchment Land Surface Model (CLSM) and Noah. The study employs Bayesian networks (BNs) to select optimal predictors for TWSA modeling and finds that CTWSA (presumably the CLSM-derived TWSA) is the most frequently selected predictor across grid cells, followed by the Oceanic Niño Index (ONI) and NTWSA (presumably the Noah-derived TWSA).

The authors utilize four machine learning (ML) algorithms—Convolutional Neural Networks (CNN), Support Vector Regression (SVR), Extra Trees Regressor (ETR), and Stacking Ensemble Regression (SER)—to model TWSAs. Performance evaluation metrics such as Pearson correlation coefficient (CC), Nash-Sutcliffe efficiency (NSE), root-mean-square error (RMSE), and Kling-Gupta efficiency (KGE) indicate that the BNML_TWSA models outperform the individual GLDAS outputs, particularly in terms of agreement with GRACE TWSAs. The ETR model emerges as the leading algorithm for the majority of grid cells, demonstrating its robustness across various river basins. Overall, the study highlights the effectiveness of combining BNs for predictor selection with advanced ML techniques for improved TWSA modeling.
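The four evaluation metrics named above can be written out as follows. This is a minimal sketch using their common textbook definitions; in particular, the KGE shown is the original three-component form (correlation, variability ratio, bias ratio), and the summary does not say which KGE variant the authors used. Function names are illustrative.

```python
import math
from statistics import mean, pstdev

def pearson_cc(obs, sim):
    """Pearson correlation coefficient between observed and simulated series."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)

def rmse(obs, sim):
    """Root-mean-square error."""
    return math.sqrt(mean([(o - s) ** 2 for o, s in zip(obs, sim)]))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mo = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den

def kge(obs, sim):
    """Kling-Gupta efficiency (assumed three-component form)."""
    r = pearson_cc(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = mean(sim) / mean(obs)       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

One practical caveat: because anomalies have a mean close to zero by construction, the bias ratio in this KGE form can become unstable, so care is needed when applying it directly to TWSA series.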

Limitations

The limitations of the study on reconstructed terrestrial water storage anomalies (TWSAs) involve several sources of uncertainty, stemming primarily from measurement errors, processing inaccuracies, leakage errors, and model assumptions related to the original GRACE data. The mascon solution significantly reduces these uncertainties compared with spherical harmonics data, as noted by Kalu et al. (2024), but challenges remain. The Bayesian framework used in that solution reduces correlated errors more effectively than traditional empirical filters (Wiese et al., 2016). Uncertainties also arise from the machine learning (ML) models themselves: epistemic uncertainty linked to model inadequacies and aleatoric uncertainty from data noise. The study found that commonly used predictors such as precipitation and temperature were not the top-ranked variables in many grid cells, suggesting that physics-based land surface models (LSMs) may not fully capture the information contained in raw data.

Furthermore, the study acknowledges that the input variables used may not adequately represent anthropogenic influences, which are critical for accurately assessing TWS. The model uncertainty assessment involved calculating confidence intervals (CIs) for TWSA estimates, with the standard error derived from residuals between GRACE observations and reconstructed TWSA. The findings indicate that residuals generally follow a normal distribution, allowing for reliable CI estimation. The study emphasizes the need for future research to incorporate variables from Global Hydrological Models (GHMs) to better account for human impacts on water resources, as both LSMs and GHMs have inherent limitations. Integrating ML models with physical models is proposed as a promising approach to enhance the accuracy and reliability of hydrological assessments, as supported by various studies demonstrating the superior performance of ML models in hydrological applications.
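The residual-based confidence interval described above can be illustrated as follows. This sketch assumes that the standard error is the sample standard deviation of the GRACE-minus-reconstruction residuals and that the interval is symmetric under a normal distribution; the exact formula used by the authors is not given in this summary, and all names and numbers are hypothetical.

```python
from statistics import NormalDist, stdev

def residual_standard_error(grace, recon):
    """Sample standard deviation of residuals (GRACE minus reconstruction)."""
    residuals = [g - r for g, r in zip(grace, recon)]
    return stdev(residuals)

def confidence_interval(estimate, se, level=0.95):
    """Symmetric normal-theory interval around a reconstructed TWSA value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se

# Hypothetical values, for illustration only.
se = residual_standard_error([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 2.0, 4.0])
low, high = confidence_interval(10.0, se)
```

The normality of the residuals reported in the study is what justifies using the normal quantile here; with strongly skewed residuals, an empirical quantile approach would be more appropriate.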