Time series predictions in unmonitored sites: a survey of machine learning techniques in water resources

Journal: Environmental Data Science, Volume: 4
DOI: https://doi.org/10.1017/eds.2024.14
Publication Date: 2025-01-01
Author(s): Jared Willard et al.
Primary Topic: Hydrological Forecasting Using AI

Overview

The paper discusses the challenges and advances in predicting dynamic environmental variables at unmonitored sites, a capability that is crucial for effective water resources management. Monitoring of critical hydrological variables is inadequate, a problem exacerbated by climate and land use changes, so improved predictive methods are needed. The review highlights the superiority of modern machine learning (ML) techniques over traditional process-based and empirical models in hydrological time series prediction, emphasizing their ability to leverage large and diverse datasets. It identifies a predominant focus on deep learning frameworks for daily predictions in the United States, while noting a lack of comprehensive comparisons among ML methods.

The impact statement underscores the need for detailed comparisons of different ML techniques for hydrological predictions, as existing best practices remain unclear. The review consolidates state-of-the-art ML methodologies, elucidating their strengths and limitations to guide researchers and water resource managers in selecting appropriate frameworks. It also outlines critical open questions for future research, such as the integration of dynamic inputs, mechanistic understanding, and explainable AI techniques. The conclusion reiterates the importance of ML in addressing the urgent monitoring needs posed by climate change and urbanization, advocating for interdisciplinary collaboration to enhance both predictive performance and domain science understanding. The survey aims to equip researchers with contemporary insights into ML applications for unmonitored predictions and foster synergy between ML practitioners and domain experts.

Introduction

The introduction highlights the critical gap in environmental data availability for water resource management, particularly in the context of streamflow and water quality predictions in unmonitored basins. Despite advancements in sensor networks and remote sensing technologies, the coverage remains insufficient, with significant portions of the United States and other regions lacking adequate monitoring. The paper emphasizes the importance of predictive models that can utilize existing data to fill these gaps, particularly in light of increasing climate variability and human impacts on water resources. Various modeling approaches, including process-based, machine learning (ML), and statistical models, have been employed to address these challenges, with a notable shift towards ML methods due to their flexibility and superior performance in large-scale applications.

The authors define the ‘unmonitored’ scenario as cases where monitoring data are either absent or too sparse, so that models must extrapolate predictions spatially. Traditional process-based models have faced challenges in calibration and regionalization due to complex parameter relationships and the equifinality problem, in which different parameter sets reproduce the observations equally well. In contrast, ML models can implicitly perform regionalization without relying on predefined hydrological knowledge, thus offering a promising alternative. The introduction also outlines the paper’s focus on reviewing ML techniques for time series predictions in unmonitored sites, while excluding other areas such as remote sensing applications. The subsequent sections explore various ML frameworks, summarize overarching themes, and identify gaps in current research, with the aim of advancing hydrology and water resource management through innovative modeling approaches.

Discussion

In this section, the authors discuss various machine learning (ML) methodologies applied to predict water resource variables at unmonitored sites, emphasizing the importance of leveraging data from monitored sites. The process typically begins with developing predictive models using monitored entities (e.g., stream gauges) and then applying these models to unmonitored sites. A key focus is on entity-aware models, which integrate site-specific characteristics—referred to as attributes or traits—to enhance prediction accuracy. The authors highlight different modeling strategies, including broad-scale models that utilize all available entities or selected subgroups, and the integration of ML with domain knowledge and process-based models.
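
The workflow described above (train on monitored entities, then predict at unmonitored ones) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any study in the survey: the data are synthetic, a least-squares linear model stands in for the deep learning models the survey discusses, and the helper names (`build_inputs`, `split_sites`, `add_bias`) are hypothetical. The model is made entity-aware by appending each site's static attributes to its dynamic inputs, and whole sites are held out so that the test sites behave like unmonitored ones.

```python
import numpy as np


def build_inputs(dynamic, static):
    """Tile static attributes over time and append them to the dynamic inputs.

    dynamic: (n_sites, n_steps, n_dynamic), static: (n_sites, n_static)
    returns: (n_sites, n_steps, n_dynamic + n_static)
    """
    n_steps = dynamic.shape[1]
    tiled = np.repeat(static[:, None, :], n_steps, axis=1)
    return np.concatenate([dynamic, tiled], axis=2)


def split_sites(n_sites, n_holdout, seed=0):
    """Hold out whole sites so the test sites contribute no training data."""
    order = np.random.default_rng(seed).permutation(n_sites)
    return np.sort(order[n_holdout:]), np.sort(order[:n_holdout])


def add_bias(x):
    """Flatten (sites, steps, features) to rows and append a constant column."""
    flat = x.reshape(-1, x.shape[-1])
    return np.hstack([flat, np.ones((flat.shape[0], 1))])


# Synthetic data (an assumption for illustration): 12 sites, 50 time steps,
# 2 dynamic drivers and 3 static attributes per site.
rng = np.random.default_rng(42)
dynamic = rng.normal(size=(12, 50, 2))
static = rng.normal(size=(12, 3))
target = (0.8 * dynamic[..., 0] - 0.3 * dynamic[..., 1]
          + 0.5 * static[:, [0]] + rng.normal(scale=0.01, size=(12, 50)))

inputs = build_inputs(dynamic, static)
train_sites, test_sites = split_sites(n_sites=12, n_holdout=3)

# Fit one global model on the monitored (training) sites only.
weights, *_ = np.linalg.lstsq(add_bias(inputs[train_sites]),
                              target[train_sites].reshape(-1), rcond=None)

# Predict at the held-out sites, which play the role of unmonitored sites.
predictions = (add_bias(inputs[test_sites]) @ weights).reshape(len(test_sites), -1)
rmse = float(np.sqrt(np.mean((predictions - target[test_sites]) ** 2)))
print(f"RMSE at held-out sites: {rmse:.4f}")
```

Splitting by site rather than by time is the design choice that matters here: a time-based split would let the model see observations from every site, which does not reflect the unmonitored setting.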

The discussion elaborates on the effectiveness of broad-scale models, which aim to capture the heterogeneity of site characteristics to improve predictions. While some studies advocate for using all available data, others suggest that selecting specific subgroups may yield better performance, particularly in cases of data imbalance or noise. The authors also explore advanced techniques such as direct concatenation of site characteristics with dynamic inputs, encoding characteristics through neural networks, and employing graph neural networks (GNNs) to model interdependencies between entities. GNNs, in particular, are noted for their potential to capture complex relationships in hydrological processes, facilitating improved predictions in scenarios where traditional models may fall short. Overall, the authors emphasize the need for further research to optimize these methodologies for predicting extremes and to evaluate the comparative performance of different modeling approaches.
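
To make the graph-based idea concrete, the sketch below shows a single message-passing step on a toy river network. It is an assumption-laden illustration rather than an architecture taken from the survey: the symmetric normalization follows the common graph convolutional network formulation, the four-site network and its features are invented, the weight matrix is fixed to the identity instead of being learned, and the function names are hypothetical. The point is only to show how information from connected sites is mixed into each site's representation.

```python
import numpy as np


def normalized_adjacency(adj):
    """Add self-loops and apply symmetric degree normalization."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_layer(features, adj_norm, weights):
    """One message-passing step: aggregate neighbors, transform, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Toy network (hypothetical): sites 0 and 1 drain into site 2, which drains
# into site 3. Edges are treated as undirected for simplicity.
adj = np.array([[0, 0, 1, 0],
                [0, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Sites 0 and 1 carry information; sites 2 and 3 start with zero features.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.0],
                     [0.0, 0.0]])
weights = np.eye(2)  # identity weights keep the example easy to follow

adj_norm = normalized_adjacency(adj)
one_hop = graph_layer(features, adj_norm, weights)
two_hop = graph_layer(one_hop, adj_norm, weights)

print("after one layer:\n", one_hop)
print("after two layers:\n", two_hop)
```

After one layer, site 2 has received information from its direct neighbors while site 3 is still at zero; after two layers, site 3 also carries a signal that originated at sites 0 and 1. Stacking layers is what lets a site without its own data draw on sites several links away.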