Reliable water quality prediction and parametric analysis using explainable AI models

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-56775-y
PMID: https://pubmed.ncbi.nlm.nih.gov/38553492
Publication Date: 2024-03-29
Author(s): M. K. Nallakaruppan et al.
Primary Topic: Hydrological Forecasting Using AI

Overview

The paper addresses water quality management, stressing that water purity must be monitored and controlled because contamination harms both health and the environment. It identifies Total Dissolved Solids (TDS) as the primary contaminant, alongside harmful substances such as lead, arsenic, and nitrates. The study proposes an automated water quality estimation system built on machine learning, and uses Explainable Artificial Intelligence (XAI) to show which parameters most affect potability. Five classifiers are compared: Logistic Regression, Support Vector Machine (SVM), Gaussian Naive Bayes, Decision Tree (DT), and Random Forest (RF). RF performs best, with a reported accuracy of 0.9999, precision of 0.9997, and recall of 0.998.
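As a reminder of how the three reported metrics relate to one another, the following minimal sketch computes accuracy, precision, and recall from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tp/fp/fn/tn are counts for the positive class (here: potable = 1).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for illustration only (not the paper's results).
acc, prec, rec = classification_metrics(tp=90, fp=10, fn=30, tn=870)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Accuracy alone can look high when one class dominates, which is why precision and recall are usually read alongside it.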

The findings highlight the value of XAI in making water quality classifications transparent and justifiable, particularly through the SHAP (Shapley value) explainer, which exposes feature importance and prediction scores. The model identifies “solids” as the most significant parameter influencing water potability. The study concludes that the proposed approach not only improves water quality estimation but also lays the groundwork for future research involving more complex datasets and the possible integration of deep learning algorithms. Such work could improve the analysis of solid sediments in water and, in turn, water quality management strategies.

Methods

The research emphasizes that access to safe drinking water is fundamental to health protection, irrespective of social and economic disparities. Previous studies have shown that investments in clean water and sanitation not only improve public health but also yield economic benefits for nations. Key indicators of water quality, which include physical, chemical, and biological characteristics, are essential for determining potability, the dependent target class. The independent features influencing potability are pH value, hardness, Total Dissolved Solids (TDS), chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity. The parameters and their corresponding World Health Organization (WHO) limits are detailed in Table 4, while Table 5 provides descriptions of these features.

The study employs an Explainable Artificial Intelligence (XAI) framework to enhance the transparency and interpretability of outcomes generated by machine learning algorithms in water quality assessment. This approach aims to facilitate accurate decision-making, thereby fostering trust and improving the understanding of model behavior in evaluating water safety.

Results

In this study, water quality was evaluated based on nine parameters: pH, hardness, solids (TDS), chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity. The binary target class indicates potability (0 for non-potable and 1 for potable). The dataset had a substantial number of missing values, particularly for sulfate, which were addressed through imputation. Correlation analysis revealed that hardness (0.34) and chloramines (0.24) had the highest correlations with potability, followed by trihalomethanes (0.21) and turbidity (0.16). Several machine learning models were applied, namely Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and Gaussian Naive Bayes. RF demonstrated superior performance and was selected for further interpretability analysis using Explainable Artificial Intelligence (XAI) techniques.
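The imputation and correlation steps described above can be sketched as follows. The summary does not state which imputation strategy the authors used, so mean imputation is an assumption here, and the sulfate readings and labels are made-up toy values. The helper names `impute_mean` and `pearson` are hypothetical.

```python
import math


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data for illustration only; None marks a missing sulfate reading.
sulfate = [330.0, None, 310.0, None, 350.0]
potability = [0, 1, 0, 1, 1]

filled = impute_mean(sulfate)      # missing entries become the observed mean
r = pearson(filled, potability)    # correlation of the feature with the target
print(filled, round(r, 4))
```

Mean imputation shrinks a feature's variance, so correlations computed after imputation should be read with that in mind.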

The XAI implementation used SHAP values to assess feature significance in predicting potability. The force plot illustrated the prediction scores, highlighting sulfate’s critical role at the intersection boundary. The summary plot indicated that solids, pH, sulfate, and hardness were significant predictors, while the dependence plot revealed a nuanced relationship between sulfate and potability, particularly in the mid-range of values. Finally, decision plots provided local insight into how specific feature values influenced the potability classification, with separate visualizations for potable (1) and non-potable (0) outcomes.
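SHAP attributions are built on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a single instance by enumerating every feature coalition. It illustrates the underlying idea only and is not the authors' implementation or the SHAP library's algorithm. The scoring function `toy_score`, the feature values, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(x):
    """Hypothetical score with one interaction term (not a fitted model)."""
    solids, ph, sulfate = x
    return 0.5 * solids + 0.3 * ph + 0.2 * solids * sulfate


instance = [1.0, 2.0, 3.0]   # made-up, already-scaled feature values
baseline = [0.0, 0.0, 0.0]   # assumed reference point
phi = shapley_values(toy_score, instance, baseline)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the score at the instance and at the baseline, which is the additivity property that force and decision plots visualize. The interaction term is split evenly between the two features involved.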

Discussion

The proposed model leverages Explainable AI (XAI) techniques, specifically LIME and SHAP, to enhance the interpretability of machine learning predictions in water quality assessment. By identifying key input features, the model facilitates informed decision-making in water management processes. The authors emphasize the importance of transparency in predictions, which is achieved through a comprehensive analysis of feature significance, dependencies, and weights, thereby optimizing the classification of water quality datasets.

The paper contributes significantly to the field by providing a detailed white-box description of the classification problem, incorporating extensive data preprocessing, and addressing missing data imputation to improve accuracy. The framework employs both model-based and model-agnostic interpretations, ensuring a robust understanding of the underlying factors influencing water quality. Additionally, the paper reviews related works, highlighting various methodologies and challenges in water quality management, thus situating the proposed model within the broader context of existing research. Overall, the findings underscore the critical role of XAI in enhancing the transparency and effectiveness of water quality assessments, ultimately supporting environmental sustainability efforts.