Machine learning prediction of ammonia nitrogen adsorption on biochar with model evaluation and optimization

Journal: npj Clean Water, Volume: 8, Issue: 1
DOI: https://doi.org/10.1038/s41545-024-00429-z
Publication Date: 2025-02-22
Author(s): Chong Liu et al.
Primary Topic: Privacy-Preserving Technologies in Data

Overview

This study addresses the growing problem of nitrogen pollution in aquatic environments by using a comprehensive machine learning (ML) approach to predict the ammonia nitrogen (NH₄⁺-N) adsorption capacity of biochar and to identify optimal adsorption conditions. Twelve ML models, spanning tree-based ensembles, kernel-based methods, and deep learning techniques, were assessed using Bayesian optimization and cross-validation. Tree-based ensemble models performed best. CatBoost in particular achieved a coefficient of determination of $R^2 = 0.9329$ and a root mean square error (RMSE) of 0.5378, indicating strong predictive performance and good generalization.
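The two metrics quoted above have standard definitions: RMSE is the square root of the mean squared prediction error, and $R^2 = 1 - SS_{res}/SS_{tot}$. The stdlib-only sketch below computes both for a pair of hypothetical observed and predicted lists. These are toy values, not data from the study. In practice the metrics would be computed on held-out predictions, for example from the cross-validation folds.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted adsorption capacities (illustrative only).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(rmse(observed, predicted), r_squared(observed, predicted))  # 0.5 0.95
```

Every prediction here is off by 0.5, so the RMSE is exactly 0.5. The residual sum of squares (1.0) is 5% of the total sum of squares (20.0), which gives $R^2 = 0.95$.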

Feature importance analysis identified experimental conditions (67.2%) and the chemical properties of biochar (18.2%) as the factors with the greatest influence on adsorption capacity. Adsorption was enhanced at an initial NH₄⁺-N concentration above 50 mg/L ($C_0 > 50 \, \text{mg/L}$) and a pH of 6-9. The study also developed a Python-based graphical user interface (GUI) built around the CatBoost model, which supports practical use in the design of efficient biochar adsorption systems. By combining advanced ML techniques with interpretability tools, this research improves understanding of biochar's ammonium adsorption potential and supports sustainable strategies for mitigating nitrogen pollution.
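The category percentages can be read as shares of total feature importance. The summary does not state how the per-category shares were aggregated. The sketch below assumes that per-feature importance scores are summed within each category and normalized to 100%. All feature names, groupings, and scores in it are hypothetical.

```python
from collections import defaultdict


def category_shares(importances, categories):
    """Sum per-feature importances within each category and express each
    category as a percentage of the total importance (assumed aggregation)."""
    totals = defaultdict(float)
    for feature, value in importances.items():
        totals[categories[feature]] += value
    grand_total = sum(totals.values())
    return {cat: 100.0 * v / grand_total for cat, v in totals.items()}


# Hypothetical feature names, groupings, and importance scores (not from the study).
importances = {"C0": 40.0, "pH": 15.0, "temperature": 10.0,
               "O_C_ratio": 20.0, "surface_area": 15.0}
categories = {"C0": "experimental", "pH": "experimental",
              "temperature": "experimental", "O_C_ratio": "chemical",
              "surface_area": "physical"}
shares = category_shares(importances, categories)
print(shares)  # {'experimental': 65.0, 'chemical': 20.0, 'physical': 15.0}
```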

Methods

The study follows a data-driven workflow. NH₄⁺-N adsorption data for a range of biochars were compiled and preprocessed, a set of regression models was trained and tuned, and the best model was interpreted to identify the factors that control adsorption capacity.

Twelve ML models spanning tree-based ensembles, kernel-based methods, and deep learning were compared. Their hyperparameters were tuned with Bayesian optimization, and each model was evaluated with cross-validation using the coefficient of determination (R²) and root mean square error (RMSE) as performance metrics. Feature importance analysis and partial dependence plots were then used to interpret the best-performing model.
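Cross-validated tuning can be sketched as follows. For brevity, this stand-in makes two substitutions: it uses exhaustive search over a small grid instead of Bayesian optimization, and a simple k-nearest-neighbors regressor instead of the study's models. The data and candidate values are hypothetical.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors):
    """Average the targets of the n_neighbors closest training rows (Euclidean)."""
    dists = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    nearest = dists[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


def cv_rmse(X, y, n_neighbors, k=5):
    """Mean RMSE over k folds for a kNN regressor with the given n_neighbors."""
    scores = []
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        errs = [(y[i] - knn_predict(train_X, train_y, X[i], n_neighbors)) ** 2
                for i in fold]
        scores.append(math.sqrt(sum(errs) / len(errs)))
    return sum(scores) / k


# Hypothetical toy data: one feature with a linear target (not from the study).
X = [[float(i)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
# Exhaustive search over a small grid stands in for Bayesian optimization here.
best = min([1, 3, 5, 7], key=lambda n: cv_rmse(X, y, n))
print(best)
```

Bayesian optimization would replace the fixed grid with a surrogate model that proposes the next candidate to evaluate. The cross-validation scoring step would stay the same.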

Results

Tree-based ensemble models outperformed the other model families, and CatBoost performed best overall, with $R^2 = 0.9329$ and an RMSE of 0.5378. Feature importance analysis attributed the largest share of influence on adsorption capacity to experimental conditions (67.2%), followed by the chemical properties of biochar (18.2%). It also identified an initial concentration above 50 mg/L and a pH of 6-9 as favorable for adsorption.

The discussion places these results in the context of the existing literature, acknowledges the study's limitations, and recommends further investigation to address them and strengthen the conclusions.

Discussion

In this study, a comprehensive dataset of 417 sets of adsorption data from 46 distinct biochar types was compiled to investigate NH₄⁺-N adsorption performance. The data were grouped into chemical properties of biomass, physical properties of biochar, and experimental conditions, with care taken to minimize bias during data collection. Missing values were imputed with the K-Nearest Neighbors (KNN) algorithm. The Box-Cox transformation was then applied to bring the distributions closer to normal and stabilize variance, which helps many machine learning (ML) models fit the data more effectively. Finally, a Pearson correlation coefficient (PCC) analysis was used to detect multicollinearity among features, and highly correlated variables were removed before model training.
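The three preprocessing steps can be sketched in plain Python. This is a minimal illustration that rests on several assumptions. The KNN imputation averages the missing column over the k nearest donor rows, using Euclidean distance on the columns that both rows observe. The Box-Cox λ is chosen by maximizing the profile log-likelihood over a grid. The 0.9 correlation threshold and the greedy keep-first rule are hypothetical choices, not the paper's settings.

```python
import math


def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest donor
    rows, measuring Euclidean distance on columns observed in both rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have column j observed
                shared = [(a, b) for c, (a, b) in enumerate(zip(row, other))
                          if c != j and a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                    candidates.append((dist, other[j]))
            if not candidates:
                raise ValueError(f"no donor rows for row {i}, column {j}")
            nearest = sorted(candidates)[:k]
            filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def fit_box_cox_lambda(values, grid=None):
    """Return the grid lambda maximizing the Box-Cox profile log-likelihood
    -n/2 * log(var(transformed)) + (lam - 1) * sum(log(values))."""
    grid = grid if grid is not None else [x / 10 for x in range(-20, 21)]
    n = len(values)
    log_sum = sum(math.log(v) for v in values)

    def loglik(lam):
        t = box_cox(values, lam)
        mean = sum(t) / n
        var = sum((v - mean) ** 2 for v in t) / n
        return -n / 2 * math.log(var) + (lam - 1) * log_sum

    return max(grid, key=loglik)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def drop_correlated(columns, threshold=0.9):
    """Keep features in order, skipping any whose |r| with an already kept
    feature exceeds the (hypothetical) threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy inputs (not the study's data).
print(knn_impute([[1.0, 2.0], [2.0, None], [3.0, 6.0], [4.0, 8.0]], k=2))
print(box_cox([1.0, 4.0], 0.5))  # [0.0, 2.0]
print(drop_correlated({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}))
```

In the last line, feature `b` is dropped because it is almost perfectly correlated with `a`. Feature `c` is kept because its correlation with `a` is only -0.4.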

The study systematically evaluated 12 ML models, including tree-based ensemble models, kernel-based models, and deep learning models, to predict NH₄⁺-N adsorption. Among these, CatBoost performed best, achieving the lowest root mean square error (RMSE) and the highest coefficient of determination (R²), which indicates its robustness and accuracy on this dataset. In contrast, deep learning models such as LSTM and CNN underperformed, likely because such models typically need more training data than the 417 samples available here. Feature importance analysis showed that experimental conditions, particularly the initial NH₄⁺-N concentration and temperature, strongly influence adsorption capacity. Partial dependence plots further illustrated the relationships between key features and adsorption outcomes, highlighting the complex interactions that govern the adsorption process. Overall, the findings underscore the effectiveness of CatBoost for predicting NH₄⁺-N adsorption and the importance of specific experimental conditions for optimizing biochar performance.
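A partial dependence curve is built by sweeping one feature over a grid while every other feature keeps its observed value, then averaging the model's predictions at each grid point. The sketch below uses a hypothetical toy function in place of the fitted CatBoost model. Its shape is invented for illustration only.

```python
def partial_dependence(predict, X, feature, grid):
    """For each grid value, overwrite column `feature` in every row and
    average the model's predictions over all rows."""
    curve = []
    for v in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature] = v
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical stand-in for a fitted model (NOT the study's CatBoost model):
# capacity rises with C0 (column 0) and peaks at pH (column 1) = 7.
def toy_model(row):
    c0, ph = row
    return 0.1 * c0 - 0.5 * (ph - 7.0) ** 2


X = [[10.0, 5.0], [50.0, 7.0], [100.0, 9.0]]
print(partial_dependence(toy_model, X, feature=1, grid=[5.0, 7.0, 9.0]))
```

Because the toy model is additive, the pH curve is simply the average C0 contribution shifted by the pH term. With a real tree ensemble, interactions between features would make the curve depend on the full distribution of the other features.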