Air Quality Forecasting Using Machine Learning: Comparative Analysis and Ensemble Strategies for Enhanced Prediction

Journal: Water Air & Soil Pollution, Volume: 236, Issue: 7
DOI: https://doi.org/10.1007/s11270-025-08122-8
Publication Date: 2025-05-14
Author(s): Yıldırım ÖZÜPAK et al.
Primary Topic: Air Quality Monitoring and Forecasting

Overview

This study addresses air pollution by applying machine learning techniques to improve air quality prediction, which is vital for environmental sustainability and public health. Using a dataset of 9,357 hourly pollutant measurements (PM2.5, NOₓ, CO, and benzene) collected over one year in a polluted urban area, the research compared the performance of ten regression models, including XGBoost, LightGBM, Random Forest, and Support Vector Regression (SVR) with Bayesian optimization. The study found that hyperparameter optimization and ensemble methods, particularly stacking, significantly improved model accuracy. The SVR model tuned with Bayesian optimization achieved an R² of 99.94% (0.9994), a Mean Absolute Error (MAE) of 0.0120, and a Mean Squared Error (MSE) of 0.0005, indicating that the proposed methodology effectively captures the spatial and temporal dynamics of air pollution.
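For readers less familiar with the reported metrics, the following is a minimal sketch of how MAE, MSE, RMSE, and R² are conventionally computed. The function name `regression_metrics` and the sample values are illustrative assumptions; they are not taken from the study, and the study's own evaluation code may differ.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    # R^2 is undefined when the observed values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}

# Made-up illustrative values, not data from the study.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE is the square root of MSE, the reported MSE of 0.0005 corresponds to an RMSE of about 0.022, on whatever scale the target variable was expressed in.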

The findings highlight the importance of rigorous data preprocessing, feature engineering, and hyperparameter tuning in enhancing model performance. The integration of machine learning and optimization techniques not only improved prediction accuracy but also provided a robust framework for data-driven decision-making in air quality management and environmental policy. Future research directions include the exploration of deep learning-based hybrid models and the application of explainable artificial intelligence methods to further enhance the accuracy and interpretability of air pollution forecasts.
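Hyperparameter tuning can be illustrated with a randomized search scored by k-fold cross-validation, one common tuning strategy. The sketch below is a toy under stated assumptions: a one-feature k-nearest-neighbour regressor stands in for the study's models, the data are synthetic, and the names `knn_predict`, `cv_mse`, and `randomized_search` are hypothetical. Bayesian optimization, which the study also used, is not shown here.

```python
import math
import random

def knn_predict(train, x, k):
    """Mean target of the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cv_mse(points, k, folds=5):
    """Mean squared error of k-NN over interleaved k-fold cross-validation."""
    total = 0.0
    for f in range(folds):
        test = points[f::folds]                                   # held-out fold
        train = [p for i, p in enumerate(points) if i % folds != f]
        total += sum((y - knn_predict(train, x, k)) ** 2 for x, y in test)
    return total / len(points)

def randomized_search(points, candidates, n_iter, seed=0):
    """Score n_iter randomly drawn candidates and return the best one."""
    rng = random.Random(seed)
    tried = rng.sample(candidates, n_iter)     # draw without replacement
    scores = {k: cv_mse(points, k) for k in tried}
    best = min(scores, key=scores.get)         # lowest cross-validated MSE
    return best, scores

# Synthetic data for illustration only (not the study's measurements).
rng = random.Random(42)
points = [(i / 10, math.sin(i / 10) + rng.gauss(0, 0.2)) for i in range(60)]
best_k, scores = randomized_search(points, list(range(1, 16)), n_iter=6)
print(best_k, round(scores[best_k], 4))
```

Randomized search evaluates only a sample of the candidate settings, which trades some chance of missing the best value for a much smaller compute budget than an exhaustive grid.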

Discussion

In this study, machine learning models were used to predict air pollution dynamics and uncover complex relationships among pollutants, including PM2.5, NO₂, and CO. The dataset comprised 9,357 hourly measurements from roadside sensors; outlier removal and correlation analysis were applied to ensure statistical consistency. Ten regression models, including XGBoost, LightGBM, and Random Forest, were trained with hyperparameter optimization techniques such as Bayesian optimization and randomized search with cross-validation. Combining these models in a stacking ensemble significantly enhanced prediction accuracy, lowering RMSE by 15-20% relative to the individual models. The findings underscore the potential of this framework for practical applications in air quality management and policy development.
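The stacking idea can be sketched in a few lines: each base learner produces out-of-fold predictions, and a meta-learner is then fitted on those predictions. The example below is a simplified illustration, not the study's pipeline. It assumes two toy base learners (a least-squares line and a 3-nearest-neighbour mean) in place of XGBoost, LightGBM, and the other models, a no-intercept linear meta-learner, and synthetic data. All function names are hypothetical.

```python
import math
import random

def fit_line(points):
    """Least-squares line through (x, y) points; returns a predictor."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(points, k=3):
    """k-nearest-neighbour mean; returns a predictor."""
    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def out_of_fold(points, fitters, folds=5):
    """Predict every point with base models that never saw it in training."""
    preds = [[0.0] * len(fitters) for _ in points]
    for f in range(folds):
        train = [p for i, p in enumerate(points) if i % folds != f]
        models = [fit(train) for fit in fitters]
        for i in range(f, len(points), folds):
            preds[i] = [m(points[i][0]) for m in models]
    return preds

def fit_meta(preds, targets):
    """Least-squares weights (no intercept) for two base predictions."""
    a = sum(p[0] * p[0] for p in preds)
    b = sum(p[0] * p[1] for p in preds)
    d = sum(p[1] * p[1] for p in preds)
    u = sum(p[0] * t for p, t in zip(preds, targets))
    v = sum(p[1] * t for p, t in zip(preds, targets))
    det = a * d - b * b                      # 2x2 normal equations
    return (u * d - v * b) / det, (v * a - u * b) / det

def rmse(preds, targets):
    """Root mean squared error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))

# Synthetic data for illustration only (not the study's measurements).
rng = random.Random(0)
data = [(i / 10, math.sin(i / 10) + 0.03 * i + rng.gauss(0, 0.1)) for i in range(80)]
targets = [y for _, y in data]
oof = out_of_fold(data, [fit_line, fit_knn])
w_line, w_knn = fit_meta(oof, targets)
stacked = [w_line * p[0] + w_knn * p[1] for p in oof]
rmse_line = rmse([p[0] for p in oof], targets)
rmse_knn = rmse([p[1] for p in oof], targets)
rmse_stack = rmse(stacked, targets)
print(rmse_line, rmse_knn, rmse_stack)
```

On the out-of-fold predictions used to fit it, the least-squares meta-learner can never do worse than either base learner alone, because using a single base learner is just one possible choice of weights. That comparison is optimistic, since the weights are scored on the data they were fitted to. A fair estimate, such as the study's reported RMSE improvement, requires a separate held-out set. In practice the base models would also be refitted on the full training data before deployment.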

The analysis revealed notable seasonal variations in pollutant concentrations, particularly CO and NOₓ, which peaked during the winter months due to anthropogenic activities. Correlation analyses indicated strong relationships among pollutants: CO showed a high positive correlation with NOₓ and NO₂, suggesting common emission sources. Meteorological factors such as temperature and humidity were also found to influence pollutant levels; higher temperatures correlated with increased absolute humidity and affected the dispersion and chemical interactions of pollutants. These insights contribute to a deeper understanding of air quality dynamics and highlight the importance of integrating machine learning approaches with environmental data for effective air quality management.
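A correlation analysis of this kind typically rests on the Pearson coefficient computed for each pair of variables. The sketch below shows the calculation on a handful of made-up readings; the values, the variable labels, and the helper name `pearson` are assumptions for illustration and do not reproduce the study's data or results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up hourly readings for illustration only; not values from the study.
readings = {
    "CO": [1.2, 2.0, 2.6, 1.1, 0.9, 3.1],
    "NOx": [110.0, 180.0, 240.0, 100.0, 85.0, 300.0],
    "T": [14.0, 9.0, 7.5, 15.0, 16.0, 5.0],
}
names = list(readings)
# Full pairwise correlation matrix as a nested dictionary.
matrix = {a: {b: pearson(readings[a], readings[b]) for b in names} for a in names}
for a in names:
    print(a, [round(matrix[a][b], 3) for b in names])
```

A coefficient near +1 between two pollutants is consistent with a shared emission source, but correlation alone does not establish that; it can also reflect a common driver such as weather or traffic patterns.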

Keywords: meteorology, geography, artificial intelligence, machine learning, ensemble learning, ensemble forecasting, quality (philosophy), computer science, air quality index