PM2.5 concentration prediction using machine learning algorithms: an approach to virtual monitoring stations

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-92019-3
PMID: https://pubmed.ncbi.nlm.nih.gov/40057563
Publication Date: 2025-03-08
Author(s): Ahmad Makhdoomi et al.
Primary Topic: Air Quality Monitoring and Forecasting

Overview

The study investigates the prediction of PM2.5 concentrations in Mashhad, Iran, using four machine learning models: Random Forest (RF), LightGBM (LGBM), XGBoost (XGBR), and Gradient Boosting Regression (GBR). The models were trained and tested on data from 2016 to 2022 and evaluated with five performance metrics: R², Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Error (MAE). The GBR model demonstrated superior predictive capability, achieving an R² value exceeding 96% on the testing dataset and outperforming the other models.
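The five metrics named above have standard definitions, which the following minimal sketch computes in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions, not taken from the study, and the paper's own tooling is not described in this summary.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE (in percent), and R2 for paired sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an observed value is zero; assumed non-zero here.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Made-up PM2.5 values in µg/m³, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
```

Because MSE squares each error, it penalizes large misses more heavily than MAE does, which is why one model can rank best on MAE while another ranks best on MSE and RMSE.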

The findings highlight that PM2.5 concentrations in Mashhad frequently surpass the Environmental Protection Agency (EPA) standard of 15 µg/m³ and the World Health Organization (WHO) guideline of 10 µg/m³, indicating a pressing need for proactive measures to address air pollution. Recommendations include enhancing green spaces, improving public transportation, implementing stricter industrial emission regulations, and adopting advanced air filtration systems in high-traffic areas. The study underscores the importance of accurate PM forecasting for effective environmental policymaking and public health initiatives in urban settings.
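As a rough illustration of what "frequently surpass" means in practice, the sketch below counts the share of readings above each limit. The summary does not state the averaging period behind either limit, so the code simply treats the two quoted values as cutoffs on hypothetical daily means; the readings and the helper name `exceedance_share` are invented for the example.

```python
EPA_LIMIT = 15.0  # µg/m³, value quoted in the text
WHO_LIMIT = 10.0  # µg/m³, value quoted in the text


def exceedance_share(values, limit):
    """Fraction of readings strictly above the given limit."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(1 for v in values if v > limit) / len(values)


# Hypothetical daily mean PM2.5 readings in µg/m³ (not data from the study).
daily_pm25 = [8.0, 12.5, 16.0, 22.0, 9.5, 31.0, 14.0, 18.5]
share_epa = exceedance_share(daily_pm25, EPA_LIMIT)
share_who = exceedance_share(daily_pm25, WHO_LIMIT)
```

With these invented readings, half of the days exceed the EPA value and three quarters exceed the WHO value.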

Methods

The “Materials and Methods” section outlines the experimental design and procedures employed in the study. It details the specific materials used, including any reagents, equipment, and biological samples, as well as the protocols followed to ensure reproducibility and accuracy of results. The section emphasizes the methodologies for data collection and analysis, including statistical techniques applied to interpret the findings.

Additionally, the section may describe any controls or variables considered during the experiments, ensuring that the results are valid and reliable. The clarity and rigor of the methods are crucial for the study’s credibility, allowing for potential replication by other researchers in the field.

Results

The results section of the study reveals a strong positive correlation between Air Quality Index (AQI), PM10, and PM2.5, with a correlation coefficient of 0.95, indicating that AQI and PM10 levels are strong predictors of PM2.5 concentrations. The analysis suggests that effective monitoring and control of PM2.5 could enhance air quality. Some meteorological variables showed weak negative correlations with PM2.5; only variables whose correlation with PM2.5 exceeded 0.1 in absolute value (greater than +0.1 or less than -0.1) were retained for predictive modeling.
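The correlation-based feature filter described above can be sketched as follows. This is a minimal illustration that assumes Pearson correlation (the summary does not name the coefficient used); the feature names, values, and the helper `select_features` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def select_features(features, target, threshold=0.1):
    """Keep features whose absolute correlation with the target exceeds threshold."""
    selected = {}
    for name, values in features.items():
        r = pearson(values, target)
        if abs(r) > threshold:
            selected[name] = r
    return selected


# Invented values for illustration; not data from the study.
pm25 = [12.0, 18.0, 25.0, 31.0, 40.0, 22.0]
features = {
    "pm10": [30.0, 41.0, 55.0, 70.0, 88.0, 50.0],    # strongly positive
    "wind_speed": [5.0, 4.2, 3.1, 2.5, 1.8, 3.6],    # strongly negative
    "noise": [2.0, 1.0, 1.0, 2.0, 1.5, 1.5],         # essentially uncorrelated
}
kept = select_features(features, pm25)
```

Filtering on the absolute value keeps negatively correlated predictors such as wind speed, which a one-sided cutoff would discard.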

Four machine learning models were employed to estimate PM2.5 concentrations: Gradient Boosting Regressor (GBR), Random Forest (RF), XGBoost (XGBR), and LightGBM (LGBM). Evaluation metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and R² were calculated for both training and testing datasets.

RF demonstrated the best performance in minimizing absolute errors (MAE = 0.47) and relative percentage deviations (MAPE = 1.5%), while GBR excelled in reducing large errors (MSE = 5.33, RMSE = 2.31). XGBR showed high predictive power (R² = 0.9781) but had the highest MAPE (2.7%). LGBM performed the weakest across metrics. Statistical comparisons indicated significant differences between RF and GBR, and between GBR and XGBR, emphasizing their distinct predictive capabilities.

The models’ predictions were visually compared against actual PM2.5 concentrations, revealing that GBR and RF aligned closely with observed values, particularly during pollution spikes, while LGBM and XGBR struggled with high PM2.5 levels. The study suggests that incorporating additional data sources could enhance predictive accuracy during pollution surges.
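To make the idea behind gradient boosting regression concrete, here is a toy sketch for squared-error loss: each round fits a one-split decision stump to the current residuals and adds a shrunken copy of it to the ensemble. It is not the study's implementation (the paper's library, features, and hyperparameters are not given in this summary); the single feature, the data, and all function names are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split of a 1-D feature with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals, starting from the mean of the targets."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct feature values")
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    total = model["base"]
    for threshold, left_value, right_value in model["stumps"]:
        total += model["learning_rate"] * (left_value if x <= threshold else right_value)
    return total


# Invented PM10 -> PM2.5 pairs in µg/m³; not data from the study.
pm10 = [30.0, 41.0, 55.0, 70.0, 88.0, 50.0]
pm25 = [12.0, 18.0, 25.0, 31.0, 40.0, 22.0]
model = fit_gradient_boosting(pm10, pm25)
fitted = [predict(model, x) for x in pm10]

mean_pm25 = sum(pm25) / len(pm25)
mse_baseline = sum((y - mean_pm25) ** 2 for y in pm25) / len(pm25)
mse_boosted = sum((y - f) ** 2 for y, f in zip(pm25, fitted)) / len(pm25)
```

The small learning rate trades more rounds for smoother fitting. This sketch evaluates only training error; a real comparison like the one reported above would score each model on a held-out test set.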

Discussion

The research paper discusses air quality trends in Mashhad, Iran, focusing on particulate matter (PM2.5 and PM10) concentrations from 2016 to 2022. The study highlights that Mashhad, with a population of over 3.3 million and significant industrial activity, faces severe air pollution challenges exacerbated by heavy vehicular traffic and urbanization. Notably, PM10 concentrations consistently exceeded the World Health Organization’s threshold during specific months, while the COVID-19 pandemic temporarily reduced pollution levels. The study also emphasizes the importance of meteorological factors, such as precipitation and wind, in influencing air quality, with increased rainfall correlating with lower Air Quality Index (AQI) values.

To predict PM2.5 concentrations, the study employed four machine learning models: Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting Regressor (XGBR), Random Forest (RF), and Gradient Boosting Regressor (GBR). Among these, GBR demonstrated superior predictive capability, achieving an R² value above 96% on the testing dataset. The findings underscore the need for proactive measures to mitigate air pollution, including stricter industrial emissions controls and enhanced public transportation. The study concludes that accurate PM forecasting is vital for effective environmental policymaking and public health initiatives in urban areas like Mashhad.