Optimized machine learning model for air quality index prediction in major cities in India

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-54807-1
PMID: https://pubmed.ncbi.nlm.nih.gov/38514669
Publication Date: 2024-03-21
Author(s): Suresh Kumar Natarajan et al.
Primary Topic: Air Quality Monitoring and Forecasting

Overview

This study presents an optimized machine learning model for predicting the Air Quality Index (AQI) in major Indian cities, motivated by rising pollution from industrial activity and vehicle emissions. The model combines Grey Wolf Optimization (GWO) with the Decision Tree (DT) algorithm and uses air quality data from the Kaggle repository. The analysis covers New Delhi, Bangalore, Kolkata, Hyderabad, Chennai, and Visakhapatnam. Model performance is evaluated with R-squared, RMSE, MSE, MAE, and accuracy.
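As a minimal sketch of the evaluation metrics named above, the following computes MSE, RMSE, MAE, and R-squared for predicted AQI values, plus a simple accuracy over category labels. The function names and sample values are illustrative assumptions; the summary does not state how the paper defines accuracy, so the label-match definition here is only one plausible reading.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R-squared is undefined when the targets are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

def accuracy(labels_true, labels_pred):
    """Fraction of positions where the predicted category matches (assumed definition)."""
    matches = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return matches / len(labels_true)

# Made-up AQI values for illustration only (not taken from the paper).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
scores = regression_metrics(observed, predicted)
```

RMSE is the square root of MSE, so the two always rank models identically; reporting both mainly helps readers who prefer errors in the original AQI units.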

The proposed GWO-DT model demonstrates superior predictive capabilities compared to traditional machine learning approaches, achieving maximum accuracies of 88.98% for New Delhi, 91.49% for Bangalore, 94.48% for Kolkata, 97.66% for Hyderabad, 95.22% for Chennai, and 97.68% for Visakhapatnam. These results indicate a significant improvement over existing methods, including Support Vector Regression, K-Nearest Neighbors, and Random Forest Regression. The study suggests that future work could enhance prediction accuracy further by incorporating deep learning techniques into the model.

Methods

The authors ran a simulation analysis on benchmark air quality data from India, covering 2015 to 2020 and sourced from the Kaggle repository. The dataset contains air quality indices for 26 major cities. Six of them (New Delhi, Kolkata, Hyderabad, Bangalore, Chennai, and Visakhapatnam) were selected based on their AQI classifications, which range from good to very poor. The data for these cities were exported to a CSV file for further analysis of pollution levels.
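The city selection and CSV export described above could look roughly like the sketch below. The column names (`City`, `Date`, `AQI`), the city spellings, and the inline sample rows are assumptions for illustration; the actual Kaggle file may differ. The category bands follow the Indian national AQI scale (Good up to 50, Satisfactory up to 100, Moderate up to 200, Poor up to 300, Very Poor up to 400, Severe above that), which the summary does not spell out.

```python
import csv
import io

# Assumed city spellings; the dataset may use different names.
SELECTED_CITIES = {"New Delhi", "Bangalore", "Kolkata",
                   "Hyderabad", "Chennai", "Visakhapatnam"}

# Upper bound of each AQI category (Indian national AQI scale, assumed here).
AQI_BANDS = [(50, "Good"), (100, "Satisfactory"), (200, "Moderate"),
             (300, "Poor"), (400, "Very Poor")]

def aqi_category(aqi):
    """Map a numeric AQI to its category label."""
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    return "Severe"

def filter_cities(rows, cities=SELECTED_CITIES):
    """Keep rows for the chosen cities that have a non-empty AQI value."""
    kept = []
    for row in rows:
        if row["City"] not in cities or not row["AQI"]:
            continue
        kept.append({**row, "Category": aqi_category(float(row["AQI"]))})
    return kept

def to_csv(rows, fieldnames):
    """Serialize the filtered rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Tiny made-up sample standing in for the real file.
raw = """City,Date,AQI
New Delhi,2019-01-01,350
Mumbai,2019-01-01,120
Chennai,2019-01-01,
Kolkata,2019-01-01,45
"""
subset = filter_cities(csv.DictReader(io.StringIO(raw)))
exported = to_csv(subset, ["City", "Date", "AQI", "Category"])
```

Rows with a missing AQI are dropped here for simplicity; the summary does not say how the authors handled missing values.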

The method first extracts an optimal set of features with the optimization model and then classifies with a decision tree classifier. All experiments were run in Python, using library functions for both the optimization and classification steps. The hyperparameters used in the experiments are listed in Table 1.
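To illustrate how an optimizer can drive feature selection, here is a minimal sketch of a basic Grey Wolf Optimizer searching over feature masks. It is not the authors' implementation: the summary does not describe how GWO and the decision tree are coupled, so `toy_error` is a hypothetical stand-in for a decision tree's validation error, and the population size, iteration count, and 0.5 threshold are assumed values rather than the hyperparameters in Table 1.

```python
import random

def grey_wolf_optimize(fitness, dim, n_wolves=8, n_iter=30, seed=0):
    """Minimize `fitness` over [0, 1]^dim with a basic Grey Wolf Optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []   # best three (score, position) pairs seen so far
    history = []   # alpha (best) score recorded at each iteration

    for it in range(n_iter):
        # Rank the current wolves together with the previous leaders.
        scored = leaders + [(fitness(w), list(w)) for w in wolves]
        scored.sort(key=lambda pair: pair[0])
        leaders = scored[:3]  # alpha, beta, delta
        history.append(leaders[0][0])

        a = 2.0 * (1 - it / n_iter)  # decreases linearly from 2 toward 0
        for w in wolves:
            for j in range(dim):
                pulled = 0.0
                for _, lead in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * lead[j] - w[j])
                    pulled += lead[j] - A * D
                # Average the leader-guided moves and clip to the bounds.
                w[j] = min(1.0, max(0.0, pulled / len(leaders)))
    return leaders[0][1], history

# Hypothetical setup: features 0 and 2 are "informative"; any other
# selected feature adds a small penalty, mimicking a validation error.
INFORMATIVE = {0, 2}

def to_mask(position):
    """Threshold a continuous position into a keep/drop feature mask."""
    return [x > 0.5 for x in position]

def toy_error(position):
    mask = to_mask(position)
    missed = sum(1 for j in INFORMATIVE if not mask[j])
    extra = sum(1 for j, on in enumerate(mask) if on and j not in INFORMATIVE)
    return missed + 0.1 * extra

best_position, history = grey_wolf_optimize(toy_error, dim=5)
best_mask = to_mask(best_position)
```

In a real pipeline, `toy_error` would be replaced by a function that trains a decision tree on the masked feature columns and returns its held-out error, which is far more expensive per evaluation than this toy objective.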

Results

Results are reported per city for the six selected cities, using R-squared, RMSE, MSE, MAE, and accuracy. The maximum accuracy of the GWO-DT model ranged from 88.98% for New Delhi to 97.68% for Visakhapatnam, with the other four cities falling between 91.49% and 97.66%.

The proposed model outperformed the conventional methods it was compared against, namely Support Vector Regression, K-Nearest Neighbors, and Random Forest Regression. The authors take these results as support for the proposed model.

Discussion

The discussion reviews recent advances in air quality index (AQI) prediction models, with emphasis on machine learning and deep learning techniques. Prior studies have reported good results with a range of models, including non-linear autoregressive neural networks, Gaussian plume models, and hybrid approaches that combine several algorithms. Models based on support vector regression and long short-term memory (LSTM) networks have shown improved accuracy in classifying AQI values. Data processing techniques such as variational mode decomposition and feature extraction have further improved prediction performance across studies.

The literature stresses the difficulty of AQI prediction in urban environments, where pollution comes from many interacting sources. Several models have used real-time data and IoT sensors to monitor air quality, achieving better accuracy and lower computational complexity than traditional methods. The discussion concludes that future research should focus on computational efficiency and on integrating real-time data to make environmental monitoring systems more reliable. Taken together, the reviewed work shows how AQI prediction methods are evolving and why they matter for public health and environmental management.