An examination of daily CO2 emissions prediction through a comparative analysis of machine learning, deep learning, and statistical models

Journal: Environmental Science and Pollution Research, Volume: 32, Issue: 5
DOI: https://doi.org/10.1007/s11356-024-35764-8
PMID: https://pubmed.ncbi.nlm.nih.gov/39800837
Publication Date: 2025-01-12
Author(s): Adewole Adetoro Ajala et al.
Primary Topic: Air Quality Monitoring and Forecasting

Overview

This study investigates the prediction of daily CO₂ emissions from January 1, 2022, to September 30, 2023, across the four highest-emitting regions: China, India, the USA, and the EU27&UK. It evaluates 14 models: four statistical models (ARMA, ARIMA, SARMA, SARIMA), three machine learning models (support vector machine, random forest, gradient boosting), and seven deep learning models (various recurrent neural networks and hybrid CNN-RNN combinations). The models were assessed using metrics such as $R^2$, mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). Machine learning and deep learning models clearly outperformed the statistical models: their $R^2$ values ranged from 0.714 to 0.932 and their RMSE values from 0.480 to 0.247 (worst to best), compared with an $R^2$ range of -0.060 to 0.719 and an RMSE range of 1.695 to 0.537 for the statistical models.
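To make the four metrics concrete, here is a minimal sketch of how they are commonly computed. The function name `regression_metrics`, the percent scaling of MAPE, and the input checks are illustrative assumptions, not code from the study. Note that $R^2$ becomes negative whenever a model's squared error exceeds that of simply predicting the mean of the observations, which is how a value such as -0.060 can arise.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, RMSE and MAPE (in percent) for two equal-length sequences.

    Hypothetical helper for illustration; the study does not publish this code.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error and root mean square error share the series' units.
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # MAPE is undefined if any observed value is zero (division by zero).
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # R^2 = 1 - SS_res / SS_tot; it is 0 for a mean-only forecast
    # and negative for anything worse than that.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "rmse": rmse, "mape": mape}
```

A perfect forecast gives an $R^2$ of 1 with all error metrics at 0, while a forecast that always returns the observed mean gives an $R^2$ of exactly 0.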

The study highlights that the performance of machine learning models improved with differencing and ensemble techniques, such as voting and bagging, resulting in an average $R^2$ increase of 9.6%. Additionally, hybrid CNN-RNN models enhanced the performance of recurrent neural networks. Despite the comparable performance of machine learning and deep learning methods, the high computational demands of deep learning models led to the recommendation of machine learning models utilizing ensemble techniques for daily CO₂ emission predictions. These findings underscore the importance of accurate daily emissions forecasting for effective governmental climate strategies, providing insights that can aid policymakers in dynamically adjusting emission reduction initiatives based on real-time data.
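As a rough illustration of two of the techniques named above, the sketch below shows first-order differencing (with its inverse) and a simple averaging form of voting. The function names are hypothetical, the order of differencing used in the study is not stated here, and bagging (training base models on bootstrap resamples and averaging them) is not shown.

```python
def difference(series):
    """First-order differencing: d[t] = x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert first-order differencing by cumulative summation."""
    restored = [first_value]
    for d in diffs:
        restored.append(restored[-1] + d)
    return restored


def voting_average(predictions):
    """Averaging ('voting') ensemble for regression.

    `predictions` is a list of forecasts, one list per base model,
    all covering the same time steps. The ensemble forecast is the
    per-step mean of the base models' forecasts.
    """
    return [sum(step) / len(step) for step in zip(*predictions)]
```

A model trained on the differenced series predicts day-to-day changes, so its output has to be passed through `undifference` before it is compared with observed emissions.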

Introduction

The introduction addresses the critical issue of human-induced global warming, primarily driven by greenhouse gas emissions, particularly carbon dioxide (CO₂), which accounts for approximately 81% of total greenhouse gas emissions. The paper highlights the severe consequences of rising global temperatures, including extreme weather, sea-level rise, and threats to food and water security. It emphasizes the urgency of developing effective emission reduction policies, as outlined in international agreements like the Paris Agreement and the Glasgow Climate Pact, which aim to limit global warming and achieve net-zero carbon emissions by 2050.

The study focuses on the prediction of daily CO₂ emissions, a crucial aspect often overlooked in existing research that typically centers on annual emissions. By analyzing data from four major emitting regions—China, India, the USA, and the EU27&UK—this research evaluates the performance of various statistical, machine learning, and deep learning models in forecasting daily emissions. The dataset comprises 638 daily observations from January 1, 2022, to September 30, 2023, reflecting post-COVID-19 economic activities. The paper aims to bridge the gap in daily emissions prediction, providing insights that can inform timely policy adjustments. Additionally, it incorporates data transformation techniques to enhance prediction accuracy and offers a comprehensive evaluation of model performance, thereby contributing valuable knowledge for future research and effective environmental policymaking.
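The observation count can be checked directly from the stated date range, counting both endpoints. The snippet below does that and adds a chronological train/test split; the 20% test fraction and the name `chronological_split` are purely illustrative assumptions, since the split used in the study is not given here.

```python
from datetime import date, timedelta

start = date(2022, 1, 1)
end = date(2023, 9, 30)

# Inclusive day count: 365 days in 2022 plus 273 days in 2023.
n_days = (end - start).days + 1
days = [start + timedelta(days=i) for i in range(n_days)]


def chronological_split(series, test_fraction=0.2):
    """Split a time-ordered series without shuffling.

    The default fraction is a hypothetical choice for illustration only.
    """
    cut = int(len(series) * (1 - test_fraction))
    return series[:cut], series[cut:]


train_days, test_days = chronological_split(days)
```

Keeping the split chronological means every test day lies after every training day, so the evaluation never uses future observations to predict the past.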

Methods

The study takes a univariate time-series approach: each region's daily CO₂ emissions series, comprising 638 observations from January 1, 2022, to September 30, 2023, is forecast from its own history. Fourteen models are compared: four statistical models (ARMA, ARIMA, SARMA, SARIMA), three machine learning models (support vector machine, random forest, gradient boosting), and seven deep learning models (recurrent neural networks and hybrid CNN-RNN combinations).

Data transformation, namely differencing, is applied to improve prediction accuracy, and ensemble techniques such as voting and bagging are evaluated for the machine learning models. Model performance is measured with $R^2$, MAE, RMSE, and MAPE.

Results

Machine learning and deep learning models clearly outperformed the statistical models. Their $R^2$ values ranged from 0.714 to 0.932 with RMSE between 0.247 and 0.480, whereas the statistical models ranged from -0.060 to 0.719 in $R^2$ with RMSE between 0.537 and 1.695.

Differencing and ensemble techniques (voting and bagging) raised the $R^2$ of the machine learning models by 9.6% on average, and hybrid CNN-RNN models improved on the recurrent neural networks. Because machine learning and deep learning performed comparably while deep learning carried much higher computational demands, the authors recommend ensemble-based machine learning models for daily CO₂ emissions prediction.

Discussion

The discussion section of the research paper reviews various methodologies for predicting CO₂ emissions, categorizing them into multivariate and univariate approaches. Multivariate models, which incorporate multiple influencing factors, face challenges such as data completeness and multicollinearity, complicating the assessment of individual factor contributions. Techniques like linear interpolation have been employed to address missing data, but they introduce uncertainty that may affect model accuracy. Conversely, univariate models, which rely on historical data, have shown robustness in predictions, particularly when high-quality datasets are scarce. Recent studies indicate that univariate models can effectively capture emissions trends with simpler input structures, making them preferable for short-term forecasting.
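Two ideas from this paragraph lend themselves to a short sketch: filling gaps by linear interpolation, and turning a single historical series into supervised samples for a univariate model. Both helpers below are hypothetical illustrations; the number of lags and the exact gap-filling procedure used in the cited work are not specified here.

```python
def interpolate_linear(values):
    """Fill interior None gaps on a straight line between known neighbours.

    Leading or trailing gaps are left as None because they have
    only one known neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def make_lagged_samples(series, n_lags):
    """Build (inputs, target) pairs from one series.

    Each input is the previous `n_lags` values and the target is
    the value that follows them.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    inputs = [series[i:i + n_lags] for i in range(len(series) - n_lags)]
    targets = [series[i + n_lags] for i in range(len(series) - n_lags)]
    return inputs, targets
```

Interpolated values are estimates rather than measurements, which is the source of the uncertainty mentioned above; the lagged-sample construction needs nothing beyond the emissions history itself.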

The paper highlights the limitations of traditional statistical models, such as ARIMA and SARIMA, which struggle with nonlinear data and large sample sizes. In response, there has been a shift towards machine learning (ML) and deep learning (DL) models, which are better suited for handling complex, nonlinear relationships in CO₂ emissions data. The study introduces innovations by comparing 14 predictive models across diverse geographical contexts (China, India, the USA, and the EU27&UK) and applying data transformation techniques to enhance prediction accuracy. By analyzing daily CO₂ emissions data from January 2022 to September 2023, the research aims to provide empirical insights into short-term forecasting, addressing a critical gap in existing literature.

Limitations

The limitations of the current study on daily CO₂ emissions prediction primarily stem from its reliance on univariate data, omitting significant exogenous factors such as daily renewable energy consumption, population metrics, GDP, and fuel prices. This exclusion may compromise the comprehensiveness of the predictions, particularly since many relevant variables are only available in monthly or yearly formats, complicating their integration into daily models. Future research aims to address this limitation by adopting a multivariate approach, as suggested by Li and Zhang (2023), which would incorporate these critical factors and enhance the models’ accuracy and robustness.

Additionally, the study proposes the use of hybrid methodologies that combine statistical models with machine learning (ML) and deep learning (DL) techniques, allowing for short-term forecasts through statistical methods while leveraging ML/DL for long-term predictions. This dual approach is expected to provide greater flexibility in managing diverse data characteristics. Future investigations will also delve into sectoral contributions to daily emissions across regions such as China, India, the USA, and the EU27&UK, focusing on sectors like aviation, transport, power, industry, and residential emissions. Expanding the scope to a global or continental level will facilitate the development of more effective emission reduction strategies and inform short-term CO₂ mitigation policies, ultimately enhancing the relevance of the findings for global policy-making.