An optimized machine learning framework for predicting and interpreting corporate ESG greenwashing behavior

Journal: PLoS ONE, Volume: 20, Issue: 3
DOI: https://doi.org/10.1371/journal.pone.0316287
PMID: https://pubmed.ncbi.nlm.nih.gov/40048435
Publication Date: 2025-03-06
Author(s): Fanlong Zeng et al.
Primary Topic: Environmental Sustainability in Business

Overview

This paper addresses the need to accurately predict and interpret corporate Environmental, Social, and Governance (ESG) greenwashing behavior in order to improve transparency and regulatory effectiveness. The authors introduce an optimized machine learning framework that combines an Improved Hunter-Prey Optimization (IHPO) algorithm, an eXtreme Gradient Boosting (XGBoost) model, and SHapley Additive exPlanations (SHAP). The framework targets two limitations of existing models: hyperparameter optimization and interpretability. The authors built a dataset from a literature review and expert interviews and used the IHPO algorithm to tune the XGBoost model's hyperparameters, producing an IHPO-XGBoost ensemble learning model.

The IHPO-XGBoost model achieves R², RMSE, MAE, and adjusted R² values of 0.9790, 0.1376, 0.1000, and 0.9785, respectively. It outperforms XGBoost tuned with traditional hyperparameter optimization methods as well as XGBoost models combined with other optimization algorithms. The SHAP interpretability analysis identifies the key features that drive the predictions and shows the contributions of feature interactions and of individual samples' features. These results can help regulators and investors identify and assess corporate ESG greenwashing behavior, supporting regulatory efficiency and investment decision-making.

Introduction

In recent years, Environmental, Social, and Governance (ESG) issues have gained significant attention from investors, regulators, and consumers, particularly regarding sustainable development performance. The People’s Bank of China has emphasized financial support for green initiatives, reflecting this trend. Concurrently, the integration of Explainable Artificial Intelligence (XAI) techniques, such as SHAP (SHapley Additive exPlanations), has become crucial for enhancing the interpretability of complex machine learning models. Studies have demonstrated the effectiveness of the XGBoost-SHAP framework across various domains, including economic resilience and educational assessments, highlighting its broad applicability and the importance of hyperparameter optimization for improving predictive performance.

This research addresses the gap in predicting corporate ESG greenwashing behaviors by proposing an Improved Hunter-Prey Optimization (IHPO) algorithm for tuning the hyperparameters of the XGBoost model. The study collects ESG-related data from Chinese A-share listed companies and constructs a corporate greenwashing prediction dataset. By applying the IHPO algorithm, the research aims to improve the predictive accuracy of the XGBoost model and to interpret the results using SHAP. The contributions of this study include a novel prediction model for corporate ESG greenwashing, improved interpretability of model outputs, and a machine learning framework that may serve as a reference for future research in related fields.

Results

In this study, the corporate greenwashing dataset was split into an 80% training set and a 20% test set, and an XGBoost model was trained with five-fold cross-validation to improve generalization to unseen data. The L1 (alpha) and L2 (lambda) regularization parameters were applied to mitigate overfitting, while hyperparameters such as $n\_estimators$, $learning\_rate$, and $max\_depth$ were optimized with the IHPO algorithm, using Mean Squared Error (MSE) as the fitness function. The optimized values were $n\_estimators = 450$, $learning\_rate = 0.1$, and $max\_depth = 6$.
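The tuning loop described above (hold out 20% of the data, then score each hyperparameter candidate by its mean squared error under five-fold cross-validation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy one-parameter ridge model (`fit_ridge_1d`) stands in for XGBoost, an exhaustive grid search stands in for IHPO (whose update rules are not given in this summary), and all function names and the synthetic data are hypothetical.

```python
import itertools
import random
from statistics import mean


def split_80_20(n, seed=0):
    """Shuffle row indices and return (train, test) with an 80/20 split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = (n * 4) // 5
    return idx[:cut], idx[cut:]


def cv_mse(fit, params, X, y, k=5):
    """Fitness: mean validation MSE over k interleaved folds (lower is better)."""
    n = len(X)
    scores = []
    for fold in range(k):
        val = [i for i in range(n) if i % k == fold]
        trn = [i for i in range(n) if i % k != fold]
        predict = fit([X[i] for i in trn], [y[i] for i in trn], **params)
        scores.append(mean((predict(X[i]) - y[i]) ** 2 for i in val))
    return mean(scores)


def grid_search(fit, space, X, y, k=5):
    """Stand-in for IHPO: score every combination, return (best_mse, best_params)."""
    best = None
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_mse(fit, params, X, y, k)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def fit_ridge_1d(X, y, lam):
    """Toy stand-in for XGBoost: slope through the origin with L2 penalty `lam`."""
    slope = sum(a * b for a, b in zip(X, y)) / (sum(a * a for a in X) + lam)
    return lambda x: slope * x


# Synthetic, noise-free data (hypothetical): y = 2x.
X = [float(i) for i in range(1, 41)]
y = [2.0 * x for x in X]

train, test = split_80_20(len(X))
X_train, y_train = [X[i] for i in train], [y[i] for i in train]

space = {"lam": [0.0, 0.1, 1.0, 10.0]}
best_mse, best_params = grid_search(fit_ridge_1d, space, X_train, y_train)
print(best_params, best_mse)
```

Only the training portion is passed to the search, so the held-out 20% stays untouched until the final evaluation. Swapping in a metaheuristic such as IHPO would replace `grid_search` while keeping `cv_mse` as the fitness function.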

Validation of the model’s predictive performance on the test set revealed that the predicted values closely aligned with actual values, as illustrated in the visualizations. The absolute errors predominantly fell within the low range of 0 to 0.2, indicating robust prediction capabilities. The scatter plot of actual versus predicted values demonstrated a strong correlation, with most points lying within the 95% confidence interval of the ideal fit line $y = x$. Performance metrics for the IHPO-XGBoost model were reported as $R^2 = 0.9790$, RMSE = 0.1376, MAE = 0.1000, and Adjusted $R^2 = 0.9785$, underscoring the model’s excellent predictive performance.
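For readers who want to reproduce these four metrics on their own predictions, the standard definitions can be written out directly. The sketch below assumes the usual formula for adjusted R², $1 - (1 - R^2)(n - 1)/(n - p - 1)$ with $n$ samples and $p$ features; whether the paper uses exactly this variant is an assumption, and the function name and sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return R^2, RMSE, MAE, and adjusted R^2 for paired actual/predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"r2": r2, "rmse": rmse, "mae": mae, "adj_r2": adj_r2}


# Hypothetical values: five observations, one feature, one prediction off by 1.
metrics = regression_metrics(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, 3.0, 4.0, 4.0],
    n_features=1,
)
print(metrics)
```

Adjusted R² penalizes R² for the number of features, which is why it is always at most R² and why the gap between the two shrinks as the sample size grows relative to the feature count.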

Discussion

This section describes the methodology for predicting corporate ESG (Environmental, Social, and Governance) greenwashing behaviors, which rests on a Corporate Greenwashing Index (GWI). The GWI is calculated with a formula that compares a company's ESG disclosure score (from Bloomberg) with its actual ESG performance score (from Wind); a positive GWI indicates a higher degree of greenwashing. The study identifies 19 feature variables for predicting greenwashing behaviors, grouped into company characteristics, financial performance, operational efficiency, external influences, and social responsibility.
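The summary does not reproduce the GWI formula itself. One plausible form, consistent with the description above (disclosure compared against performance, with positive values meaning more greenwashing), is a difference of standardized scores. The expression below is an assumption for illustration only: the symbols $D_{i,t}$ (Bloomberg disclosure score of firm $i$ in year $t$), $P_{i,t}$ (Wind performance score), and the cross-sectional means and standard deviations are introduced here and are not taken from the source.

```latex
\mathrm{GWI}_{i,t}
  = \frac{D_{i,t} - \bar{D}_{t}}{\sigma_{D,t}}
  - \frac{P_{i,t} - \bar{P}_{t}}{\sigma_{P,t}}
```

Under this assumed form, standardizing puts the two vendors' scores on a common scale before subtracting, so $\mathrm{GWI}_{i,t} > 0$ means the firm ranks higher on what it discloses than on how it performs.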

The dataset comprises observations from Chinese A-share listed companies from 2017 to 2022, excluding certain industries and financial anomalies. The GWI is offset by one year from the feature variables, so that features from one year are used to predict the following year's GWI. The research employs an XGBoost model, optimized through the Improved Hunter-Prey Optimization (IHPO) algorithm, to enhance predictive accuracy. The model's performance is evaluated with the coefficient of determination (R²), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), while interpretability comes from SHAP values, which quantify the contribution of each feature to the model's predictions. This structured approach aims to provide insights into corporate behaviors regarding ESG commitments and the potential for greenwashing.
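To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values by brute force for a tiny hand-written model. It is an illustration of the underlying definition, not the SHAP library or the authors' pipeline: absent features are simply replaced by a fixed baseline (an assumption; SHAP implementations typically average over background data and use faster tree-specific algorithms), and `toy_model`, `x`, and `baseline` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`; absent features take `baseline` values."""
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` are "present".
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: one additive term plus one two-feature interaction."""
    return 3.0 * z[0] + 2.0 * z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that lets SHAP decompose an individual prediction. The interaction term is split equally between the two features that jointly produce it.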