A machine learning-based credit risk prediction engine system using a stacked classifier and a filter-based feature selection method

Journal: Journal Of Big Data, Volume: 11, Issue: 1
DOI: https://doi.org/10.1186/s40537-024-00882-0
Publication Date: 2024-02-01
Author(s): Ileberi Emmanuel et al.
Primary Topic: Financial Distress and Bankruptcy Prediction

Overview

The research paper focuses on enhancing credit risk prediction through a novel stacked classifier approach that integrates a filter-based feature selection (FS) technique grounded in information gain (IG) theory. The proposed model employs multiple base estimators, specifically Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGB), whose outputs are combined through stacking into a final prediction. The model was evaluated using accuracy, F1-Score, and Area Under the Curve (AUC) across three datasets: Australian, German, and Taiwan. The experimental results indicated that the stacked model achieved AUCs of 0.934, 0.944, and 0.870 on these datasets, respectively, outperforming individual estimators and other conventional methods such as Artificial Neural Networks (ANN), Decision Trees (DT), and k-Nearest Neighbors (KNN).

In conclusion, the study successfully developed a machine learning-based credit risk prediction model that demonstrated superior performance metrics, including an accuracy of 86.23%, F1-Score of 84.58%, and AUC of 0.934 for the Australian dataset. The German dataset yielded an accuracy of 82.80% and an AUC of 0.944, while the Taiwan dataset recorded an accuracy of 85.80% and an AUC of 0.870. These findings underscore the efficacy of the stacked classifier in credit risk prediction. Future research will explore advanced feature selection and augmentation techniques, as well as the potential application of transformer-based architectures to further enhance model performance in this domain.

Introduction

The introduction of the research paper highlights the significance of machine learning in credit risk prediction, a critical task for financial institutions that aims to assess the likelihood of customers defaulting on loans and credit services. The challenge of class imbalance in datasets, where non-default transactions significantly outnumber default transactions, is emphasized as a factor that can adversely affect the performance of machine learning models. Various techniques have been proposed to address this issue, including ensemble learning, cost-sensitive learning, and re-sampling methods, with ensemble learning being particularly noted for its effectiveness.

The paper proposes a filter-based feature selection (FS) method utilizing Information Gain (IG) to enhance model performance by selecting the most relevant attributes before the modeling process. Additionally, a multilevel ensemble model is developed using stacking, which integrates the outputs of multiple classifiers—specifically Gradient Boosting, Random Forest, and Extreme Gradient Boosting—into a final prediction model. The research aims to demonstrate the efficacy of the IG filter-based FS method and the stacked model by comparing their performance against individual estimators across multiple credit-risk datasets. The structure of the paper is outlined, indicating sections dedicated to related work, machine learning methods, datasets, research methodology, feature selection, the proposed framework, and performance metrics.
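The idea behind IG-based filtering can be illustrated with a small, standard-library sketch. It assumes discrete feature values and uses the textbook definition of information gain (entropy of the labels minus the conditional entropy given the feature). The toy columns `has_prior_default` and `owns_phone` are hypothetical and are not taken from the paper's datasets, and the paper's own implementation may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - sum over values v of p(v) * H(labels | feature == v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature name, IG) pairs sorted from most to least informative."""
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = default, 0 = non-default.
labels = [1, 1, 0, 0, 1, 0, 0, 0]
columns = {
    "has_prior_default": [1, 1, 0, 0, 1, 0, 0, 0],  # mirrors the label exactly
    "owns_phone": [1, 0, 1, 0, 1, 0, 1, 0],         # only weakly related to the label
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: IG = {score:.4f}")
```

A filter method of this kind would keep only the top-ranked attributes before any classifier is trained; the cut-off used is a tuning choice.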

Methods

The methods section of the research paper outlines the machine learning techniques employed, including Random Forest (RF), k-Nearest Neighbors (KNN), and Artificial Neural Networks (ANNs). The RF algorithm utilizes an ensemble of decision trees (DTs) to make predictions through a majority voting mechanism, where the class with the most votes is selected. KNN, on the other hand, classifies data points based on the Euclidean distance to their nearest neighbors, assuming that similar points will yield similar predictions. ANNs, specifically feed-forward networks, are highlighted for their simplicity and efficiency in training; they can address both linear and non-linear problems through various activation functions.
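The two mechanisms described above, majority voting and Euclidean-distance neighbor lookup, can be sketched together in a few lines of plain Python. This is an illustrative toy, not the Scikit-Learn implementation used in the paper; the function names, the value `k=3`, and the sample points are assumptions made for the example.

```python
import math
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    return majority_vote([label for _, label in nearest])


# Hypothetical two-feature applicants, labelled by credit outcome.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["good", "good", "good", "bad", "bad", "bad"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.0, 5.1)))

# The same vote function models how a forest combines its trees' predictions.
print(majority_vote(["bad", "good", "bad"]))
```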

The research methodology includes an experimental setup executed on Google Colab, utilizing Scikit-Learn as the machine learning framework. Performance metrics for evaluating classifier effectiveness include accuracy, F1-score, and Area Under the ROC Curve (AUC). These metrics are derived from true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts, with F1-score providing a more balanced measure in cases of imbalanced datasets. AUC reflects the model’s ability to distinguish between positive and negative samples, indicating its overall classification performance.
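As a rough illustration of how these metrics follow from the confusion counts, the sketch below computes accuracy and F1 from TP, TN, FP, and FN, and estimates AUC as the fraction of positive/negative pairs that the scores rank correctly. The labels, scores, and the 0.5 decision threshold are made up for the example; the paper itself relies on Scikit-Learn's metric functions.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical imbalanced sample: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, tn, fp, fn))
print("F1:", f1_score(tp, fp, fn))
print("AUC:", auc(y_true, scores))
```

In this toy sample, accuracy benefits from the many correctly rejected negatives, while F1 ignores true negatives entirely, which is why it is the more balanced measure on imbalanced data.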

Results

The results of the experiments conducted in a simulated environment demonstrate the effectiveness of the Stacked model across multiple datasets. For the Australian dataset, the Stacked model achieved an accuracy of 86.23%, an F1-Score of 84.58%, and an AUC of 0.934. This is well above KNN, which recorded an accuracy of 70.28%, although RF reached a slightly higher accuracy of 87.68%. Similarly, on the German dataset, the Stacked model led with an accuracy of 82.80%, an F1-Score of 86.35%, and an AUC of 0.944, far ahead of the KNN method, which had an accuracy of 68.40%.

On the Taiwan dataset, the Stacked model reached an accuracy of 85.80%, close behind the RF model at 87%. Notably, on the German dataset the Stacked model outperformed methods reported in prior research, namely ANN, KNN, and NB, by margins of 5.35, 5.60, and 10.60 percentage points, respectively. Overall, the results indicate that the Stacked approach delivers strong AUC and competitive accuracy across all datasets tested, reinforcing its potential as a robust predictive modeling technique.

Discussion

In the discussion section, the authors review various studies on credit risk assessment using machine learning (ML) techniques, highlighting the methodologies and performance metrics employed. Pande et al. utilized classifiers such as Artificial Neural Networks (ANN), k-Nearest Neighbors (KNN), and Naive Bayes (NB) on the German credit risk dataset, achieving accuracies of 77.45%, 77.20%, and 72.20%, respectively, but did not assess additional metrics like F1-Score or Area Under the Curve (AUC). Similarly, Zhang et al. reported an accuracy of 80% for their adaptive support vector machine (AdaSVM) on the Australian dataset without further evaluation metrics. Other studies, including those by Nasser and Maryam, Hsu et al., and Ha et al., also focused primarily on accuracy, often neglecting comprehensive performance assessments, which limits the robustness of their findings.
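The accuracies quoted for Pande et al. can be set against the Stacked model's reported 82.80% accuracy on the German dataset to obtain the margins in percentage points. The short calculation below uses only figures stated in this summary; the variable names are illustrative.

```python
# Accuracy (%) on the German dataset, as quoted in this summary.
stacked_accuracy = 82.80
prior_work = {"ANN": 77.45, "KNN": 77.20, "NB": 72.20}  # Pande et al.

# Margin of the Stacked model over each prior method, in percentage points.
margins = {name: round(stacked_accuracy - acc, 2) for name, acc in prior_work.items()}

for name, margin in margins.items():
    print(f"Stacked vs {name}: +{margin:.2f} percentage points")
```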

The authors emphasize the importance of using a diverse set of performance metrics, including F1-Score and AUC, to provide a more nuanced evaluation of model effectiveness. They propose a novel credit risk prediction framework that incorporates a feature selection method based on Information Gain (IG) and a stacking algorithm involving multiple classifiers. The proposed model demonstrated superior performance across three datasets, achieving an accuracy of 86.23%, F1-Score of 84.58%, and AUC of 0.934 on the Australian dataset, outperforming individual classifiers and existing algorithms. The authors suggest future research directions focused on enhancing feature selection and exploring transformer-based architectures to further improve credit risk prediction capabilities.
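A framework of this shape might be assembled in Scikit-Learn roughly as follows. This is a sketch under several assumptions that the summary does not confirm: synthetic data from `make_classification` stands in for the credit datasets, `mutual_info_classif` (a mutual-information estimate closely related to information gain) stands in for the IG filter, `k=10` is arbitrary, logistic regression is assumed as the final estimator, and XGB is left out so that the example depends only on scikit-learn.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic, mildly imbalanced binary data (assumption: not the paper's datasets).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=6, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base estimators; an xgboost.XGBClassifier could be appended if xgboost is installed.
base_estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

model = Pipeline(
    [
        # Filter step: keep the 10 features with the highest mutual information.
        ("select", SelectKBest(score_func=partial(mutual_info_classif, random_state=0), k=10)),
        # Stacking step: the final estimator learns from cross-validated base predictions.
        (
            "stack",
            StackingClassifier(
                estimators=base_estimators,
                final_estimator=LogisticRegression(max_iter=1000),
                cv=5,
            ),
        ),
    ]
)

model.fit(X_train, y_train)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

results = {
    "accuracy": accuracy_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "auc": roc_auc_score(y_test, proba),
}
print(results)
```

Placing the filter inside the pipeline means feature selection is fitted on the training split only, which avoids leaking test information into the selection step.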