Credit Risk Prediction Using Machine Learning and Deep Learning: A Study on Credit Card Customers

Journal: Risks, Volume: 12, Issue: 11
DOI: https://doi.org/10.3390/risks12110174
Publication Date: 2024-11-04
Author(s): Victor Chang et al.
Primary Topic: Financial Distress and Bankruptcy Prediction

Overview

This research paper addresses the growing challenge of credit risk management faced by global credit card companies due to increasing consumer spending and population growth. The study focuses on the classification of credit card customers into “good” or “bad” categories to mitigate potential capital losses. Various machine-learning models, including neural networks, logistic regression, AdaBoost, XGBoost, and LightGBM, are employed to predict customer default status. The performance of these models is evaluated using metrics such as accuracy, precision, recall, F1 score, ROC, and Matthews correlation coefficient (MCC).

The findings show that XGBoost achieves the highest accuracy, at 99.4%, outperforming the other models tested. The research underscores the importance of effective credit risk analysis for informed lending decisions and highlights the gains in predictive accuracy that machine-learning and deep-learning algorithms bring to this field.

Introduction

The introduction of this research paper highlights the critical role of credit cards in modern financial transactions and the necessity for financial institutions to assess credit risk effectively. Credit cards have become essential for consumers, and banks must evaluate applicants’ creditworthiness to mitigate the risk of defaults. The paper defines “good” and “bad” customers based on their payment history, emphasizing the importance of credit scores as a risk management tool. It identifies five customer categories, with “revolvers” being particularly profitable for credit card companies, while “non-payers” pose significant risks.

The research aims to enhance credit risk assessment through advanced machine-learning and deep-learning algorithms, enabling more accurate customer classification and minimizing losses. The paper underscores the dynamic nature of the financial landscape, influenced by factors such as economic downturns and pandemics, which can increase non-performing assets (NPAs). By streamlining the credit risk evaluation process, the study seeks to help financial institutions maintain manageable NPA rates and foster sustainable growth. The methodologies explored include various machine-learning techniques, such as random forest and neural networks, to identify the riskiest customer categories and improve decision-making in credit card issuance.

Methods

The research utilizes a secondary dataset sourced from Kaggle, comprising two primary files: `application_record.csv` and `credit_record.csv`. The `application_record` dataset contains various applicant characteristics, including demographic information, financial status, and contact details, while the `credit_record` dataset tracks credit card usage and payment history, with a focus on the status of payments over time. The datasets are linked by a unique client ID, facilitating a comprehensive analysis of the relationship between applicant attributes and credit behavior.
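The link between the two files can be sketched as an inner join on the client identifier. The snippet below is a minimal illustration on made-up rows, not the authors' preprocessing code; the column names (`ID`, `INCOME`, `MONTHS_BALANCE`, `STATUS`) and the status values are assumptions chosen for readability.

```python
# Hypothetical miniature stand-ins for application_record.csv and
# credit_record.csv; all column names and values are illustrative assumptions.
application_rows = [
    {"ID": 1, "INCOME": 52000},
    {"ID": 2, "INCOME": 31000},
    {"ID": 3, "INCOME": 78000},
]
credit_rows = [
    {"ID": 1, "MONTHS_BALANCE": 0, "STATUS": "paid"},
    {"ID": 1, "MONTHS_BALANCE": -1, "STATUS": "overdue"},
    {"ID": 2, "MONTHS_BALANCE": 0, "STATUS": "paid"},
    {"ID": 4, "MONTHS_BALANCE": 0, "STATUS": "paid"},
]


def join_on_client_id(applications, credit_history):
    """Attach each applicant's monthly credit rows via the shared ID (inner join)."""
    history_by_id = {}
    for row in credit_history:
        history_by_id.setdefault(row["ID"], []).append(row)

    joined = []
    for applicant in applications:
        history = history_by_id.get(applicant["ID"])
        if history is None:
            continue  # no credit record for this applicant: dropped by an inner join
        joined.append({**applicant, "history": history})
    return joined


joined = join_on_client_id(application_rows, credit_rows)
```

Only clients present in both tables survive the join (IDs 1 and 2 here). With pandas, the same step would typically be written with `DataFrame.merge(..., on="ID", how="inner")`, again assuming the key column is named `ID`.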

The methodology follows a five-step analytical process, illustrated in Figure 1. First, the dataset was tested and cleaned to improve data quality. Next, exploratory data analysis was performed to understand the dataset’s structure and characteristics. Six classification models were then trained, and their predictions were compared against the actual outcomes using a confusion matrix. Each model was assessed with accuracy, recall, precision, F1 score, ROC-AUC, and Matthews correlation coefficient (MCC), so that the comparison does not rest on any single metric.
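Most of the metrics named above follow directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not figures from the paper. ROC-AUC is left out because it needs predicted scores rather than counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, F1 and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # MCC uses all four cells, which keeps it informative on imbalanced data.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

On heavily imbalanced credit data, accuracy alone can look strong even when few defaulters are caught, which is one reason to read it alongside recall and MCC.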

Results

In this section, the authors introduce the confusion matrix as the foundational framework for evaluating the performance of the implemented machine-learning algorithms. The confusion matrix provides a comprehensive overview of the classification results, allowing for the assessment of true positives, false positives, true negatives, and false negatives.
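As a small illustration of how the four cells are tallied, the sketch below counts them from paired label lists. It assumes, purely for this example, that label `1` marks a “bad” (defaulting) customer and is treated as the positive class; the label vectors are made up.

```python
def confusion_counts(actual, predicted, positive=1):
    """Tally true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1  # predicted bad, actually bad
        elif guess == positive:
            fp += 1  # predicted bad, actually good
        elif truth == positive:
            fn += 1  # predicted good, actually bad
        else:
            tn += 1  # predicted good, actually good
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Hypothetical labels: 1 = "bad" customer, 0 = "good" customer.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
```

In a lending setting the two error types carry different costs: a false negative is a defaulter who was approved, while a false positive is a creditworthy applicant who was turned away.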

Following this introduction, the authors present the results obtained from their selected machine-learning models, highlighting the performance metrics derived from the confusion matrix. Additionally, a detailed analysis is conducted on the best-performing model, offering insights into its effectiveness and potential areas for improvement. This structured approach facilitates a clear understanding of the models’ capabilities and the implications of their results.

Discussion

The research aims to develop an automated method for predicting consumer default status based on credit card application data. Key objectives include identifying critical features for predicting defaults, implementing data balancing techniques, exploring various machine-learning and deep-learning methods, and assessing model performance metrics. The study emphasizes the importance of robust models that maintain accuracy across different economic conditions. Notably, XGBoost emerged as the top-performing algorithm among six tested, effectively classifying customers into “good” and “bad” categories based on features such as age, income, and employment duration.
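The summary does not say which balancing technique the authors applied, so the sketch below uses random oversampling of the minority class as a stand-in to show the general idea. The function name, feature rows, and labels are all hypothetical.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the largest class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical [age, income] rows; label 1 marks the rare "bad" class.
rows = [[25, 30000], [40, 52000], [33, 41000], [51, 67000], [29, 18000]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Balancing of this kind is normally applied to the training split only, so that evaluation still reflects the real class proportions.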

The discussion highlights the significance of credit risk analysis in various industries, particularly in banking, where distinguishing between creditworthy and non-creditworthy applicants is crucial. The literature review underscores the growing reliance on machine-learning techniques, including ensemble methods and deep learning, to improve credit scoring accuracy. The paper also addresses the challenges of applying reinforcement learning and optimization techniques in financial contexts, and advocates machine-learning models that balance predictive performance with regulatory compliance. Overall, the findings contribute to the ongoing evolution of credit risk assessment methodologies, emphasizing the need for adaptive and efficient algorithms under dynamic market conditions.

Limitations

The study acknowledges several limitations that could be addressed in future research. Firstly, the incorporation of more recent and diverse credit card data, encompassing a broader range of features and a larger dataset, would enhance the model’s robustness. The exclusion of the Support Vector Machine (SVM) algorithm, due to its high computational cost and time requirements, highlights the need for more efficient predictive models. Furthermore, the research recognizes that macroeconomic factors—such as inflation rate, interest rate, GDP, and unemployment rate—significantly influence credit default status, suggesting that future datasets should integrate these variables for improved predictive accuracy.

To advance this research, future work should apply K-fold cross-validation to the machine-learning classifiers used in the current study, leveraging contemporary techniques and scalable AI to better identify creditworthy customers. Additionally, exploring adaptive learning methods, such as online learning, could allow model parameters to be adjusted dynamically in response to evolving data patterns. Continuous performance monitoring in real-world applications is also recommended to detect and correct any decline in model accuracy quickly, so that the model remains effective over time.
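The K-fold idea mentioned above can be sketched as an index splitter: the data is shuffled once, cut into K folds, and each fold serves as the validation set exactly once. This is a minimal standard-library illustration, not something taken from the paper; the function name and the defaults are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (training_indices, validation_indices) pairs covering every sample once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread any remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


splits = k_fold_indices(n_samples=10, k=5)
```

Each classifier would then be trained on the training indices and scored on the validation indices of every split, with the K scores averaged. For imbalanced default data, a stratified variant that preserves the class ratio within each fold is usually preferable.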