Ex-ADA: a SHAP-based explainable AdaBoost framework for predicting at-risk students

Journal: Frontiers in Education, Volume: 10
DOI: https://doi.org/10.3389/feduc.2025.1728070
Publication Date: 2026-01-14
Author(s): Emrah Arslan et al.
Primary Topic: Online Learning and Analytics

Overview

The paper introduces Ex-ADA, an explainable AdaBoost-based framework for the early identification of academically at-risk students in higher education. Traditional predictive models often lack transparency, which undermines both their practical usefulness and educators’ trust in them. Ex-ADA integrates SHapley Additive exPlanations (SHAP) with AdaBoost’s ensemble learning, using data from 642 students in a programming course at the University of Plovdiv. The framework achieves an accuracy of 84.12% and an AUC of 92.31%, surpassing conventional classifiers such as k-nearest neighbors and decision trees. SHAP analysis identifies attendance, midterm performance, and homework completion as critical predictors of student success, giving instructors actionable insights for timely interventions.

The study emphasizes Ex-ADA’s dual strengths of predictive accuracy and interpretability, enabling personalized explanations for individual students’ risk factors. This approach not only aids in early intervention strategies but also fosters a more informed decision-making process for educators. Future research will aim to broaden the framework’s applicability by incorporating multimodal data, developing real-time monitoring tools, and addressing fairness and accountability in predictive analytics. Validation across diverse datasets will be crucial for establishing Ex-ADA’s robustness within institutional early-warning systems, ultimately enhancing its utility in promoting student retention and academic success.

Introduction

The introduction highlights the rapid adoption of digital learning environments, which has generated vast amounts of educational data and opened opportunities to improve student retention and teaching effectiveness. A critical challenge in higher education is the timely identification of students at risk of academic dropout due to behavioral and emotional factors. Research indicates that early detection and tailored interventions can significantly improve academic success and retention rates. Traditional Early-Warning Systems (EWS) often rely on static indicators and lack the explainability and adaptability needed to gain educators’ trust. Explainable Artificial Intelligence (XAI) has emerged to address these shortcomings by providing transparent and interpretable predictions that support pedagogical decision-making.

The paper introduces Ex-ADA, an Explainable AdaBoost framework that combines the adaptive ensemble capabilities of AdaBoost with the interpretive strengths of SHapley Additive exPlanations (SHAP). This hybrid system is designed to accurately predict at-risk students while offering clear insights into the contributing factors, such as prior GPA and engagement metrics. The integration of SHAP allows for individualized explanations, enhancing the model’s utility as a diagnostic tool. The research demonstrates that Ex-ADA achieves an area under the curve (AUC) of 92.31% and an accuracy of 84.12% in predicting at-risk students, while also addressing the limitations of traditional models by adapting to individual learning trajectories and ensuring ethical AI deployment. The paper outlines its contributions, including improved transparency in educational decision-making and validation against benchmark models, setting the stage for further exploration in subsequent sections.

Methods

The proposed methodology outlines a structured analytical framework for building a predictive model on educational datasets characterized by incomplete data, outliers, and class imbalance, while ensuring explainability and reproducibility. It consists of seven steps, beginning with data import and preprocessing to ensure data quality and analytical readiness. The dataset is then divided into training and testing subsets using stratified cross-validation, so that each fold preserves the overall class proportions, to support a comprehensive evaluation.
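The stratified splitting step can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_folds`, the 30/70 class ratio, and the sample size are assumptions chosen for illustration, and only the use of five folds comes from the study. The idea shown is that each class is dealt out separately, so every fold keeps roughly the same share of at-risk students.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Hypothetical labels: 1 = at-risk, 0 = not at-risk (the 30% ratio is made up).
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, k=5)

for i, test_idx in enumerate(folds):
    test_set = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in test_set]
    at_risk_share = sum(labels[j] for j in test_idx) / len(test_idx)
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)} "
          f"at-risk share={at_risk_share:.2f}")
```

Each fold serves once as the test set while the remaining four form the training set. With imbalanced classes, an unstratified split could leave a fold with very few at-risk students, which would make per-fold metrics unstable.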

Following the training and assessment of five supervised learning algorithms, a feature selection process is implemented to enhance model efficiency and interpretability. The best-performing model is then retrained using the selected features, and its performance is evaluated using established metrics. Finally, the use of SHAP (SHapley Additive exPlanations) allows for an in-depth explainability analysis, providing both local and global insights into the model’s decision-making process. A schematic overview of this methodological workflow is presented in Figure 1.
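To make the local explanations concrete, the sketch below computes exact Shapley values for a toy model. It rests on several assumptions: `risk_score` is a hypothetical stand-in rather than the trained Ex-ADA model, the feature values and cohort averages are invented, and features left out of a coalition are simply replaced by a single baseline value. The SHAP library estimates the same quantity more efficiently against a background dataset, so this is an illustration of the underlying idea, not of that library's API.

```python
from itertools import combinations
from math import factorial

FEATURES = ["attendance", "midterm", "homework"]

def risk_score(x):
    """Hypothetical risk model (not the paper's): higher means more at risk."""
    score = 0.9 - 0.4 * x["attendance"] - 0.3 * x["midterm"] - 0.2 * x["homework"]
    # An interaction term so the attribution is not trivially linear.
    if x["attendance"] < 0.5 and x["homework"] < 0.5:
        score += 0.1
    return score

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (value(set(subset) | {f}) - value(set(subset)))
        phi[f] = total
    return phi

baseline = {"attendance": 0.8, "midterm": 0.7, "homework": 0.75}  # made-up cohort averages
student = {"attendance": 0.4, "midterm": 0.5, "homework": 0.3}    # made-up student

phi = shapley_values(risk_score, student, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>10}: {contribution:+.3f}")
```

The contributions sum to the difference between this student's score and the baseline score, which is the additivity property that lets an instructor read each value as that feature's share of the predicted risk. Averaging the absolute contributions over many students gives the global view of feature importance.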

Results

The results show marked differences in the performance of five classifiers, k-Nearest Neighbors (KNN), Decision Tree, AdaBoost, Naïve Bayes, and Multilayer Perceptron (MLP), in identifying at-risk students. AdaBoost achieved the highest Area Under the Curve (AUC) in the Receiver Operating Characteristic (ROC) analysis at 91.53%, indicating strong discriminatory capability. The Decision Tree followed with an AUC of 87.38%, then MLP with 86.39% and KNN with 85.09%; MLP generalized well but showed some instability. Naïve Bayes exhibited the lowest performance, with an accuracy of 73.20%.
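For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen at-risk student receives a higher risk score than a randomly chosen student who is not at risk. The sketch below computes it directly from that definition. The labels and scores are invented for illustration, and the function name `roc_auc` is an assumption, not something taken from the paper.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (at-risk, not-at-risk) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels (1 = at-risk) and predicted risk scores for eight students.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.4f}")
```

Because AUC depends only on how students are ranked, it is not tied to a particular decision threshold, unlike accuracy. That is one reason the two metrics can order classifiers differently.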

AdaBoost’s performance was further validated through statistical robustness measures, achieving an accuracy of 84.12% ± 1.34 and an AUC of 92.31% ± 0.97 across five cross-validation folds, indicating low variance and high stability. The findings underscore the effectiveness of AdaBoost’s adaptive reweighting mechanism, enhanced by SMOTE-based oversampling. Additionally, the study highlights the pedagogical implications of the MidTerm2-Practice activities, suggesting that low engagement in these tasks correlates with academic risk, while consistent participation signals stable learning behaviors. This insight supports the need for early intervention strategies, aligning with the study’s emphasis on explainability in model outputs (Coroama and Groza, 2022).
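The adaptive reweighting mentioned above can be sketched for a single boosting round. This follows the classic discrete AdaBoost update and is only an assumption about the general mechanism: the summary does not state which AdaBoost variant or weak learner the authors used, and the labels and predictions below are invented.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One discrete AdaBoost round: returns (learner weight alpha, updated sample weights).

    Labels and predictions are +1 or -1; weights must sum to 1.
    """
    error = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h)
    if not 0.0 < error < 0.5:
        raise ValueError("weak learner must be better than chance and not perfect")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples (y * h = -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Made-up round: ten students, the weak learner misclassifies two of them.
y_true = [1, 1, 1, -1, -1, -1, -1, -1, 1, -1]
y_pred = [1, 1, -1, -1, -1, -1, -1, 1, 1, -1]
weights = [0.1] * 10

alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
missed = [i for i, (y, h) in enumerate(zip(y_true, y_pred)) if y != h]
print(f"alpha = {alpha:.3f}")
print("weight on misclassified students:", round(sum(new_weights[i] for i in missed), 3))
```

After the update, the misclassified students carry half of the total weight, so the next weak learner is pushed to focus on the cases the previous one got wrong. Oversampling the minority class with SMOTE, as the study does, complements this by ensuring there are enough at-risk examples for those later learners to fit.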

Discussion

The discussion section of the research paper highlights the advancements in predictive systems for identifying at-risk students through the integration of explainable artificial intelligence (XAI), ensemble techniques, and machine learning. It reviews significant studies, such as RADAR, which achieved an accuracy of 82.22% in dropout prediction by utilizing decision tree models and emphasizing explainability. Other notable frameworks include the Early Feedback Prediction System (EFPS), which adapted predictions based on continuous assessment data, and the Early Detection System (EDS), which demonstrated high accuracy rates using ensemble algorithms like AdaBoost. These systems collectively illustrate the effectiveness of adaptive models in providing timely interventions for student retention.

The section also identifies critical methodological gaps in current research, particularly the need for unified frameworks that seamlessly integrate boosting algorithms with SHAP-based explanations to enhance both predictive accuracy and interpretability. Despite the promising results from various studies, many existing models struggle with scalability and transparency, indicating a pressing need for comprehensive approaches that balance these aspects. The findings underscore the importance of developing robust, explainable models that can be effectively deployed across diverse educational contexts, ultimately contributing to improved academic outcomes and informed decision-making in educational settings.

Limitations

The study acknowledges several limitations that may affect the generalizability of the Ex-ADA framework, despite its strong predictive performance and explainability. Firstly, the dataset is restricted to a single university course in Fundamentals of Programming, which may limit the model’s applicability to other educational contexts. Future research should incorporate cross-disciplinary and multi-institutional datasets to enhance external validity. Secondly, the reliance on quantitative behavioral indicators—such as attendance and midterm performance—does not encompass the cognitive, motivational, or socioemotional factors influencing academic success. Incorporating qualitative data, such as psychological self-reports or sentiment analysis, could yield a more holistic understanding of student learning behaviors.

Additionally, the explainability of the model is derived from SHAP, a post-hoc method that, while useful, does not ensure complete causal transparency. Future investigations could benefit from exploring causal modeling techniques or more interpretable boosting methods. The computational complexity of Ex-ADA may also pose challenges for real-time applications in early-warning systems, necessitating optimization strategies for larger datasets. Furthermore, the study did not assess the long-term stability of the model across different academic terms, highlighting the need for periodic retraining to maintain accuracy. Lastly, ethical considerations regarding student data privacy and the potential for algorithmic bias must be prioritized, emphasizing the importance of secure data handling and responsible prediction use. These limitations suggest critical avenues for future research to enhance the scalability, fairness, and multidimensional capabilities of the Ex-ADA framework.