Employing artificial bee and ant colony optimization in machine learning techniques as a cognitive neuroscience tool

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-94642-6
PMID: https://pubmed.ncbi.nlm.nih.gov/40128279
Publication Date: 2025-03-24
Author(s): Kajal Mahawar et al.
Primary Topic: EEG and Brain-Computer Interfaces

Overview

The study highlights the role of higher education in shaping student outcomes, particularly for information technology (IT) students, whose academic performance can be predicted effectively with machine learning (ML) techniques. It identifies two weaknesses in existing academic prediction systems: imbalanced datasets and the need for algorithm tuning. To address them, the authors implemented several ML algorithms, namely decision tree (DT), k-nearest neighbors (KNN), and XGBoost (XGB), combined with the synthetic minority oversampling technique (SMOTE) and hyperparameter tuning via Ant Colony Optimization (ACO) and Artificial Bee Colony (ABC).

The combination of SMOTE and ACO with the DT model gave the best predictive performance: an accuracy of 98.1%, an F1 score of 96%, precision of 96.24%, recall of 96.19%, an ROC score of 96%, and an R² value of 84.75%. The study underscores the importance of hyperparameter tuning when optimizing ML models for academic prediction and offers guidance for educational stakeholders who want to improve student success through data-driven approaches. In addition, the Kendall Tau correlation coefficient was used to identify the key factors influencing student performance.
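To make the reported metrics concrete, the sketch below computes accuracy, precision, recall, and F1 from predicted and true labels. It assumes a binary task with label 1 as the positive class; the summary does not state the label encoding, and the function name `binary_metrics` and the sample labels are illustrative only, not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can look strong on an imbalanced dataset, which is why precision, recall, and F1 are reported alongside it.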

Methods

The proposed methodology aims to improve ML model performance for predicting student academic outcomes by addressing dataset imbalance and optimizing hyperparameters. Specifically, the Synthetic Minority Over-sampling Technique (SMOTE) handles the imbalanced data, while Ant Colony Optimization (ACO) is used for hyperparameter tuning. Three ML classifiers are applied to data collected from 1369 IT students across three private colleges in Jabalpur, Madhya Pradesh, India.
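The core idea of SMOTE can be sketched in a few lines: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a minimal plain-Python illustration under that assumption, not the authors' implementation; the function name `smote`, its parameters, and the sample points are hypothetical.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up minority-class samples with two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is an interpolation, it stays inside the region already covered by the minority class. Oversampling should be applied to the training split only, so that synthetic samples do not leak into evaluation data.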

Data were collected through a questionnaire distributed via Google Forms, yielding a dataset of 1369 records with 70 initial features. The Chi-square technique was applied to select 21 optimal features for the analysis, as detailed in Table 1. The overall framework of the proposed approach is shown in Figure 4, followed by a description of the algorithm used in the study.
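Chi-square feature selection of this kind can be illustrated as follows. The sketch scores each categorical feature with the classic contingency-table chi-square statistic against the class label and keeps the top k. The exact computation in the paper is not given in this summary, so this is an assumption, and the feature names and values are invented for illustration.

```python
from collections import Counter


def chi_square(feature, labels):
    """Chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def select_top_k(features, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up survey columns; not taken from the study's questionnaire.
labels = ["pass", "pass", "fail", "fail", "pass", "fail"]
features = {
    "attendance": ["high", "high", "low", "low", "high", "low"],
    "commute": ["bus", "car", "bus", "car", "car", "bus"],
}
attendance_score = chi_square(features["attendance"], labels)
top = select_top_k(features, labels, k=1)
print(attendance_score, top)
```

A higher statistic means the feature's distribution differs more strongly between classes. In the study, the same principle reduces 70 questionnaire features to 21.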

Results

The study reports results from eight experiments covering feature correlation analysis, feature relevance, and the application of the Synthetic Minority Over-sampling Technique (SMOTE) together with ML classifiers. The classifiers were Decision Trees (DT), K-Nearest Neighbors (KNN), and eXtreme Gradient Boosting (XGB). Hyperparameter tuning was performed with Ant Colony Optimization (ACO) and Artificial Bee Colony (ABC).
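One common way to apply ACO to hyperparameter tuning is to treat each candidate value as a path segment carrying pheromone: ants sample one value per hyperparameter with probability proportional to pheromone, pheromone evaporates each round, and the best ant of the round reinforces its choices. The sketch below follows that scheme. It is an assumption about the general technique, not the authors' algorithm. The grid, the function `aco_search`, and `toy_score` (a stand-in for a cross-validated model score) are all hypothetical.

```python
import random


def aco_search(grid, score_fn, n_ants=10, n_iter=30, evaporation=0.3, seed=0):
    """Search a discrete hyperparameter grid with a simple ant colony scheme.
    score_fn must return a positive score (higher is better)."""
    rng = random.Random(seed)
    pheromone = {name: [1.0] * len(vals) for name, vals in grid.items()}
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        ants = []
        for _ in range(n_ants):
            # Each ant picks one value index per hyperparameter, biased by pheromone.
            choice = {name: rng.choices(range(len(vals)),
                                        weights=pheromone[name])[0]
                      for name, vals in grid.items()}
            params = {name: grid[name][i] for name, i in choice.items()}
            score = score_fn(**params)
            ants.append((score, choice))
            if score > best_score:
                best_params, best_score = params, score
        # Evaporation: every trail loses a fixed fraction of its pheromone.
        for name in pheromone:
            pheromone[name] = [(1 - evaporation) * t for t in pheromone[name]]
        # Deposit: the best ant of this iteration reinforces its choices.
        top_score, top_choice = max(ants, key=lambda a: a[0])
        for name, i in top_choice.items():
            pheromone[name][i] += top_score
    return best_params, best_score


def toy_score(max_depth, min_samples_split):
    """Hypothetical stand-in for a validation score; peaks at (6, 4)."""
    return 1.0 / (1.0 + abs(max_depth - 6) + abs(min_samples_split - 4))


grid = {"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 4, 8, 16]}
best_params, best_score = aco_search(grid, toy_score)
print(best_params, best_score)
```

In a real setting `score_fn` would train the classifier with the given hyperparameters and return a cross-validated metric, which makes each evaluation expensive and the number of ants and iterations the main cost driver.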

All experiments were run in Python, using its extensive libraries for the analysis. The findings support the effectiveness of the chosen ML classifiers and tuning algorithms in improving model performance on student datasets, although specific performance metrics and comparative results are not detailed in this section.
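ABC tuning can be sketched in plain Python as follows. The sketch uses the standard three phases: employed bees refine each food source (a candidate hyperparameter set), onlooker bees spend extra effort on better sources, and scout bees replace sources that have stopped improving. It is a simplified illustration of the general algorithm, not the authors' code. The bounds, the function `abc_search`, and `toy_score` (a stand-in for a validation score that must be positive) are hypothetical.

```python
import random


def abc_search(bounds, score_fn, n_sources=8, n_iter=40, limit=5, seed=0):
    """Maximise score_fn over continuous bounds with a basic ABC loop."""
    rng = random.Random(seed)
    dims = list(bounds)

    def random_source():
        return {d: rng.uniform(*bounds[d]) for d in dims}

    def try_improve(i):
        # Perturb one dimension of source i relative to another random source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.choice(dims)
        lo, hi = bounds[d]
        step = rng.uniform(-1, 1) * (sources[i][d] - sources[k][d])
        cand = dict(sources[i])
        cand[d] = min(hi, max(lo, sources[i][d] + step))
        s = score_fn(**cand)
        if s > scores[i]:  # greedy selection
            sources[i], scores[i], trials[i] = cand, s, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    scores = [score_fn(**s) for s in sources]
    trials = [0] * n_sources
    b = max(range(n_sources), key=scores.__getitem__)
    best_params, best_score = dict(sources[b]), scores[b]

    for _ in range(n_iter):
        for i in range(n_sources):  # employed bees
            try_improve(i)
        for _ in range(n_sources):  # onlookers favour better sources
            try_improve(rng.choices(range(n_sources), weights=scores)[0])
        for i in range(n_sources):  # scouts abandon stale sources
            if trials[i] > limit:
                sources[i] = random_source()
                scores[i] = score_fn(**sources[i])
                trials[i] = 0
        b = max(range(n_sources), key=scores.__getitem__)
        if scores[b] > best_score:
            best_params, best_score = dict(sources[b]), scores[b]
    return best_params, best_score


def toy_score(learning_rate, subsample):
    """Hypothetical stand-in for a validation score; peaks at (0.1, 0.8)."""
    return 1.0 / (1.0 + 100 * (learning_rate - 0.1) ** 2
                  + 10 * (subsample - 0.8) ** 2)


bounds = {"learning_rate": (0.01, 0.5), "subsample": (0.5, 1.0)}
best_params, best_score = abc_search(bounds, toy_score)
print(best_params, best_score)
```

The `limit` parameter controls how quickly unproductive candidates are discarded, which trades exploitation of known good regions against exploration of new ones.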

Discussion

The discussion section of the paper reviews studies that have used traditional ML algorithms and optimization techniques to improve student learning outcomes. Notable contributions include Najieha et al.'s PHP and Laravel-based system that uses the C4.5 algorithm to predict academic performance, and Gunasinghe et al.'s assessment of the UTAUT-3 model in e-learning contexts. Other significant works include Fang's integration of classifiers and data mining techniques, Cohausz's examination of demographic factors in at-risk prediction models, and Verger et al.'s introduction of the Model Absolute Density Distance metric for analyzing model discrimination. The section identifies a gap in the literature on applying optimization techniques such as Artificial Bee Colony (ABC) and Ant Colony Optimization (ACO) in educational settings, and suggests that integrating them could improve predictive accuracy and student outcomes.

The proposed methodology consists of a series of steps: data preprocessing, feature correlation analysis using the Kendall Tau coefficient, and application of the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. The study focuses on second-year undergraduate IT students across various programs and uses decision tree (DT), K-nearest neighbor (KNN), and XGBoost classifiers, with hyperparameters optimized through ABC and ACO. Model performance is evaluated with metrics such as accuracy, F1 score, and ROC curve analysis. The findings indicate that the models perform well on medium-sized datasets, but scalability remains a concern for larger datasets because SMOTE and the optimization techniques increase the computational cost. Overall, integrating advanced optimization methods could significantly improve the predictive capabilities of ML models in educational contexts.
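The Kendall Tau step compares every pair of students and checks whether two variables rank them in the same order. Below is a minimal sketch of the tau-b variant, which corrects for ties. Tau-b is chosen here as an assumption because questionnaire answers are often tied, and the summary does not say which variant the authors used. The variable names and values are illustrative only.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation between two equal-length sequences."""
    concordant = discordant = untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx != 0:
            untied_x += 1  # pair is not tied on x
        if dy != 0:
            untied_y += 1  # pair is not tied on y
        if dx * dy > 0:
            concordant += 1  # both variables order the pair the same way
        elif dx * dy < 0:
            discordant += 1  # the variables order the pair oppositely
    denom = sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else 0.0


# Made-up values for illustration only.
study_hours = [1, 2, 3, 4, 5]
grade = [2, 1, 4, 3, 5]
tau = kendall_tau_b(study_hours, grade)
print(tau)
```

Values near 1 or -1 indicate a strong monotonic relationship between a feature and the outcome, and values near 0 indicate little association. This pairwise version runs in quadratic time, which is acceptable for a dataset of about 1369 records.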

Limitations

The study offers valuable insights into academic achievement. However, it is limited by its dataset, which may be specific to a particular institutional or demographic setting, so the findings may not generalize to broader contexts. In addition, the research did not cover all relevant variables that influence academic performance, nor did it explore further advanced ML techniques, such as additional ensemble methods or deep learning, that could improve predictive accuracy.

Moreover, because academic performance changes over time, longitudinal studies are needed for a fuller understanding of trends. Predictive modeling also raises ethical considerations, particularly regarding student privacy and data security. There is a risk of perpetuating biases within the models, potentially leading to unfair or discriminatory outcomes. Addressing equity and justice in predictive analytics remains a critical challenge in this field.
