Pediatric diabetes prediction using machine learning

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-24964-y
PMID: https://pubmed.ncbi.nlm.nih.gov/41540069
Publication Date: 2026-01-15
Author(s): Abeer El-Sayyid El-Bashbishy et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

This paper describes a machine learning (ML) system that predicts diabetes and classifies its types, addressing the challenges posed by the limited datasets and predictive models available in diabetes research. The study introduces a new dataset, the Diabetes Types Dataset, which integrates data from multiple sources, including pediatric records and established diabetes datasets. Several supervised ML algorithms were applied to the multiclass classification task, including Artificial Neural Networks (ANN), Logistic Regression, and Random Forests. The ANN classifier achieved the highest accuracy, 99.98%, and external validation was used to demonstrate the model’s robustness.

The findings underscore the potential of ML techniques to improve diabetes detection and classification, which is crucial for early intervention and effective disease management. The system’s performance was evaluated using metrics such as Accuracy, Precision, and Area Under the Receiver Operating Characteristic Curve (AUC). The study also highlights the importance of a comprehensive preprocessing pipeline and hyperparameter optimization for data quality and model reliability. Future work may include automating diabetes prediction, developing mobile applications for monitoring, and integrating genetic algorithms and medical imaging data to further improve diabetes care.

Methods

The study proposes a new model for the multiclass classification of diabetes types, built through a systematic methodology. The process begins with data collection, followed by preprocessing to ensure data quality. Features are then extracted on the basis of data analysis, and these features feed the training and testing phases of the model.

The classifier’s performance is then evaluated, its parameters are optimized, and the prediction phase is run. Figure 2 shows the overall system architecture, covering the full approach used to develop the model and to assess how well it classifies the different diabetes types.

Results

The study presents a supervised ML system for the early detection and classification of multiple diabetes types, which is crucial for improving clinical outcomes and disease management. The model was developed on the Diabetes Type Dataset (DTD), which integrates patient records from various sources and supports multiclass classification into Type 1 Diabetes, Type 2 Diabetes, Gestational Diabetes, and non-diabetic individuals. Preprocessing included standardization, outlier removal, and imputation of missing values with Multiple Imputation by Chained Equations (MICE). Class imbalance was addressed with the Synthetic Minority Over-sampling Technique (SMOTE), which improves the fairness of the classifier toward under-represented classes. Nine supervised ML algorithms were evaluated, with hyperparameters tuned by Particle Swarm Optimization (PSO).
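The oversampling idea behind SMOTE can be illustrated with a minimal sketch: each synthetic minority-class row is placed at a random point on the segment between an existing minority row and one of its nearest minority neighbours. This is not the authors' code. The function name `smote_like_oversample`, the two-feature rows (glucose, BMI), and the values of `k` and `n_new` are all hypothetical and chosen only for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic rows built by SMOTE-style interpolation.

    Assumes `minority` holds at least two distinct list objects.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class rows: [glucose, BMI]. Not data from the study.
minority_rows = [[150.0, 31.0], [160.0, 29.5], [155.0, 33.0], [170.0, 30.0]]
new_rows = smote_like_oversample(minority_rows, n_new=6)
```

Because every synthetic row lies between two real minority rows, the new points stay inside the region the minority class already occupies. In practice, oversampling is normally applied to the training split only, so that synthetic rows do not leak into the test data.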

The Artificial Neural Network (ANN) classifier achieved the highest predictive accuracy, 99.98%, with minimal misclassifications across all classes. The classifiers were assessed with Precision, Recall, F1-score, Accuracy, Mean Squared Error (MSE), and Area Under the Curve (AUC). The model was also applied to external datasets to test its adaptability, and the results confirmed its robustness in real-world scenarios. Confusion matrices and ROC curves were generated to support model interpretability and evaluation. Future work will expand the dataset to cover a broader demographic and additional diabetes types, with the aim of improving the model’s predictive capabilities.
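Several of the metrics named above follow directly from a multiclass confusion matrix. The sketch below computes accuracy and per-class precision, recall, and F1 for the four classes. The label strings and the eight example predictions are invented for illustration and are not results from the study. MSE and AUC are left out because they need numeric targets or predicted scores rather than hard labels.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_report(y_true, y_pred, labels):
    """Precision, recall and F1 for each class, treated one-vs-rest."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as label, but wrong
        fn = sum(matrix[i]) - tp                 # true label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical labels and predictions, for illustration only.
labels = ["T1D", "T2D", "GDM", "None"]
y_true = ["T1D", "T1D", "T2D", "T2D", "GDM", "None", "None", "None"]
y_pred = ["T1D", "T2D", "T2D", "T2D", "GDM", "None", "None", "T2D"]

matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(y_true, y_pred, labels)
overall_accuracy = accuracy(y_true, y_pred)
```

In this toy example, accuracy is 0.75 while the precision for the T2D class is only 0.5, which shows why a single accuracy figure is usually reported alongside per-class metrics in a multiclass setting.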

Discussion

The discussion covers the development and evaluation of ensemble models for early diabetes detection that combine traditional ML algorithms with deep neural networks (DNN). The study integrates multiple datasets, including the Pima Indian Diabetes (PID) dataset, to improve model generalizability and predictive accuracy. The DNN-based ensemble achieved an accuracy of 95.5% on a simulated dataset, while the stacked ML models reached accuracies of 75.03% and 77.10% on the PID dataset under different validation methods. The findings underscore the potential of ensemble learning to improve diabetes prediction across diverse populations and to address the limitations of single-population datasets.

The paper also discusses a multiclass classification framework that assigns diabetes status to diabetic, non-diabetic, and prediabetic classes, using extensive preprocessing to handle data challenges such as class imbalance and missing values. The framework incorporates several ML algorithms and achieves an accuracy of 98.87% with a weighted ensemble model. The study emphasizes robust data preprocessing and feature selection, including statistical tests such as ANOVA and Chi-square, to ensure the reliability of predictions. Overall, the research presents a comprehensive approach to diabetes prediction that draws on advanced ML techniques and diverse datasets to support timely medical interventions and improve disease management.
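One common way to build a weighted ensemble is weighted soft voting, where each model's class probabilities are averaged using per-model weights and the class with the highest blended probability is chosen. The summary does not say how the 98.87% ensemble combines its members, so the sketch below is an assumption about the general technique rather than a description of that model. The function `weighted_soft_vote`, the three probability vectors, and the weights are all hypothetical.

```python
def weighted_soft_vote(prob_rows, weights):
    """Blend per-model class probabilities for one sample.

    prob_rows: one probability vector per model, all of the same length.
    weights:   one non-negative weight per model.
    Returns (index of the winning class, blended probability vector).
    """
    total = sum(weights)
    n_classes = len(prob_rows[0])
    blended = [
        sum(w * probs[c] for w, probs in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=blended.__getitem__)
    return best, blended


# Hypothetical outputs of three models for a single patient record.
classes = ["diabetic", "non-diabetic", "prediabetic"]
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.2, 0.3],
]
model_weights = [0.5, 0.2, 0.3]  # assumed, e.g. proportional to validation accuracy

best_index, blended = weighted_soft_vote(model_probs, model_weights)
predicted = classes[best_index]
```

Here the second model prefers "non-diabetic", but the two more heavily weighted models outvote it and the blended prediction is "diabetic". Choosing the weights is a design decision in its own right; validation-set performance is one plausible basis.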