Data-driven explainable chronic kidney disease detection using RF based data imputation and meta-ensemble learning

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-026-41425-2
PMID: https://pubmed.ncbi.nlm.nih.gov/41803222
Publication Date: 2026-03-09
Author(s): R. K. Gupta et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The paper addresses the need for early detection of chronic kidney disease (CKD) with a structured, data-driven prediction framework. The proposed hybrid framework combines Random Forest (RF)-based imputation of missing values, categorical feature encoding, and the Synthetic Minority Oversampling Technique (SMOTE) to counter class imbalance. A Grey Wolf Optimizer (GWO) then tunes the weights of an ensemble of classifiers that includes Decision Tree, Logistic Regression, and Gaussian Naive Bayes. Evaluated on the UCI CKD dataset, the framework achieves an accuracy of 98.75%, a precision of 98.8%, a recall of 98.6%, and an F1-score of 98.7%.
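For readers less familiar with the four reported metrics, the following minimal Python sketch shows how each is computed from the counts of a binary confusion matrix. The function name and the example counts are hypothetical and chosen only for illustration; they are not the paper's confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not taken from the paper).
m = classification_metrics(tp=48, fp=1, fn=2, tn=29)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Precision and recall can diverge sharply on imbalanced data even when accuracy is high, which is presumably why the summary reports all four values.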

Furthermore, the study incorporates explainable AI (XAI) techniques such as SHAP and LIME to analyze and interpret feature contributions, thereby providing insights that affirm the clinical relevance of the predictions. This framework not only enhances predictive accuracy but also offers a transparent and computationally efficient decision support model that aligns data-driven AI with nephrology practice, ultimately contributing to improved patient outcomes in CKD management.

Introduction

The introduction of the research paper highlights the significance of Chronic Kidney Disease (CKD) as a widespread and progressively worsening health issue that impacts millions globally, with a notable burden in developing nations. Early and precise diagnosis of CKD is emphasized as essential for mitigating complications, enhancing patient outcomes, and efficiently utilizing healthcare resources. However, the implementation of effective diagnostic systems is often obstructed by challenges such as incomplete clinical data, imbalanced datasets, and the limited generalizability of conventional machine learning models.

The paper suggests that artificial intelligence (AI), particularly machine learning, may offer promising solutions to these challenges. By leveraging advanced AI techniques, it is possible to improve the accuracy and reliability of CKD diagnosis, thereby addressing the critical need for better diagnostic tools in the management of this disease.

Methods

The proposed methodology for classifying chronic kidney disease is a structured framework comprising data preprocessing, classifier training, and ensemble modeling. The raw dataset is first processed to identify and encode missing values, which are then imputed with a random forest regressor; the paper reports that this preserves feature correlations more effectively than traditional imputation methods. The imputed data then undergoes scaling and encoding of categorical variables. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied within each cross-validation fold, so that each training set is balanced. Multiple classifiers, including Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Gaussian Naive Bayes (GNB), are trained, and the top three performers are selected based on validation accuracy.
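Applying SMOTE within each cross-validation fold means that synthetic samples are generated from the training rows only, so the validation rows stay untouched. The sketch below illustrates that idea in plain Python under stated assumptions: the helper names (`smote`, `stratified_folds`, `balance_training_fold`), the toy two-feature data, the five folds, and k = 5 neighbours are all hypothetical and are not taken from the paper, which may well rely on an existing library implementation.

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def stratified_folds(y, n_folds, rng):
    """Yield (train_idx, val_idx) with each class spread across the folds."""
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    for f in range(n_folds):
        val = sorted(folds[f])
        train = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        yield train, val


def balance_training_fold(X, y, train_idx, rng, k=5):
    """Oversample the minority class using the training rows only."""
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    counts = {c: y_tr.count(c) for c in set(y_tr)}
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    seeds = [x for x, c in zip(X_tr, y_tr) if c == minority]
    new = smote(seeds, deficit, k, rng)
    return X_tr + new, y_tr + [minority] * len(new)


# Toy imbalanced data (hypothetical): 30 majority rows, 10 minority rows.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)] + \
    [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 30 + [1] * 10

fold_summaries = []
for train_idx, val_idx in stratified_folds(y, n_folds=5, rng=rng):
    X_bal, y_bal = balance_training_fold(X, y, train_idx, rng)
    # A classifier would be fitted on (X_bal, y_bal) here and scored on the
    # original, un-resampled validation rows.
    fold_summaries.append((y_bal.count(0), y_bal.count(1), len(val_idx)))
print(fold_summaries)
```

Oversampling before splitting would let synthetic points derived from validation rows leak into training, which tends to inflate validation scores; resampling inside the fold avoids that.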

The selected classifiers are then combined in a weighted voting ensemble whose weights are determined by Grey Wolf Optimization (GWO), a metaheuristic that mimics the hunting behavior and social hierarchy of grey wolves. For comparison, baseline weights are also derived with alternative optimization methods such as Firefly Algorithm (FPA), Artificial Bee Colony (ABC), Ant Colony Optimization (ACO), Sequential Least Squares Programming (SLSQP), and Bayesian Optimization (BO), each constrained so that the weights sum to one. The results indicate that GWO consistently outperforms the other methods and yields a more diverse and robust ensemble, leading to improved classification outcomes. A comprehensive list of the hyperparameters used in the experiments is provided in the accompanying table.
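The following Python sketch illustrates how a Grey Wolf Optimizer could search for voting weights that sum to one. It follows the standard GWO position update (the three best wolves, alpha, beta, and delta, guide the rest while the coefficient `a` decays from 2 to 0) and is not the authors' implementation. The population size, iteration count, soft-voting fitness, the seeding of one wolf with equal weights, and the toy probabilities are all assumptions made for illustration.

```python
import random


def ensemble_accuracy(weights, probas, y_true):
    """Accuracy of weighted soft voting; probas[m][i] is classifier m's
    predicted probability of the positive class for sample i."""
    correct = 0
    for i, label in enumerate(y_true):
        score = sum(w * p[i] for w, p in zip(weights, probas))
        correct += int((score >= 0.5) == bool(label))
    return correct / len(y_true)


def normalise(w):
    """Clip to non-negative values and rescale so the weights sum to one."""
    w = [max(v, 0.0) for v in w]
    total = sum(w)
    return [v / total for v in w] if total > 0 else [1.0 / len(w)] * len(w)


def gwo_weights(fitness, dim, n_wolves=12, n_iter=40, seed=0):
    """Maximise `fitness` over weight vectors with a basic GWO loop."""
    rng = random.Random(seed)
    wolves = [normalise([rng.random() for _ in range(dim)])
              for _ in range(n_wolves)]
    wolves[0] = [1.0 / dim] * dim  # assumption: start one wolf at equal weights
    best, best_fit = None, -1.0
    for t in range(n_iter):
        ranked = sorted(wolves, key=fitness, reverse=True)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        if fitness(alpha) > best_fit:
            best, best_fit = list(alpha), fitness(alpha)
        a = 2.0 * (1 - t / n_iter)  # exploration coefficient, decays 2 -> 0
        new_wolves = []
        for x in wolves:
            pos = []
            for d in range(dim):
                steps = []
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    dist = abs(C * leader[d] - x[d])
                    steps.append(leader[d] - A * dist)
                pos.append(sum(steps) / 3)  # average pull of the three leaders
            new_wolves.append(normalise(pos))  # enforce the sum-to-one constraint
        wolves = new_wolves
    final = max(wolves, key=fitness)
    if fitness(final) > best_fit:
        best, best_fit = list(final), fitness(final)
    return best, best_fit


# Hypothetical validation labels and positive-class probabilities from
# three classifiers (illustrative values, not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probas = [
    [0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.3, 0.6],
    [0.6, 0.4, 0.8, 0.9, 0.4, 0.6, 0.2, 0.1],
    [0.7, 0.6, 0.3, 0.6, 0.1, 0.3, 0.6, 0.3],
]
best, best_fit = gwo_weights(lambda w: ensemble_accuracy(w, probas, y_true), dim=3)
print([round(w, 3) for w in best], best_fit)
```

Renormalising after every move is one simple way to keep the weights on the sum-to-one constraint; a penalty term in the fitness function would be an alternative design.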

Discussion

The discussion section highlights advances in machine learning (ML) for improving the diagnostic accuracy of chronic kidney disease (CKD). Traditional ML models, such as Logistic Regression (LR), Decision Trees (DT), and Support Vector Machines (SVM), have shown varying success but are often limited by their reliance on handcrafted features and deterministic logic. The study proposes a novel Grey Wolf Optimizer (GWO)-weighted ensemble framework that integrates multiple classifiers with robust preprocessing to enhance CKD classification. Its key innovations are the use of Random Forest (RF) for missing value imputation and the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance, which together lead to a more accurate and reliable predictive model.

The results demonstrate that the GWO-optimized ensemble consistently outperforms individual classifiers and traditional ensemble methods, achieving higher accuracy, precision, recall, and F1-score. The study also conducts a comparative analysis of several optimization algorithms, including Sequential Least Squares Programming (SLSQP), Ant Colony Optimization (ACO), and others, which underscores the effectiveness of GWO in optimizing ensemble weights. Overall, the research addresses critical challenges in CKD prediction, such as missing data and class imbalance, and provides a robust framework for future applications in clinical settings.