Early detection of chronic kidney disease based on a SURD-enhanced machine learning model

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-026-41050-z
PMID: https://pubmed.ncbi.nlm.nih.gov/41741522
Publication Date: 2026-02-25
Author(s): Ningning Xue et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

This study addresses the significant challenge of early risk prediction in chronic kidney disease (CKD) by introducing a novel prediction framework that combines machine learning with Synergy-Unique-Redundant Decomposition (SURD) from causal information theory. Utilizing the UCI-CKD dataset with 400 samples, the researchers developed ten classification models, addressing missing data through multiple imputation and class imbalance via synthetic minority oversampling. Model performance was rigorously evaluated using metrics such as accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC).
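
The Overview mentions synthetic minority oversampling but gives none of its settings. The sketch below shows the general SMOTE-style idea in plain Python. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The helper name `smote_like`, the default `k=2` and the squared Euclidean distance are illustrative assumptions, not the study's configuration. In practice one would typically use the `SMOTE` class from the imbalanced-learn package on scaled features.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority-class rows by SMOTE-style interpolation.

    Hypothetical helper for illustration: `minority` is a list of numeric
    feature vectors (assumed already scaled), `k` is the neighbourhood size.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest *other* minority samples to `base`.
        others = [j for j in range(len(minority)) if j != i]
        nearest = sorted(others, key=lambda j: sq_dist(base, minority[j]))[:k]
        partner = minority[rng.choice(nearest)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic
```

For example, `smote_like([[1.0, 2.0], [1.5, 1.8], [2.0, 2.4], [1.2, 2.2]], n_new=4)` returns four new rows. Each new row lies between two of the original minority rows, so it never leaves their range.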

To assess the generalizability of the models and check for overfitting, extensive external validation was performed on a large-scale cohort from the MIMIC-IV database comprising 27,834 electronic health records. Several models performed well on the internal dataset. In external validation, however, the Random Forest model stood out with an AUC of 0.990 (95% CI: 0.989-0.991). It significantly outperformed the baseline Decision Tree model, which achieved an AUC of 0.914 (95% CI: 0.912-0.916), and the two confidence intervals do not overlap. The SURD-based causal decomposition and feature importance analyses identified key clinical predictors, such as serum creatinine and hemoglobin. These findings suggest that the SURD-guided framework offers a robust and interpretable method for early CKD risk stratification, with consistent performance across the internal (UCI-CKD) and external (MIMIC-IV) datasets.
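
The summary does not say how the 95% confidence intervals were obtained. One common option, shown here as an assumption, is a percentile bootstrap. It resamples the evaluation set with replacement, recomputes the AUC each time, and takes the 2.5th and 97.5th percentiles. The sketch computes AUC directly from its definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The helper names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval."""
    point = auc(labels, scores)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lower, upper)
```

The pairwise loop costs O(n_pos × n_neg) per call. That is fine for a few hundred samples, but slow when repeated across bootstrap resamples of a cohort with tens of thousands of records. scikit-learn's `roc_auc_score` computes the same quantity more efficiently.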

Introduction

Chronic kidney disease (CKD) is a prevalent, progressive condition that often goes undiagnosed in its early stages because it lacks obvious symptoms. This raises the risk of progression to end-stage renal disease. A standard diagnostic criterion for CKD is an estimated glomerular filtration rate (eGFR) below 60 mL/min/1.73 m² persisting for more than three months. Because early disease is often silent, routine blood and urine tests are recommended for early identification in the general population. Machine learning has improved the accuracy of CKD prediction, but the complexity of these models can hinder clinical transparency. Existing explainable artificial intelligence (XAI) tools typically assess features in isolation and neglect the interactions among clinical variables.

Traditional statistical methods fail to capture the causal structure of CKD. To address this limitation, this research employs a causal framework to better characterize redundancy and interactions among clinical features. Established causal discovery methods such as Granger Causality (GC), Convergent Cross-Mapping (CCM), and Transfer Entropy have been effective in certain contexts. However, they often treat information flow as a single aggregate signal and struggle with the nonlinearities and unobserved confounders characteristic of CKD. This study therefore adopts a more fine-grained approach to causal inference, separating the information that clinical features carry about CKD into synergistic, unique, and redundant components.
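
A toy example makes the aggregate-signal problem concrete. Suppose an outcome depends on two binary inputs only through their interaction. Here the interaction is an XOR, chosen purely for illustration and not taken from the study. Each input alone then carries zero mutual information about the outcome, while the pair jointly carries one full bit, so a method that scores variables one at a time would miss the dependence entirely. The sketch computes empirical mutual information in bits. It does not implement GC, CCM, or Transfer Entropy.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Illustrative binary inputs covering all four combinations equally often.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]  # outcome = XOR of the two inputs

print(mutual_information(x1, y))                 # prints 0.0
print(mutual_information(x2, y))                 # prints 0.0
print(mutual_information(list(zip(x1, x2)), y))  # prints 1.0
```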

Methods

The experimental methodology follows a systematic pipeline for analyzing the Chronic Kidney Disease (CKD) dataset from the UCI repository. It begins with data preprocessing, which prepares the dataset for subsequent analysis. The SURD (Synergy-Unique-Redundant Decomposition) method is then applied to categorize the features by whether their causal contribution to CKD is synergistic, unique, or redundant.
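
To make the three categories concrete, the sketch below splits the information that two discrete features carry about a target into redundant, unique, and synergistic parts. As a deliberate simplification, it uses the minimum-mutual-information (MMI) redundancy, R = min(I(X1;Y), I(X2;Y)). SURD defines its components differently and handles more than two features, so this illustrates the categories rather than reproducing the study's algorithm. Continuous clinical variables would first need to be discretized, which is a further assumption not stated in the source.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

def mmi_decomposition(x1, x2, y):
    """Two-source information decomposition with MMI redundancy (not SURD itself).

    The four parts always add up to the joint information I(X1,X2;Y).
    """
    i1 = mutual_information(x1, y)
    i2 = mutual_information(x2, y)
    i12 = mutual_information(list(zip(x1, x2)), y)
    redundant = min(i1, i2)
    return {
        "redundant": redundant,
        "unique_x1": i1 - redundant,
        "unique_x2": i2 - redundant,
        # Equals i12 - max(i1, i2), which is non-negative up to rounding.
        "synergy": i12 - i1 - i2 + redundant,
    }
```

With `a = [0, 0, 1, 1]` and `b = [0, 1, 0, 1]`, the three calls below each isolate one category:

- `mmi_decomposition(a, a, a)` attributes 1 bit to redundancy.
- `mmi_decomposition(a, b, a)` attributes 1 bit to the unique part of `x1`.
- `mmi_decomposition(a, b, [u ^ v for u, v in zip(a, b)])` attributes 1 bit to synergy.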

The study then applies ten distinct machine learning classifiers to predict CKD status. Model performance is assessed with multiple evaluation metrics: accuracy, precision, recall, F1 score, and AUC. The SHAP (SHapley Additive exPlanations) method is also used to visualize the contribution of each feature, improving the interpretability of the model outputs.
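
The threshold-based indicators listed in the Overview (accuracy, precision, recall, and F1 score) all derive from the four cells of a binary confusion matrix. The hypothetical helper below computes them with CKD-positive encoded as 1. Returning 0.0 when a denominator is zero is a convention chosen here; the source does not specify one.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = CKD, 0 = not CKD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # CKD correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-CKD correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed CKD cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,  # also called sensitivity
        "f1": f1,
    }

# Toy labels for illustration only: 2 TP, 1 FN, 2 TN, 1 FP.
print(classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```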

Discussion

In this study, the authors employ Synergy-Unique-Redundant Decomposition (SURD) to enhance the interpretability and predictive accuracy of machine learning models in chronic kidney disease (CKD) diagnostics. By decomposing causal relationships into unique, redundant, and synergistic components, SURD allows for a more nuanced feature selection process that addresses the complexities of overlapping clinical signals. The research investigates whether applying SURD improves model generalization and diagnostic accuracy, identifies key physiological features influencing CKD, and enhances the transparency of predictive models through explainable artificial intelligence (XAI) techniques.

The results indicate that models using features selected through SURD consistently outperform those based on traditional methods such as LASSO and the PC algorithm, achieving high accuracy and robustness across various classifiers. Notably, synergistic features contribute substantially to predictive performance, while unique features, although informative, yield less stable results. SHAP-based feature importance analysis further shows that key indicators such as hemoglobin and blood pressure play critical roles in CKD prediction. Overall, the findings support the effectiveness of the SURD framework in improving model interpretability and clinical applicability, positioning it as a promising approach to feature selection in medical diagnostics.
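
SHAP values are Shapley values from cooperative game theory. A feature's attribution is its marginal contribution to the model output, averaged over all orders in which features could be added. For a handful of features they can be computed exactly, as in the sketch below. Features absent from a coalition are set to a fixed baseline value, which is one of several possible conventions. The shap package's explainers, such as `shap.TreeExplainer` for tree ensembles, use their own, more efficient estimators. Averaging absolute attributions over samples gives a global importance ranking of the kind used to single out indicators such as hemoglobin. The model `toy_model`, its three generic features, and the baseline are hypothetical and unrelated to the study's data.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attributions for one instance `x`.

    Features outside a coalition take their `baseline` value. The loop visits
    every coalition, so the cost grows as 2**n_features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

def mean_abs_importance(model, rows, baseline):
    """Global importance: mean absolute Shapley value of each feature over `rows`."""
    per_row = [shapley_values(model, r, baseline) for r in rows]
    return [sum(abs(p[i]) for p in per_row) / len(per_row) for i in range(len(baseline))]

def toy_model(v):
    # Hypothetical risk score: one additive feature plus an interaction term.
    return 2.0 * v[0] + v[1] * v[2]

baseline = [0.0, 0.0, 0.0]
print(shapley_values(toy_model, [1.0, 1.0, 1.0], baseline))
print(mean_abs_importance(toy_model, [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0]], baseline))
```

The attributions for one instance sum to `toy_model(x) - toy_model(baseline)`, which is the efficiency property. The interaction term's contribution is split equally between the two features that form it.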