Machine learning for classifying chronic kidney disease and predicting creatinine levels using at-home measurements

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-88631-y
PMID: https://pubmed.ncbi.nlm.nih.gov/39910170
Publication Date: 2025-02-05
Author(s): Brady Metherall et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The paper investigates machine learning models for the early detection of chronic kidney disease (CKD) and the prediction of creatinine levels. The study applies artificial neural networks (ANNs) and random forests (RFs) to a dataset of 400 patients and compares three feature sets: at-home, monitoring, and laboratory features. For at-home CKD classification, RFs achieve higher accuracy than ANNs (92.5% vs. 82.9%), whereas ANNs achieve a higher true positive rate (TPR). With monitoring or laboratory features, both methods exceed 98% accuracy. The study also highlights the importance of clinical variables such as hemoglobin and blood urea, and of comorbidities such as hypertension and diabetes mellitus, for CKD detection.
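Accuracy, TPR, and the true negative rate (TNR) are all derived from the same confusion matrix, which is how one model can lead on accuracy while another leads on TPR. The minimal sketch below computes the three metrics for binary labels (1 = CKD, 0 = no CKD); the function name `confusion_metrics` and the labels are illustrative toy values, not data from the study.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, TPR, TNR) for binary labels where 1 = CKD and 0 = no CKD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # CKD correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-CKD correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed CKD cases
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else float("nan")  # sensitivity
    tnr = tn / (tn + fp) if tn + fp else float("nan")  # specificity
    return accuracy, tpr, tnr


# Toy example: the model catches 3 of 4 CKD cases but clears only 2 of 4 healthy patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(confusion_metrics(y_true, y_pred))  # (0.625, 0.75, 0.5)
```

A model that flags more patients as CKD tends to raise TPR at the cost of more false alarms, which lowers TNR and can lower overall accuracy.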

The conclusions emphasize the potential of machine learning to enhance CKD screening and diagnosis, particularly through at-home detection methods. The authors advocate for the development of more robust screening tools that could lead to earlier interventions and improved patient outcomes. They also acknowledge the necessity for larger datasets with detailed CKD stage information to refine model accuracy. Future research directions include training dual-task ANNs for simultaneous CKD classification and creatinine prediction, which may further enhance diagnostic performance. Overall, the integration of machine learning in nephrology could significantly benefit individuals affected by CKD.
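The paper lists dual-task ANNs only as future work, so no architecture or loss function is specified. One common way to train a single network on two tasks is to minimise a weighted sum of a classification loss and a regression loss. The sketch below assumes binary cross-entropy for CKD, mean squared error for creatinine, and a hypothetical balancing factor `weight`; `dual_task_loss` is an illustrative name, not the authors' method.

```python
import math


def dual_task_loss(p_ckd, y_ckd, creat_pred, creat_true, weight=1.0):
    """Weighted sum of two per-task losses for a hypothetical two-headed network.

    p_ckd:      predicted CKD probabilities from the classification head, each in (0, 1)
    y_ckd:      true CKD labels, 1 = CKD, 0 = no CKD
    creat_pred: predicted creatinine values from the regression head
    creat_true: measured creatinine values
    weight:     hypothetical factor balancing the regression term against the classification term
    """
    eps = 1e-12  # guards against log(0)
    bce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(p_ckd, y_ckd)
    ) / len(y_ckd)
    mse = sum((a - b) ** 2 for a, b in zip(creat_pred, creat_true)) / len(creat_true)
    return bce + weight * mse


print(dual_task_loss([0.9, 0.2], [1, 0], [1.4, 0.9], [1.2, 1.0], weight=0.5))
```

Because both heads share the lower layers, gradients from the creatinine term also shape the features used for CKD classification; the value of `weight` would need tuning like any other hyperparameter.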

Methods

The study analyzes a publicly available dataset from the UCI Machine Learning Repository, comprising 400 patients from Tamil Nadu, India (250 with CKD and 150 without). The available features, which include blood test results, demographic data, and comorbidities, were grouped into three sets: at-home features (easily measured by patients), monitoring features (obtained during health checks), and laboratory features (derived from specialized tests).

Two model families, artificial neural networks (ANNs) and random forests (RFs), were trained on each feature set, both to classify CKD and to predict creatinine levels. Performance was estimated with 10-fold cross-validation and reported as the mean and standard deviation across folds. The hyperparameter values and tuning ranges are listed in Table 2.
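The study reports performance from 10-fold cross-validation as mean and standard deviation. The sketch below shows that protocol for an RF classifier, assuming scikit-learn; the synthetic data only mimics the size (383 patients) and class ratio (238/145) of the pre-processed dataset, and the stratified splitting and forest settings are assumptions rather than the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a pre-processed feature set (NOT the UCI CKD data):
# 383 samples, 18 features, about 62% positives to mirror the 238/145 class split.
X, y = make_classification(
    n_samples=383, n_features=18, n_informative=6, weights=[145 / 383], random_state=0
)

# 10 folds; stratification keeps the class ratio similar in every fold (an assumption).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)  # hypothetical settings

scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} ± {scores.std():.3f} over {len(scores)} folds")
```

Each patient appears in exactly one test fold, so the ten scores come from disjoint test sets; their spread is what the reported standard deviation summarizes.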

Results

After pre-processing, the dataset comprised 383 patients with 54 features each: 238 patients with chronic kidney disease (CKD) and 145 without. Of the 54 features, 27 belonged to the monitoring set and 18 to the at-home set. Artificial neural networks (ANNs) and random forests (RFs) were applied to each feature set for CKD classification and creatinine prediction.
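Because the classes are imbalanced, a useful reference point for the reported accuracies is the score of a trivial model that always predicts the majority class. This baseline is not reported in the source; the arithmetic below uses only the class counts after pre-processing.

```python
n_ckd, n_no_ckd = 238, 145  # class counts after pre-processing
n_total = n_ckd + n_no_ckd

# Labelling every patient "CKD" is right for all CKD patients and wrong for everyone
# else, so this trivial model has TPR = 1 and TNR = 0.
baseline_accuracy = n_ckd / n_total
print(f"{n_total} patients, majority-class accuracy = {baseline_accuracy:.1%}")
# 383 patients, majority-class accuracy = 62.1%
```

Accuracies such as 82.9% and 92.5% for at-home features are best read against this roughly 62% floor, with TPR and TNR showing whether a model does more than favour the majority class.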

Performance metrics for each model were computed with 10-fold cross-validation and are reported as the mean and standard deviation across folds. Table 3 summarizes the CKD classification results for ANN and RF, and Table 4 presents the creatinine prediction results. Fig. 2 shows the receiver operating characteristic (ROC) curves and their area under the curve (AUC) values, and Fig. 3 shows the feature importances derived from the RF models. Table 2 lists the hyperparameter values and tuning ranges used in the experiments.
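For orientation, the sketch below shows how ROC curves, AUC values, and RF feature importances of the kind shown in Figs. 2 and 3 are typically computed with scikit-learn. It assumes synthetic data, a single hold-out split for brevity instead of the study's 10 folds, and scikit-learn's impurity-based importance; the paper may use a different importance measure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic data standing in for one feature set (NOT the study's data).
X, y = make_classification(n_samples=383, n_features=18, n_informative=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_train, y_train)
p_pos = rf.predict_proba(X_test)[:, 1]  # predicted probability of the positive (CKD) class

fpr, tpr, thresholds = roc_curve(y_test, p_pos)  # one ROC point per decision threshold
auc = roc_auc_score(y_test, p_pos)               # area under that curve

# Impurity-based importances are normalised to sum to 1; list the top five features.
ranking = np.argsort(rf.feature_importances_)[::-1]
for idx in ranking[:5]:
    print(f"feature {idx}: {rf.feature_importances_[idx]:.3f}")
print(f"AUC = {auc:.3f}")
```

The ROC curve sweeps the decision threshold, trading TPR against the false positive rate (1 minus TNR), so AUC summarizes ranking quality independently of any single threshold.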

Discussion

In this study, we used a publicly available dataset from the UCI Machine Learning Repository, comprising 400 patients from Tamil Nadu, India, to investigate chronic kidney disease (CKD) detection and creatinine level prediction with machine learning. The dataset includes 250 patients diagnosed with CKD and 150 without, with features covering blood test results, demographic data, and comorbidities. We grouped the features into three sets: at-home features (easily measured by patients), monitoring features (obtained during health checks), and laboratory features (derived from specialized tests). We applied artificial neural networks (ANNs) and random forests (RFs) to both CKD classification and creatinine prediction. RF showed superior performance, most notably with at-home features, where it achieved an accuracy of 92.5% compared with 82.9% for ANN.
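The same model family can serve both tasks: CKD detection is a classification problem, while creatinine prediction is a regression problem with a continuous target. The sketch below assumes scikit-learn's `RandomForestRegressor`, synthetic data in place of real creatinine measurements, and mean absolute error (MAE) as an illustrative metric; the regression metrics the study reports may differ.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic continuous target standing in for serum creatinine (NOT real measurements).
X, y = make_regression(n_samples=383, n_features=18, n_informative=6, noise=10.0, random_state=0)

reg = RandomForestRegressor(n_estimators=200, random_state=0)  # hypothetical settings
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# scikit-learn scorers follow a higher-is-better convention, so MAE is exposed as its
# negative; flip the sign back to get ordinary MAE values per fold.
mae = -cross_val_score(reg, X, y, cv=cv, scoring="neg_mean_absolute_error")
print(f"MAE: {mae.mean():.2f} ± {mae.std():.2f}")
```

Plain `KFold` is used here because stratification by class label does not apply to a continuous target.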

Both models performed exceptionally well with monitoring and laboratory features, achieving accuracies above 98%. With at-home features, however, RF outperformed ANN in true negative rate and overall accuracy, which supports its suitability for early CKD detection in non-clinical settings. We also identified the features that most influenced model performance, such as hypertension and diabetes mellitus, which were important in both the classification and the regression tasks. These findings align with previous studies and reinforce the importance of these attributes in CKD diagnosis. Overall, this research underscores the potential of machine learning to improve CKD diagnosis and management, particularly through accessible at-home data.