Comparing decentralized machine learning and AI clinical models to local and centralized alternatives: a systematic review

Journal: npj Digital Medicine, Volume: 9, Issue: 1
DOI: https://doi.org/10.1038/s41746-025-02329-z
PMID: https://pubmed.ncbi.nlm.nih.gov/41688745
Publication Date: 2026-02-14
Author(s): José Miguel Diniz et al.
Primary Topic: Privacy-Preserving Technologies in Data

Overview

This systematic review compares decentralized learning (DL) methods, such as federated learning and swarm learning, with traditional centralized learning (CL) models in healthcare applications. The review analyzed 160 articles covering 710 DL models and 8,149 performance comparisons across clinical domains, notably oncology, COVID-19, and neurological diagnostics. CL outperformed DL on threshold-dependent metrics and was favored in 78% of comparisons for accuracy and Dice score. DL, by contrast, performed comparably on ranking metrics: for the area under the receiver operating characteristic curve (AUROC), favorability stood at 51%, close to parity.
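A favorability percentage can be read as the share of paired comparisons in which one approach scored higher on a given metric. The sketch below computes that share for CL versus DL. It assumes each comparison is a hypothetical `(metric, cl_score, dl_score)` tuple and that ties do not count in CL's favor. The review's exact tie handling is not stated in this summary, and the scores are invented.

```python
from collections import defaultdict


def favorability(comparisons):
    """Return, per metric, the fraction of comparisons in which the
    centralized (CL) score strictly exceeds the decentralized (DL) score.

    Each comparison is a hypothetical (metric, cl_score, dl_score) tuple;
    ties are counted as not favoring CL (an assumption).
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for metric, cl_score, dl_score in comparisons:
        totals[metric] += 1
        if cl_score > dl_score:
            wins[metric] += 1
    return {metric: wins[metric] / totals[metric] for metric in totals}


# Illustrative made-up scores, not data from the review.
example = [
    ("accuracy", 0.91, 0.89),
    ("accuracy", 0.85, 0.86),
    ("accuracy", 0.88, 0.84),
    ("auroc", 0.93, 0.93),
    ("auroc", 0.90, 0.92),
]
print(favorability(example))  # accuracy: 2 of 3 favor CL; auroc: 0 of 2
```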

Moreover, DL consistently outperformed local learning (LL) on every evaluated metric, most clearly in precision (86% favorability) and accuracy (83% favorability). The clinical threshold analysis showed that CL made a model clinically viable where DL was not in up to 18% of comparisons. Where both methods were clinically viable, the performance gap typically reflected an “excellent versus acceptable” difference (median difference of 0.7 to 1.5 percentage points). In contrast, DL substantially improved on LL viability, with median differences ranging from 7.6 to 27 percentage points. These results suggest that DL can serve as a clinically acceptable alternative in privacy-sensitive settings, provided performance trade-offs are carefully weighed against regulatory considerations. Future research should focus on standardized reporting of privacy and performance metrics.

Introduction

In the introduction, the paper outlines the major challenges facing global health systems. These include rising care needs and costs driven by epidemiological transitions and technological advances. Machine learning and artificial intelligence (AI) have the potential to enhance clinical decision-making and optimize resource use, yet healthcare systems have struggled to implement these technologies effectively. A critical barrier is access to high-quality health data, particularly real-world data (RWD). Such data are essential for developing robust healthcare applications, but access is hindered by legal, regulatory, and operational challenges.

To address these issues, the paper presents decentralized learning approaches, such as federated learning (FL) and swarm learning (SL), as viable solutions. These methods train machine learning models on distributed healthcare data without pooling the raw data, which supports privacy and regulatory compliance. In FL, models are trained locally and only parameter updates are shared for aggregation. SL performs that aggregation peer to peer, without a central authority. The paper also mentions ensemble methods, such as bagging, which offer flexible aggregation schemes. The introduction closes with a brief overview of the study's methodology: 946 studies were screened, and 160 primary articles on decentralized learning models were ultimately included for analysis.
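To make the FL/SL distinction concrete, here is a minimal sketch of the aggregation step only; local training is omitted. `fedavg` is a sample-size-weighted average computed by a central server, in the style of federated averaging. `peer_average_round` is a simplified gossip-style average in which each site merges parameters only with its neighbors. Both function names, the neighbor topology, and the toy parameter vectors are hypothetical. Real swarm learning frameworks add coordination machinery that this sketch omits.

```python
def fedavg(site_params, site_sizes):
    """Central aggregation: average per-site parameter vectors, weighting
    each site by its (hypothetical) number of training samples."""
    total = sum(site_sizes)
    dim = len(site_params[0])
    return [
        sum(params[i] * n for params, n in zip(site_params, site_sizes)) / total
        for i in range(dim)
    ]


def peer_average_round(site_params, neighbors):
    """One peer-to-peer round with no central server: each site replaces
    its parameters with the plain mean of its own and its neighbors'
    parameters. `neighbors` maps a site index to the indices it talks to."""
    new_params = []
    for i, own in enumerate(site_params):
        group = [own] + [site_params[j] for j in neighbors[i]]
        new_params.append([sum(values) / len(group) for values in zip(*group)])
    return new_params


sites = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy parameter vectors
sizes = [100, 100, 200]                        # toy sample counts
chain = {0: [1], 1: [0, 2], 2: [1]}            # site 1 talks to both others
print(fedavg(sites, sizes))              # [3.5, 4.5]
print(peer_average_round(sites, chain))  # [[2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
```

With a chain topology, one peer round mixes only neighboring sites. Repeating rounds drives all sites toward a shared set of parameters without any site acting as a central server. That shared value is not necessarily the exact global mean, because this simple scheme is not weighted to preserve it.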

Methods

In this section, the authors describe how they analyzed performance metrics for decentralized learning architectures in clinical applications. Data were categorized by performance metric, decentralized learning architecture, and clinical domain. The authors built an online dashboard with the Shiny R package so that users can run customized searches of the performance comparisons. The analysis included histograms showing the distribution of performance differences between decentralized and non-decentralized models. Each histogram was annotated with median differences, the 25th and 75th percentiles, and bootstrapped 95% confidence intervals from 10,000 resamples.
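The summary statistics described above can be sketched with Python's standard library. This sketch assumes a percentile bootstrap for the median and the `inclusive` quantile method. The review, which was built in R, may use a different bootstrap variant or percentile definition. The differences below are invented for illustration.

```python
import random
import statistics


def summarize_differences(diffs, n_boot=10_000, seed=0):
    """Summarize performance differences (decentralized minus comparator).

    Returns the median, the 25th and 75th percentiles, and a percentile
    bootstrap 95% CI for the median from `n_boot` resamples. The bootstrap
    variant and quantile method are assumptions, not the review's code.
    """
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    rng = random.Random(seed)
    # Resample with replacement and record the median of each resample.
    boot_medians = [
        statistics.median(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_boot)
    ]
    # n=40 yields cut points every 2.5%, so the first and last are the
    # 2.5th and 97.5th percentiles of the bootstrap distribution.
    cuts = statistics.quantiles(boot_medians, n=40, method="inclusive")
    return {
        "median": statistics.median(diffs),
        "p25": q1,
        "p75": q3,
        "ci95": (cuts[0], cuts[-1]),
    }


# Made-up differences in percentage points, for illustration only.
diffs = [-3.1, -1.5, -0.8, -0.2, 0.0, 0.4, 1.1, 1.9, 2.6, 4.0]
print(summarize_differences(diffs))
```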

For sensitivity analysis, the authors generated variations of the histograms that exclude the study contributing the most observations, as detailed in the supplementary material. They set a clinically acceptable performance threshold of 0.80 to identify scenarios in which local models fell below this benchmark while decentralized models met or exceeded it. They also identified cases in which both model types were clinically viable, highlighting those in which decentralized models outperformed local ones. Because the data were heterogeneous, no meta-analysis was performed. Instead, comparisons were grouped into deciles of decentralized model performance so that performance could be examined across score intervals.
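The threshold and decile steps can be sketched as follows. The 0.80 threshold comes from the review. The category labels, the function names, the example scores, and the use of fixed-width score bins (0.0 to 0.1 up through 0.9 to 1.0) for the deciles are all assumptions for illustration.

```python
from collections import Counter

THRESHOLD = 0.80  # clinically acceptable performance threshold from the review


def viability_category(dl_score, ref_score, threshold=THRESHOLD):
    """Classify one comparison between a decentralized (DL) model and a
    reference model (local or centralized). "Meets or exceeds" the
    threshold is treated as viable. Labels are hypothetical."""
    dl_ok = dl_score >= threshold
    ref_ok = ref_score >= threshold
    if dl_ok and not ref_ok:
        return "only DL viable"
    if ref_ok and not dl_ok:
        return "only reference viable"
    if dl_ok and ref_ok:
        return "both viable"
    return "neither viable"


def dl_decile(dl_score):
    """Bin a DL score in [0, 1] into fixed-width deciles numbered 1 to 10
    (an assumed binning; a score of exactly 1.0 falls in decile 10)."""
    return min(int(dl_score * 10), 9) + 1


# Made-up (dl_score, local_score) pairs for illustration.
pairs = [(0.86, 0.74), (0.91, 0.88), (0.78, 0.70), (0.83, 0.85)]
print(Counter(viability_category(dl, ll) for dl, ll in pairs))
print([dl_decile(dl) for dl, _ in pairs])  # [9, 10, 8, 9]
```

Here the reference model is a local one. Passing a centralized score as `ref_score` gives the corresponding CL comparison.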

Discussion

This systematic review offers a comprehensive analysis of decentralized learning (DL) approaches in healthcare, synthesizing data from 160 studies that encompass 710 decentralized models and 8,149 performance comparisons. Research output has grown markedly since 2020, particularly in oncology, COVID-19, and neurological conditions. Centralized learning consistently outperforms decentralized methods in accuracy and Dice score, performing better in 78% of comparisons. Decentralized models nonetheless show notable strengths in precision and sensitivity and outperform local learning models across all metrics. This makes them especially relevant where data sharing is restricted or external validity is a priority.

The review highlights that centralized models achieve clinical viability where decentralized models fall short in up to 18% of cases, particularly for sensitivity and accuracy. Decentralized models, meanwhile, are more effective than local models at mitigating overfitting to site-specific patterns, which improves generalizability. The analysis also underscores the need for standardized reporting of privacy-preserving techniques and performance metrics, since current inconsistencies make it difficult to assess the trade-offs between privacy and clinical performance. Overall, the study establishes DL as a viable alternative where centralized approaches are infeasible, while emphasizing that centralized learning remains the gold standard for optimal performance in healthcare applications.