Advancing healthcare AI governance through a comprehensive maturity model based on systematic review

Journal: npj Digital Medicine, Volume: 9, Issue: 1
DOI: https://doi.org/10.1038/s41746-026-02418-7
PMID: https://pubmed.ncbi.nlm.nih.gov/41673321
Publication Date: 2026-02-11
Author(s): Rowan Hussein et al.
Primary Topic: Artificial Intelligence in Healthcare and Education

Overview

The deployment of artificial intelligence (AI) in healthcare is increasing rapidly, yet existing governance frameworks are often fragmented and assume substantial resources. A systematic review of 35 healthcare AI implementation frameworks published between 2019 and 2024 identified seven essential domains for effective AI governance. While these frameworks offer valuable insights, their resource demands pose significant challenges for smaller healthcare organizations.

To bridge this gap, the authors developed the Healthcare AI Governance Readiness Assessment (HAIRA), a five-level maturity model designed to provide actionable governance pathways tailored to the resources of different organizations. HAIRA ranges from Level 1 (Initial/Ad Hoc) to Level 5 (Leading), with specific benchmarks established across the seven governance domains. This tiered model enables healthcare organizations to evaluate their current AI governance capabilities and set realistic advancement goals, addressing the urgent need for adaptive governance strategies that ensure AI implementation yields meaningful benefits across varying resource levels.
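The following Python sketch makes the tiered structure concrete. It records a self-assessed level for each domain and derives an overall level and a next-step target. It is not the paper's instrument. The seven domain labels are assumed from the domains named in the Results and Discussion of this summary. Two rules are hypothetical conventions chosen for illustration: overall readiness equals the weakest domain, and the next target raises only the lowest-scoring domains.

```python
from dataclasses import dataclass

# Assumed domain labels, taken from the domains named in this summary;
# the paper's exact wording may differ.
DOMAINS = (
    "organizational structure",
    "problem formulation",
    "algorithm development",
    "model evaluation",
    "deployment integration",
    "monitoring",
    "external evaluation",
)
MIN_LEVEL, MAX_LEVEL = 1, 5  # Level 1 (Initial/Ad Hoc) to Level 5 (Leading)


@dataclass
class MaturityProfile:
    """Hypothetical self-assessment: one HAIRA-style level (1-5) per domain."""
    levels: dict[str, int]

    def __post_init__(self) -> None:
        missing = set(DOMAINS) - set(self.levels)
        if missing:
            raise ValueError(f"missing domains: {sorted(missing)}")
        for domain, level in self.levels.items():
            if domain not in DOMAINS:
                raise ValueError(f"unknown domain: {domain}")
            if not MIN_LEVEL <= level <= MAX_LEVEL:
                raise ValueError(f"{domain}: level {level} is outside 1-5")

    def overall_level(self) -> int:
        # Assumption: overall readiness is capped by the weakest domain.
        return min(self.levels.values())

    def next_targets(self) -> dict[str, int]:
        # Hypothetical roadmap rule: lift each lowest-scoring domain by one level.
        floor = self.overall_level()
        return {
            domain: level + 1
            for domain, level in self.levels.items()
            if level == floor and level < MAX_LEVEL
        }


profile = MaturityProfile(
    {d: 2 for d in DOMAINS} | {"monitoring": 1, "problem formulation": 3}
)
print(profile.overall_level())  # 1
print(profile.next_targets())   # {'monitoring': 2}
```

Taking the minimum reflects the view that governance cannot be called mature while one domain remains ad hoc. A weighted average would be an equally defensible alternative.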

Introduction

The paper's introduction underscores the importance of real-time monitoring and maintenance frameworks for AI systems, particularly in healthcare. Continuous assessment is needed to ensure that AI performance keeps pace with the evolving needs of the target population. This ongoing evaluation is essential for detecting dataset shift, a mismatch between the data a model was trained on and the data it encounters in deployment, which can make AI outputs inaccurate. Shifts may arise from changes in the input features (covariate shift), in the relationship between those features and the target variable (concept shift), or in the distribution of the target variable itself (label shift). The paper also stresses that AI tools that incorporate feedback loops require vigilant monitoring to prevent errors or biases from being amplified over time.
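The population stability index (PSI) is one common way to screen a single input feature for dataset shift. It compares the binned distribution of a feature in the training data with its distribution in recent deployment data. The paper does not prescribe PSI. What follows is a minimal pure-Python sketch of that general technique. The cut-offs noted in the comments are industry rules of thumb, not clinical standards. PSI only detects shift in the inputs (covariate shift); concept and label shift also require monitoring outcomes.

```python
import math


def population_stability_index(expected: list[float], actual: list[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a reference (training) sample and a deployment sample of
    one numeric feature. Larger values indicate a larger distribution shift.
    Rule of thumb (not from the paper): < 0.1 stable, 0.1-0.25 moderate,
    > 0.25 major shift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Bin edges come from the training range; out-of-range values
            # are clamped into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        n = len(values)
        return [max(c / n, eps) for c in counts]  # eps avoids log(0)

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


train = [i / 100 for i in range(100)]          # reference feature values
recent = [0.5 + i / 200 for i in range(100)]   # values crowded into the upper half
print(population_stability_index(train, train))          # 0.0
print(population_stability_index(train, recent) > 0.25)  # True
```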

The introduction also discusses the regulatory landscape. Most AI-enabled medical devices currently use "locked" algorithms, but there is a shift towards adaptive models whose modifications require an Algorithm Change Protocol (ACP) governing retraining and performance evaluation. Real-time monitoring acts as an early warning system that lets healthcare providers recognize when adjustments are needed, while any changes should still be implemented cautiously to mitigate risk. Continuous monitoring not only tracks data quality and model performance but also helps detect biases that emerge during deployment, promoting fairness and effectiveness across diverse patient populations. Regular audits of AI tool outputs and user interactions are recommended to further identify and address potential biases.
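The early-warning idea can be sketched as a rolling check on labelled outcomes. In this hypothetical Python example, the monitor compares accuracy over the most recent cases with the baseline established at validation. It raises an alert when the drop exceeds a tolerance. The class name, the 0.05 tolerance, and the 200-case window are illustrative assumptions. In keeping with the cautious approach described above, the alert only flags the model for human review. Any retraining of an adaptive model would go through the ACP rather than happen automatically.

```python
from collections import deque


class PerformanceMonitor:
    """Hypothetical rolling-window early-warning check for a deployed model.

    Alerts when accuracy over the most recent `window` labelled cases falls
    more than `tolerance` below the validated baseline. An alert prompts
    human review; this class never retrains or modifies the model.
    """

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 200):
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes: deque[bool] = deque(maxlen=window)  # oldest cases drop off

    def record(self, prediction, label) -> bool:
        """Log one case once its ground truth is known; return True on alert."""
        self.outcomes.append(prediction == label)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # too few recent cases to judge yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.baseline - self.tolerance


monitor = PerformanceMonitor(baseline=0.90, tolerance=0.05, window=10)
for _ in range(10):
    monitor.record(prediction=1, label=1)       # ten correct cases fill the window
print(monitor.record(prediction=1, label=0))    # False: windowed accuracy 0.9
print(monitor.record(prediction=1, label=0))    # True: windowed accuracy 0.8
```

Accuracy keeps the sketch short. For many clinical models, sensitivity, positive predictive value, or calibration would be more informative metrics to track.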

Methods

In this systematic review, the authors followed the PRISMA extension guidelines to evaluate existing frameworks, guidelines, and checklists for implementing AI in healthcare. The primary inclusion criterion was that a document offer a holistic approach to AI implementation. Comprehensive frameworks were included, and those addressing only isolated components were not. This choice reflected the need to address multiple dimensions of AI evaluation together, such as bias, data quality, and outcome assessment. The review targeted healthcare settings specifically because of their distinctive challenges, including patient privacy concerns and the potential for life-threatening algorithmic errors. AI evaluation models unrelated to healthcare were therefore excluded.
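The stated inclusion criteria, together with the 2019 to 2024 publication window, can be written as a small screening function. This Python sketch is illustrative only. The record fields and the three-way outcome are assumptions, and in the review the judgments behind each field were made by human reviewers. Criteria that cannot be settled from the record are routed to discussion rather than decided automatically, mirroring the reviewer discussions the paper describes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"
    DISCUSS = "discuss"  # unclear from the record; resolve by reviewer discussion


@dataclass
class Record:
    """Hypothetical screening record; None means 'unclear from the record'."""
    title: str
    year: int
    healthcare_setting: Optional[bool]  # does it target healthcare?
    holistic: Optional[bool]            # end-to-end framework vs isolated component


def screen(rec: Record, first_year: int = 2019, last_year: int = 2024) -> Decision:
    if not first_year <= rec.year <= last_year:
        return Decision.EXCLUDE
    criteria = (rec.healthcare_setting, rec.holistic)
    if False in criteria:   # any criterion clearly unmet
        return Decision.EXCLUDE
    if None in criteria:    # any criterion uncertain
        return Decision.DISCUSS
    return Decision.INCLUDE


example = Record("End-to-end healthcare AI governance framework", 2021, True, True)
print(screen(example).value)  # include
```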

The review was restricted to articles published between 2019 and 2024 so that the frameworks examined would reflect current practice. Articles were screened against predefined eligibility criteria, and uncertainties were resolved through discussion among reviewers. Included studies were analyzed for their structural characteristics and recommendations, which were categorized into seven recognized processes of AI governance. Because many included studies were conceptual, traditional risk-of-bias assessment tools were deemed inapplicable. A qualitative evaluation of comprehensiveness and supporting evidence was conducted instead. This approach aimed to make the findings reliable while keeping the review process transparent and rigorous.

Results

Keyword searches identified 2,351 articles. Filtering for recency and relevance, followed by manual review, narrowed these to 29 relevant articles on AI governance in healthcare. These articles were mapped onto seven domains based on established AI governance processes. Problem formulation, algorithm development, and monitoring received notable emphasis, whereas organizational structure and external evaluation received less attention. Key frameworks from the literature include Abramoff et al.’s framework for promoting health equity, Bedoya et al.’s ABCDS governance framework implemented at Duke University Health System, and the NIST AI Risk Management Framework, which emphasizes transparency and accountability.
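An imbalance in domain coverage like the one reported here is the kind of result a simple tally produces once each framework has been coded by the domains it addresses. The Python sketch below shows such a tally. The framework names and their domain assignments are invented toy data, not the review's actual coding.

```python
from collections import Counter


def domain_coverage(frameworks: dict[str, set[str]],
                    domains: list[str]) -> list[tuple[str, int]]:
    """Count how many frameworks address each domain, least-covered first."""
    counts = Counter(d for covered in frameworks.values() for d in covered)
    # Domains that no framework addresses still appear, with a count of zero;
    # sorted() is stable, so ties keep the order given in `domains`.
    return sorted(((d, counts[d]) for d in domains), key=lambda item: item[1])


# Toy data for illustration only; these are not the review's results.
domains = ["problem formulation", "algorithm development", "monitoring",
           "organizational structure", "external evaluation"]
toy_frameworks = {
    "framework_A": {"problem formulation", "algorithm development", "monitoring"},
    "framework_B": {"problem formulation", "monitoring", "organizational structure"},
    "framework_C": {"problem formulation", "algorithm development"},
}
for domain, n in domain_coverage(toy_frameworks, domains):
    print(f"{domain}: {n}")
```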

The findings underscore the need for an overarching governance body of diverse stakeholders, including data scientists, AI developers, and clinical experts, to oversee the AI lifecycle. This body would oversee the selection, validation, and monitoring of AI tools. The literature also highlights the importance of clear objectives in AI implementation. Healthcare systems should assess existing workflows and define clinical success criteria up front so that AI tools address specific healthcare needs and meet safety standards. Overall, the study provides a foundational framework for addressing governance challenges in healthcare AI applications.
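When clinical success criteria are defined before deployment, the governance body can turn local validation into an explicit go/no-go check. The Python sketch below assumes two criteria, a minimum sensitivity and a minimum positive predictive value (PPV), both computed from local validation counts. The choice of metrics and the example thresholds are hypothetical. A real committee would select measures suited to the clinical use case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Thresholds agreed before local validation (values here are hypothetical)."""
    min_sensitivity: float
    min_ppv: float


def validation_gate(tp: int, fp: int, fn: int,
                    criteria: SuccessCriteria) -> tuple[bool, dict[str, float]]:
    """Compare local validation counts with the pre-agreed success criteria."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of true cases caught
    ppv = tp / (tp + fp) if tp + fp else 0.0          # share of alerts that are correct
    metrics = {"sensitivity": sensitivity, "ppv": ppv}
    passed = sensitivity >= criteria.min_sensitivity and ppv >= criteria.min_ppv
    return passed, metrics


criteria = SuccessCriteria(min_sensitivity=0.85, min_ppv=0.30)  # example thresholds only
passed, metrics = validation_gate(tp=90, fp=150, fn=10, criteria=criteria)
print(passed, metrics)  # True {'sensitivity': 0.9, 'ppv': 0.375}
```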

Discussion

The paper's discussion emphasizes the need for robust guidelines on algorithm development and model training to ensure ethical AI use in healthcare. It illustrates the risks of AI, such as bias and privacy breaches, with the example of an algorithm that misrepresented the health needs of Black patients because of flawed proxy variables. Regulatory frameworks such as the EU Artificial Intelligence Act and FDA guidance stress the importance of high-quality, representative data and of comprehensive validation processes to mitigate bias. Tools such as IBM's AI Fairness 360 and Google's Fairness Indicators are cited as resources for assessing and correcting bias in AI systems.
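Toolkits such as AI Fairness 360 and Fairness Indicators compute group fairness metrics at scale. The Python sketch below is a library-free illustration of one such metric. It computes the true positive rate for each patient subgroup and the largest gap between subgroups, often called the equal opportunity difference. The toy labels and group names are invented, and the code does not reproduce either toolkit's API.

```python
from collections import defaultdict


def tpr_by_group(y_true, y_pred, groups):
    """True positive rate (sensitivity) within each patient subgroup."""
    positives = defaultdict(int)  # actual positive cases per group
    hits = defaultdict(int)       # positives the model correctly flagged
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest TPR difference between any two subgroups (0 means parity)."""
    rates = tpr_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Toy audit data: 1 = condition present (y_true) or flagged by the model (y_pred).
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5
print(tpr_by_group(y_true, y_pred, groups))           # {'A': 0.75, 'B': 0.5}
print(equal_opportunity_gap(y_true, y_pred, groups))  # 0.25
```

A gap near zero means the model misses true cases at similar rates across groups. Which subgroups to audit, and what gap is tolerable, remain governance decisions.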

The paper also introduces the Healthcare AI Governance Readiness Assessment (HAIRA), a tiered maturity model designed to help healthcare organizations evaluate and enhance their AI governance capabilities. HAIRA spans five levels, from basic awareness at Level 1 to advanced innovation at Level 5, addressing key domains such as algorithm development, model evaluation, and deployment integration. This model recognizes the varying resources and expertise across healthcare systems, providing a structured pathway for organizations to set realistic governance goals and improve their AI implementation processes. The discussion concludes by acknowledging the diverse needs of patient populations and the importance of stakeholder engagement in developing effective AI governance frameworks.