A general framework for governing marketed AI/ML medical devices

Journal: npj Digital Medicine, Volume: 8, Issue: 1
DOI: https://doi.org/10.1038/s41746-025-01717-9
PMID: https://pubmed.ncbi.nlm.nih.gov/40450160
Publication Date: 2025-05-31
Author(s): Boris Babic et al.
Primary Topic: Artificial Intelligence in Healthcare and Education

Overview

This study evaluates the U.S. Food and Drug Administration’s (FDA) postmarket surveillance system for artificial intelligence (AI) and machine learning (ML) based medical devices, focusing on the Manufacturer and User Facility Device Experience (MAUDE) database. It analyzes adverse event reports related to approximately 950 AI/ML devices approved between 2010 and 2023 and finds significant deficiencies in the current reporting system. The authors make three main contributions: they characterize the adverse event reports, identify shortcomings in the FDA’s reporting mechanisms, and propose actionable recommendations for improving the assessment of AI/ML device safety and effectiveness.

The findings indicate that the MAUDE database has limited informational value for these devices, with substantial missing data and inaccuracies in the reported variables. Notably, significant risks associated with AI/ML devices, particularly those related to the training and validation data of the models, are not captured in the reports. The authors offer two sets of recommendations: one aimed at making the MAUDE database more relevant to AI/ML devices, and another focused on developing a more transparent postmarket surveillance framework that goes beyond the limitations of individual event-based reporting.

Methods

In this study, data were sourced from the FDA’s Manufacturer and User Facility Device Experience (MAUDE) database, focusing on medical devices approved between 2010 and 2023. The analysis included Class I and II Premarket Notification (510(k)) devices, as well as De Novo classification requests, which together account for over 98% of FDA device market authorizations during this period. The dataset specifically targeted artificial intelligence and machine learning (AI/ML) devices, capturing 823 unique 510(k)-cleared devices linked to 943 adverse events reported through the FDA’s Medical Device Reporting (MDR) system.

The research tracked 54 features related to the adverse events and device manufacturers, including event type, occurrence setting, and manufacturer details. Among the 943 linked MDRs, there were 20 unique medical device product codes, which serve as three-letter identifiers assigned by the FDA to classify and monitor medical devices. The study provides comprehensive links to the complete dataset and the STATA code for reproducibility, facilitating further analysis of the safety and performance of these medical devices.

Results

The authors present findings from their analysis of adverse event reports related to AI/ML medical devices, using data from the FDA’s MAUDE database. The dataset includes 823 unique 510(k)-cleared devices linked to 943 adverse events reported between 2010 and 2023. Notably, these reports are heavily concentrated in only two devices: Biomerieux’s Mass Spectrometry Microbial Identification System (product code PEX) and DarioHealth’s Dario Blood Glucose Monitoring System (product code NBW). The analysis reveals that over 98% of adverse events associated with AI/ML devices are attributed to fewer than five devices, compared with approximately 85% for non-AI/ML devices. Furthermore, 90.88% of AI/ML device reports are categorized as malfunctions, compared to 77.05% for non-AI/ML devices, indicating that malfunction is an even more dominant report category for AI/ML devices than for other devices.
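
The concentration figure above (the share of all reports that comes from a handful of devices) can be illustrated with a minimal Python sketch. This is not the authors' STATA code; the function name `top_k_share` and the toy device identifiers are assumptions made purely for illustration.

```python
from collections import Counter


def top_k_share(device_ids, k):
    """Return the fraction of reports contributed by the k most-reported devices."""
    if not device_ids:
        raise ValueError("no reports supplied")
    counts = Counter(device_ids)
    # most_common(k) returns at most k (device, count) pairs, highest counts first.
    top = counts.most_common(k)
    return sum(n for _, n in top) / len(device_ids)


# Hypothetical toy data: one device identifier per adverse event report.
reports = ["A"] * 6 + ["B"] * 3 + ["C"]
share_top2 = top_k_share(reports, 2)
print(f"Top-2 share: {share_top2:.0%}")  # prints "Top-2 share: 90%"
```

A high top-k share describes where reports come from, not which devices perform worst, which is consistent with the authors' caution about reading device performance off report counts.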

The authors emphasize that the majority of adverse events for the Mass Spectrometry Microbial Identification System involve misidentifications of microorganisms, which can pose serious risks to patient safety. However, the severity of these issues is difficult to assess due to limitations in the regulatory data. Similarly, for the Dario Blood Glucose Monitoring System, reports primarily concern incorrect glucose readings, some of which may be false positives. The authors caution against drawing broad conclusions about the performance of these devices based solely on their representation in the database, suggesting that diligent reporting practices by manufacturers may contribute to their overrepresentation. They argue that the current adverse event reporting system is inadequate for effectively monitoring AI/ML devices, highlighting the need for improvements to better capture and address safety concerns specific to this technology.

Discussion

The discussion highlights critical problems with the MAUDE database, particularly missing data and inadequate event classifications for AI/ML medical devices. The analysis of 943 adverse event reports reveals significant gaps in data completeness, with key variables such as Event Location and Reporter Occupation missing in 100% and 30% of cases, respectively. This lack of information hampers effective safety assessments and complicates the identification of adverse events, especially given the context-sensitive nature of AI/ML device performance. The findings indicate that missing data is more prevalent in reports for AI/ML devices than in reports for traditional medical devices, which makes it harder for manufacturers and regulators to understand and address safety issues.
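
The missing-data rates described above amount to a per-variable count of empty entries. The following Python sketch shows one way to compute them, assuming each report is a dictionary; the field names and records are hypothetical and are not MAUDE's actual column names or data.

```python
def missing_rates(records, fields):
    """Return {field: fraction of records where the field is absent, None, or blank}."""
    if not records:
        raise ValueError("no records supplied")
    rates = {}
    for field in fields:
        missing = sum(
            1
            for r in records
            if r.get(field) is None or str(r.get(field)).strip() == ""
        )
        rates[field] = missing / len(records)
    return rates


# Hypothetical records; field names are illustrative only.
records = [
    {"event_type": "Malfunction", "event_location": "", "reporter_occupation": "Nurse"},
    {"event_type": "Malfunction", "event_location": None, "reporter_occupation": ""},
    {"event_type": "Injury", "reporter_occupation": "Physician"},
    {"event_type": "Malfunction", "event_location": " ", "reporter_occupation": "Pharmacist"},
]
rates = missing_rates(records, ["event_type", "event_location", "reporter_occupation"])
for field, rate in rates.items():
    print(f"{field}: {rate:.0%} missing")
```

Treating absent keys, `None`, and whitespace-only strings alike is a design choice of this sketch; a real analysis would need to follow the database's own conventions for unreported values.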

Furthermore, the paper critiques the current event classification system, noting a disconnect between the category assigned to a reported event and its qualitative description. For instance, a majority of reports are categorized as ‘Malfunction,’ yet many incidents stem from user errors rather than device failures. This misclassification obscures the true nature of device-related risks and complicates accountability. The authors suggest that the existing reporting framework should evolve to capture a broader range of issues relevant to AI/ML devices, including concept drift and covariate shift, which are unique to these technologies. They advocate for improved data collection practices and regulatory reforms to enhance the safety and usability of AI/ML medical devices, emphasizing the need for a more proactive approach to postmarket surveillance.
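
To make covariate shift concrete: it refers to a change in the distribution of a model's inputs between development and deployment. The sketch below is not a method from the paper; it uses a two-sample Kolmogorov-Smirnov statistic, one common way to compare two samples of a single numeric input. The data and the `SHIFT_THRESHOLD` value are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples (two-sample KS statistic)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


# Hypothetical values of one numeric input feature at development time and in the field.
reference = [1, 2, 3, 4]
current = [3, 4, 5, 6]
SHIFT_THRESHOLD = 0.3  # assumed cutoff for illustration, not a regulatory or published value

statistic = ks_statistic(reference, current)
print(f"KS statistic: {statistic:.2f}, flagged: {statistic > SHIFT_THRESHOLD}")
```

A check of this kind looks only at inputs. Concept drift, where the relationship between inputs and outcomes changes, would additionally require outcome data, which individual event reports do not systematically provide.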