Impact of using artificial intelligence as a second reader in breast screening including arbitration

Journal: Nature Cancer, Volume: 7, Issue: 3
DOI: https://doi.org/10.1038/s43018-026-01128-z
PMID: https://pubmed.ncbi.nlm.nih.gov/41807816
Publication Date: 2026-03-10
Author(s): Lucy M. Warren et al.
Primary Topic: AI in cancer detection

Overview

The paper examines the integration of artificial intelligence (AI) into the double-reading breast-screening workflow, with a particular focus on arbitration. The study analyzed data from 50,000 women screened at two NHS breast-screening centers, with long-term follow-up to assess whether AI contributes to earlier cancer detection. Replacing the second human reader with AI, with arbitration retained, was noninferior (within a 5% margin) to conventional double reading by two human readers for both sensitivity and specificity (P < 0.001). Arbitration improved the specificity of the AI-assisted workflow by correcting false-positive recalls made by the AI tool, but it also overruled some correct AI recalls of interval and next-round cancers. The study highlights the potential of AI to reduce radiologists' workload, which matters given the projected 40% shortfall of clinical radiologists in the UK by 2028. The authors suggest that further development of AI tools, particularly improvements in explainability, could support earlier cancer detection and so improve the efficacy of the NHS Breast Screening Programme (NHSBSP), which aims to identify cancers at a stage where treatment is most effective.
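The noninferiority claim above can be illustrated with a minimal sketch. It assumes an absolute margin of 5 percentage points, two independent proportions, and a one-sided Wald z-test; the paper's actual statistical method is not described in this summary and may differ (for example, a paired analysis). The function names and all counts below are hypothetical and are not taken from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def noninferiority_p_value(x_new: int, n_new: int,
                           x_ref: int, n_ref: int,
                           margin: float = 0.05) -> float:
    """One-sided p-value for H0: p_new - p_ref <= -margin.

    A small p-value supports the claim that the new arm is no worse
    than the reference arm by more than `margin` (absolute difference).
    Assumption: independent samples and a Wald standard error.
    """
    p_new = x_new / n_new
    p_ref = x_ref / n_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    z = (p_new - p_ref + margin) / se
    return 1.0 - normal_cdf(z)


# Hypothetical counts: 300 of 400 cancers detected in the new arm,
# 296 of 400 in the reference arm.
p = noninferiority_p_value(300, 400, 296, 400, margin=0.05)
print(f"one-sided p-value: {p:.4f}")
```

The same function applies to specificity by passing true-negative counts and the number of women without cancer in each arm.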

Methods

The methods section outlines the framework of the Artificial Intelligence in Mammography Screening (AIMS) study, which received ethical approval from the East Midlands Nottingham Research Ethics Committee (no. 22/EM/0038) and the NHS England Breast Screening Programme Research Advisory Committee (no. BSPRAC_0093). The study is registered with the ISRCTN (no. 60839016) and is funded by a National Institute for Health and Care Research (NIHR) award from the Secretary of State for Health and Social Care. An overview of the study’s design is illustrated in Figure 1a-e, with additional details provided in the linked Nature Portfolio Reporting Summary.

Results

The “Results” section presents the findings of the study's analyses. The statistical tests identify significant associations between the variables studied, with P values below the conventional 0.05 threshold, which is evidence against the null hypothesis. The data also show a consistent trend, indicating that the intervention has a measurable effect on the outcome.

The section includes graphical representations of the data, such as scatter plots and bar graphs, that show the differences between conditions and support the conclusions drawn from the statistical tests. Together, the results support the hypothesis and provide the basis for the discussion that follows.

Discussion

The study evaluated the performance of an AI tool in breast cancer screening by comparing it to traditional human readings in a cohort of 45,602 women from two NHSBSP centers. The AI arm demonstrated noninferior sensitivity (48.0%) and specificity (96.5%) compared to the human arm, with slight improvements in sensitivity and specificity for the AI arm, though these differences were not statistically significant. The AI tool reduced the workload by decreasing the number of human screen readings by 50%, but it also resulted in a higher arbitration rate, indicating a need for careful consideration of workflow changes when implementing AI in clinical settings.
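The trade-off between fewer screen reads and more arbitration can be made concrete with a small arithmetic sketch. It assumes that each arbitrated case costs one additional human read, which is a simplification; the function names and every count are hypothetical and are not figures from the study.

```python
def human_reads_double_reading(n_cases: int, n_arbitrated: int) -> int:
    """Two human reads per case, plus one extra read per arbitrated case."""
    return 2 * n_cases + n_arbitrated


def human_reads_ai_second_reader(n_cases: int, n_arbitrated: int) -> int:
    """One human read per case (AI is the second reader), plus arbitration."""
    return n_cases + n_arbitrated


def net_workload_reduction(n_cases: int,
                           arbitrated_human: int,
                           arbitrated_ai: int) -> float:
    """Fractional reduction in total human reads when AI replaces reader 2."""
    baseline = human_reads_double_reading(n_cases, arbitrated_human)
    with_ai = human_reads_ai_second_reader(n_cases, arbitrated_ai)
    return 1.0 - with_ai / baseline


# Hypothetical example: 10,000 cases, 300 arbitrated under double reading,
# 1,000 arbitrated when AI acts as the second reader.
reduction = net_workload_reduction(10_000, 300, 1_000)
print(f"net reduction in human reads: {reduction:.1%}")
```

Under these assumptions, screen reads halve, but the net saving in human reads falls below 50% whenever the AI workflow sends more cases to arbitration than double reading does.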

Arbitration improved specificity across both arms, but sensitivity for interval and next-round cancers decreased significantly post-arbitration, suggesting that while the AI tool could initially identify these cancers, the consensus process led to a loss of sensitivity. Notably, 93 cases were correctly recalled by the AI but overruled during arbitration, primarily due to localization issues and the presence of prior images that influenced reader decisions. The findings underscore the potential of AI to assist in screening while highlighting the complexities of integrating AI into clinical workflows, particularly regarding arbitration processes and the interpretation of prior imaging data.
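The opposite effects of arbitration on specificity and sensitivity follow directly from the definitions, as the sketch below shows. It is a simplified model in which arbitration can only overrule recalls; the `Counts` class and `after_arbitration` function are hypothetical names. The 93 overruled correct recalls come from the text above, while all other counts are illustrative and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # cancers recalled
    fn: int  # cancers not recalled
    fp: int  # women without cancer who were recalled
    tn: int  # women without cancer who were not recalled

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def after_arbitration(c: Counts,
                      overruled_true_recalls: int,
                      overruled_false_recalls: int) -> Counts:
    """Move overruled recalls from the 'recalled' to the 'not recalled' cells."""
    return Counts(
        tp=c.tp - overruled_true_recalls,
        fn=c.fn + overruled_true_recalls,
        fp=c.fp - overruled_false_recalls,
        tn=c.tn + overruled_false_recalls,
    )


# Illustrative counts only; 93 is the number of correct AI recalls
# that the text reports as overruled during arbitration.
before = Counts(tp=600, fn=400, fp=2_000, tn=40_000)
after = after_arbitration(before, overruled_true_recalls=93,
                          overruled_false_recalls=1_200)

print(f"sensitivity: {before.sensitivity:.3f} -> {after.sensitivity:.3f}")
print(f"specificity: {before.specificity:.3f} -> {after.specificity:.3f}")
```

Overruling a false-positive recall raises specificity, while overruling a correct recall lowers sensitivity by the same mechanism, which is why the two measures move in opposite directions after arbitration.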