Real-world deployment of a fine-tuned pathology foundation model for lung cancer biomarker detection

Journal: Nature Medicine, Volume: 31, Issue: 9
DOI: https://doi.org/10.1038/s41591-025-03780-x
PMID: https://pubmed.ncbi.nlm.nih.gov/40634781
Publication Date: 2025-07-09
Author(s): Gabriele Campanella et al.
Primary Topic: Radiomics and Machine Learning in Medical Imaging

Overview

This study describes the development and real-world deployment of a fine-tuned computational biomarker for detecting EGFR mutations in lung adenocarcinoma (LUAD) from digital histopathology slides. It highlights the limitations of current testing methods such as PCR-based assays, which are rapid but lack the accuracy of next-generation sequencing and require additional tissue. To address these challenges, the researchers assembled a large international dataset of digital LUAD slides (N = 8,461) and used it to fine-tune a foundation model for EGFR biomarker detection. The fine-tuned model showed improved performance, achieving a mean area under the curve (AUC) of 0.847 in internal validation and 0.870 in external validation, and a prospective silent trial yielded an AUC of 0.890.

The findings indicate that an AI-assisted workflow can reduce the need for rapid molecular tests by up to 43% while maintaining clinical standards. Although established guidelines recommend EGFR testing for advanced-stage LUAD, a substantial proportion of cases in the United States (24-28%) remain untested, likely because of technical challenges in sample processing. As a result of this testing gap, many patients with EGFR-mutated LUAD receive suboptimal treatment. The study underscores the potential of computational pathology biomarkers to support clinical practice and improve patient outcomes in lung cancer management.
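To make the idea of reducing rapid tests concrete, the following is a minimal sketch of score-based triage. The two-threshold rule, the threshold values, the decision labels, and the function names are illustrative assumptions; this summary does not describe the decision rule actually used in the study.

```python
# Hypothetical triage sketch: NOT the study's actual workflow.
# Slides with a confidently low or confidently high model score skip the
# rapid test; slides in the uncertain middle band are sent for rapid testing.

def triage(scores, low, high):
    """Assign a decision to each model score using two assumed thresholds."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    decisions = []
    for s in scores:
        if s < low:
            decisions.append("skip_rapid_test_likely_negative")
        elif s > high:
            decisions.append("skip_rapid_test_likely_positive")
        else:
            decisions.append("run_rapid_test")
    return decisions


def rapid_test_reduction(decisions):
    """Fraction of cases that no longer need a rapid test."""
    if not decisions:
        raise ValueError("no decisions given")
    skipped = sum(1 for d in decisions if d != "run_rapid_test")
    return skipped / len(decisions)


# Toy scores and thresholds, made up for illustration only.
toy_scores = [0.05, 0.10, 0.45, 0.60, 0.95, 0.30, 0.02]
toy_decisions = triage(toy_scores, low=0.2, high=0.9)
reduction = rapid_test_reduction(toy_decisions)
print(f"rapid tests avoided: {reduction:.0%}")
```

In a sketch like this, widening the band between the two thresholds sends more cases to rapid testing and narrows the set of cases decided by the model alone, so the thresholds trade test reduction against screening performance.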

Methods

The “Methods” section of the research paper outlines the experimental design and analytical techniques employed to investigate the research question. The study utilized a quantitative approach, incorporating statistical analyses to evaluate the data collected from various experiments. Participants were selected based on specific inclusion criteria, ensuring a representative sample for the study’s objectives.

Data collection involved standardized procedures, including the use of validated instruments to measure the relevant variables. The analysis was conducted using appropriate statistical software, with significance levels set at p < 0.05. The methods employed allowed for robust comparisons and interpretations of the results, ultimately contributing to the reliability and validity of the findings presented in the study.

Results

The model generalized well across both internal and external validation cohorts. It achieved an overall area under the curve (AUC) of 0.870 on 1,484 internal slides and an AUC of 0.896 on 397 external slides, supporting its robustness across diverse datasets. Notably, the MSHS cohort yielded an AUC of 0.870 (N = 294), while the TCGA cohort yielded an AUC of 0.860 (N = 519). Performance was also evaluated across scanner types, with Pearson correlation coefficients indicating high consistency in model scores: 0.828 (Philips vs. Aperio), 0.832 (Philips vs. Pramana), and 0.935 (Aperio vs. Pramana).
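The two statistics reported above can be illustrated with a short standard-library sketch: AUC computed as the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one, and the Pearson correlation between scores from two scanners. The function names and the toy data are assumptions for illustration and are not taken from the study.

```python
from math import sqrt


def roc_auc(labels, scores):
    """AUC as P(positive score > negative score); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy data, made up for illustration: 1 = EGFR-mutated, 0 = wild type.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

# Scores for the same slides digitized on two hypothetical scanners.
scanner_a = [0.10, 0.35, 0.60, 0.80]
scanner_b = [0.12, 0.30, 0.66, 0.78]
toy_r = pearson_r(scanner_a, scanner_b)
print(f"AUC = {toy_auc:.3f}, Pearson r = {toy_r:.3f}")
```

The pairwise AUC loop is quadratic in the number of slides, which is acceptable for a sketch; a rank-based implementation from an established statistics library would be the usual choice for cohorts of the size reported here.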

An analysis of artifacts in the TCGA cohort showed that the model remained robust to various artifact types, with stable AUC across the different stratifications. When slides containing the most severe artifacts were excluded, the AUC rose from 0.860 to 0.918, suggesting that slide quality control could further improve accuracy in clinical use. Overall, these findings support the efficacy and reliability of the model across diverse settings and conditions.

Discussion

The discussion covers the development and clinical validation of EAGLE (EGFR AI Genomic Lung Evaluation), a computational biomarker designed to predict EGFR mutational status in lung adenocarcinoma (LUAD) from hematoxylin and eosin (H&E)-stained biopsy slides. The study emphasizes that, although earlier models built on weakly supervised techniques and convolutional neural networks detected EGFR mutations with high accuracy, progress toward clinical implementation has been limited. EAGLE was trained on a diverse dataset of 5,174 slides and validated across multiple institutions, achieving an area under the curve (AUC) of 0.890 in a real-time silent trial, which indicates its robustness and readiness for clinical use.

The findings suggest that EAGLE can reduce reliance on rapid testing for EGFR mutations without compromising screening performance, streamlining the molecular workflow in clinical settings. The model showed high sensitivity (0.918) and specificity (0.993) when compared to the Idylla rapid test, with the potential to improve patient care by enabling quicker treatment decisions. The study also underscores the importance of keeping pathologists in the workflow, as their expertise could further refine model performance, particularly in challenging cases such as those with minimal tumor architecture. Overall, the research establishes an important benchmark for the clinical application of AI in pathology and advocates for EAGLE as a complementary screening tool within existing genomic testing protocols.
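Sensitivity and specificity of the kind quoted above are derived from a two-by-two table of test calls against a reference result. The sketch below shows that calculation on made-up data; the function name and the toy calls are illustrative assumptions and do not reproduce the study's counts.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired binary results.

    reference: ground-truth calls (1 = mutated, 0 = wild type)
    predicted: calls from the test under evaluation, in the same order
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # true positives
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # false negatives
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # true negatives
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference")
    return tp / (tp + fn), tn / (tn + fp)


# Toy calls, made up for illustration only.
toy_reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_reference, toy_predicted)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```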