Confounding factors and biases abound when predicting molecular biomarkers from histological images

Journal: Nature Biomedical Engineering
DOI: https://doi.org/10.1038/s41551-026-01616-8
PMID: https://pubmed.ncbi.nlm.nih.gov/41772176
Publication Date: 2026-03-02
Author(s): Muhammad Dawood et al.
Primary Topic: AI in cancer detection

Methods

The study took a quantitative, computational approach. Machine learning models were used to predict molecular biomarker status from H&E-stained whole slide images, and their predictions were evaluated statistically. Besides overall performance, the evaluation stratified patients into subgroups defined by other biomarkers (for example, TP53 mutation status) and by potential confounders such as tumor grade and tumor mutational burden (TMB), so that performance could be compared within each subgroup.

Data collection used standardized instruments and protocols to ensure reliability and validity. The analysis was carried out with software capable of complex statistical tests, such as regression analysis and ANOVA, to identify significant differences and relationships among variables. The section emphasizes replicability and transparency and details the steps taken to minimize bias and strengthen the robustness of the findings.

Results

This section reports the key findings systematically, emphasizing the most important data points and trends. Findings are accompanied by supporting statistics, which may include p-values, confidence intervals, or effect sizes; predictor performance is expressed as the area under the receiver operating characteristic curve (AUROC).

Graphs and tables illustrate the data and highlight relationships and differences among the variables under investigation. Together, the findings address the broader research question and offer insights that may inform future studies and practical applications.

Discussion

In this study, we critically examined the limitations of machine learning (ML) models in predicting molecular biomarkers from hematoxylin and eosin (H&E)-stained whole slide images (WSIs). Our analysis of 8,221 patients across multiple cancer cohorts revealed significant interdependencies among biomarkers, suggesting that current ML approaches may conflate the effects of multiple interrelated factors rather than isolate specific biomarker signals. For instance, the performance of the estrogen receptor (ER) predictor dropped significantly in subgroups defined by TP53 mutation status, indicating that the model's predictions depend on the status of other biomarkers, which could lead to biased clinical interpretations.
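The ER/TP53 example can be made concrete by reporting a predictor's AUROC both overall and within subgroups defined by a second biomarker. The following is a minimal Python sketch under stated assumptions, not the authors' code: it assumes per-patient ER labels, scores from an imagined ER predictor, and TP53 labels (1 = mutant), computes AUROC with the rank-based Mann-Whitney formulation, and uses made-up toy values chosen only to illustrate the effect. The function names `auroc` and `stratified_auroc` are hypothetical.

```python
from collections import Counter, defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a random positive scores above a random
    negative (Mann-Whitney formulation; ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a subgroup contains only one class
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_auroc(labels, scores, strata):
    """Return the overall AUROC and a dict of AUROC per stratum value."""
    groups = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, strata):
        groups[g][0].append(y)
        groups[g][1].append(s)
    per_stratum = {g: auroc(ys, ss) for g, (ys, ss) in groups.items()}
    return auroc(labels, scores), per_stratum


# Hypothetical toy cohort (NOT study data): ER status, TP53 status
# (1 = mutant) and scores from an imagined ER predictor.
er = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
tp53 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
er_scores = [0.80, 0.70, 0.90, 0.60, 0.75, 0.30,
             0.20, 0.35, 0.25, 0.40, 0.10, 0.85]

# Cross-tabulate the two biomarkers to expose their interdependence.
pairs = Counter(zip(er, tp53))
print("(ER, TP53) counts:", dict(pairs))

overall, per_stratum = stratified_auroc(er, er_scores, tp53)
print(f"overall AUROC: {overall:.2f}")
for g, value in sorted(per_stratum.items()):
    print(f"AUROC within TP53={g}: {value:.2f}")
```

On this toy data the overall AUROC is about 0.81, while the within-subgroup values are 0.20 (TP53 wild-type) and 0.60 (TP53 mutant): most of the apparent skill comes from ER and TP53 status being correlated, not from ER-specific signal.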

Furthermore, we found that the predictive accuracy of ML models is heavily affected by confounding variables such as tumor grade and tumor mutational burden (TMB). Stratification analyses showed that the apparently high accuracy of these models often masks their inability to generalize across cohorts, because shifts in the associations between biomarkers can cause substantial drops in performance. For example, the AUROC for predicting TP53 mutations decreased significantly when patients were stratified by tumor grade, highlighting the models' reliance on grade-associated morphology rather than on truly biomarker-specific features. These findings underscore the need for bias-aware evaluation of ML models to establish their clinical utility; at present, such models do not replace genomic testing in routine care.
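A complementary check for reliance on grade-associated morphology is to compare an image model against a confounder-only baseline, i.e. a "predictor" that sees nothing but tumor grade. If such a baseline already reaches a respectable overall AUROC for TP53 but falls to 0.5 within every grade, then an image model whose performance collapses the same way under stratification is likely exploiting grade rather than TP53-specific features. The sketch below is hypothetical: the grades, TP53 labels and grade-only baseline are toy values for illustration, not data or methods from the study.

```python
def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined for a single-class subgroup
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort (NOT study data): tumor grade 1-3 and
# TP53 status (1 = mutant); mutations are more common at higher grade.
grade = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
tp53 = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0]

# Confounder-only baseline: the "score" is simply the grade itself.
baseline_scores = [float(g) for g in grade]

overall = auroc(tp53, baseline_scores)
within_grade = {}
for g in sorted(set(grade)):
    idx = [i for i, x in enumerate(grade) if x == g]
    within_grade[g] = auroc([tp53[i] for i in idx],
                            [baseline_scores[i] for i in idx])

print(f"grade-only baseline, overall AUROC: {overall:.2f}")
print("within each grade:", within_grade)
```

Here the grade-only baseline scores about 0.72 overall but exactly 0.5 within each grade, because its output is constant inside a stratum. Reporting such a baseline next to the image model puts the model's overall AUROC in context.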

Keywords: biomarker discovery, confounding, stratification, molecular markers, cancer, biomarker