Deep learning and vision transformers-based framework for breast cancer and subtype identification

Journal: Neural Computing and Applications, Volume: 37, Issue: 16
DOI: https://doi.org/10.1007/s00521-025-10984-2
Publication Date: 2025-01-29
Author(s): Ishrat Jahan et al.
Primary Topic: AI in cancer detection

Overview

This paper presents a deep learning framework for automatically detecting breast cancer and identifying its subtypes in hematoxylin and eosin (H&E) stained whole slide images (WSIs). The framework has three components: a cancerous patch classifier, a cancer subtype classifier, and a WSI-level classifier. For interpretability, the study uses Score-CAM visualizations that highlight the image regions driving each prediction. These regions align with pathologists' assessments and make error analysis easier. The dataset comprises 111 WSIs (85 malignant and 26 benign cases) with a total of 28,428 annotated patches reviewed by expert pathologists.
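Score-CAM weights each activation map of a chosen layer by the target-class score the model produces when the input is masked with that map (upsampled and min-max normalized). It then combines the maps into a saliency map. The NumPy sketch below illustrates the idea on a toy scoring function. In practice, `activations` would come from a layer of the trained network and `score_fn` would be its class-probability output. The helper names, the integer-factor nearest-neighbor upsampling, and the toy example are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def upsample_nearest(act, out_h, out_w):
    """Nearest-neighbor upsampling of an (h, w) map; assumes out_h, out_w are multiples of h, w."""
    fy, fx = out_h // act.shape[0], out_w // act.shape[1]
    return np.repeat(np.repeat(act, fy, axis=0), fx, axis=1)


def normalize01(m):
    """Min-max normalize a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m, dtype=float)


def score_cam(image, activations, score_fn, target_class):
    """Score-CAM saliency map for an (H, W, C) image.

    activations: (K, h, w) feature maps from a chosen layer of the model.
    score_fn: callable mapping an image to a vector of class scores.
    """
    H, W = image.shape[:2]
    upsampled = [upsample_nearest(a, H, W) for a in activations]
    masks = [normalize01(u) for u in upsampled]
    # Target-class score when the input is masked by each normalized activation map.
    scores = np.array([score_fn(image * m[..., None])[target_class] for m in masks])
    # Softmax over channels; a constant baseline score would cancel out here.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of activation maps, then ReLU to keep only positive evidence.
    cam = np.maximum(0.0, np.tensordot(weights, np.stack(upsampled), axes=1))
    return normalize01(cam)


if __name__ == "__main__":
    # Toy check: class 0 "looks at" the top-left quadrant of an 8x8 image.
    img = np.ones((8, 8, 3))
    acts = np.zeros((2, 2, 2))
    acts[0, 0, 0] = 1.0  # channel 0 fires on the top-left
    acts[1, 1, 1] = 1.0  # channel 1 fires on the bottom-right
    toy_scores = lambda x: np.array([x[:4, :4].mean(), x[4:, 4:].mean()])
    cam = score_cam(img, acts, toy_scores, target_class=0)
    print(cam[0, 0], cam[7, 7])  # top-left is the most salient region
```

Note that Score-CAM needs one forward pass per activation channel, so it costs more than gradient-based CAM variants. In exchange, it does not depend on gradients.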

Among the deep learning models tested, the Vision Transformer (ViT)-based model achieved the highest accuracy: 96.74% for cancerous patch detection and 89.78% for subtype classification. Majority voting at the WSI level yielded F1-scores of 99.06% for cancer classification and 96.13% for subtype classification. The authors argue that the framework could improve diagnostic accuracy in clinical settings, while acknowledging that it still needs validation across more diverse environments and on larger datasets. They also note the challenge posed by the small number of lobular carcinoma cases and suggest that data augmentation could improve model performance. Ultimately, the framework is intended to support personalized treatment and thereby improve patient outcomes in breast cancer care.
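The F1-score reported for WSI classification is the harmonic mean of precision $P$ and recall $R$, $F_1 = 2PR/(P+R)$. A minimal Python sketch for binary slide labels follows. The label encoding (1 = malignant) and the example labels are hypothetical. This summary also does not say whether the subtype F1-score is computed for one class or averaged over IDC and ILC.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score: harmonic mean of precision and recall for the `positive` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical slide labels: 3 true positives, 1 false positive, 1 false negative.
    print(f1_score([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))  # 0.75
```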

Introduction

The introduction highlights the seriousness of breast cancer, a leading cause of death characterized by uncontrolled cell proliferation that produces tumors capable of invading surrounding tissue. It distinguishes the two primary types of breast tumors, benign and malignant. Malignant tumors are of particular concern because they grow rapidly and can metastasize. The paper stresses the importance of early detection, noting that effective treatment is possible when breast cancer is identified in its initial stages. Imaging techniques such as mammography and ultrasound are used for detection, but histopathological analysis remains the gold standard for diagnosis.

The introduction also discusses challenges in current diagnostic practice, in particular the shortage of pathologists and the variability of diagnostic accuracy. To address these issues, the paper reviews recent advances in machine learning for classifying breast cancer from histopathological images. Several prior studies improved classification accuracy with new models and methods, but many lack interpretability or stop short of a comprehensive patient-level diagnosis. The authors propose a new deep learning framework that classifies benign and malignant tissue at both the patch and whole-slide levels. It also uses visualization techniques to make the model's decisions interpretable and verifiable. The key contributions are comprehensive experimental results, the use of lower magnification levels to reduce computational cost, and the application of Score-CAM for improved model transparency.
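The saving from lower magnification follows from simple arithmetic. Magnification scales each image axis linearly, so the pixel count scales with its square. For a patch-based model, per-slide compute roughly tracks the pixel count. The snippet assumes a base scan magnification of 40x, which this summary does not state. The function names are illustrative.

```python
def dims_at(width_base, height_base, base_mag, target_mag):
    """Pixel size of the same tissue region at target_mag, given its size at base_mag."""
    scale = target_mag / base_mag
    return round(width_base * scale), round(height_base * scale)


def pixel_reduction(base_mag, target_mag):
    """Factor by which the pixel count shrinks when going from base_mag down to target_mag."""
    return (base_mag / target_mag) ** 2


if __name__ == "__main__":
    # A 3000 x 3000 patch at 10x covers the tissue of a 12,000 x 12,000 region at 40x (assumed base).
    print(dims_at(12_000, 12_000, base_mag=40, target_mag=10))  # (3000, 3000)
    print(pixel_reduction(40, 10))  # 16.0
```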

Methods

The methodology centers on a deep learning architecture for cancerous patch classification, evaluated with fivefold cross-validation using a subject-wise data split. In each fold, 20% of the WSIs are held out for testing and the remaining 80% are used for training, with 10% of the training portion reserved for validation. All patches from a given slide fall into a single subset, so this split prevents data leakage and ensures the model is evaluated on entirely unseen slides. The training set contained patches from 80 WSIs, and the test and validation sets contained patches from 22 and 9 WSIs, respectively. The same subject-wise strategy was applied to subtype classification, giving a training set of 41 ductal carcinoma WSIs and 5 lobular carcinoma WSIs. Patches of 3000 × 3000 pixels were extracted at 10x magnification from selected regions of interest (ROIs) to capture critical histological features.
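A subject-wise split assigns whole slides, rather than individual patches, to folds. No slide can then contribute patches to both training and testing. The sketch below shows one way to produce the fold structure described above using only the Python standard library. The slide identifiers, the random seed, the round-robin fold assignment, and the rounding of the 10% validation subset are assumptions; the authors' exact assignment procedure is not described here. With 111 slides, four of the five folds give the 80/22/9 split reported above, and the fifth test fold gets 23 slides.

```python
import random


def subject_wise_folds(slide_ids, n_folds=5, val_frac=0.10, seed=0):
    """Yield (train, val, test) lists of slide IDs for each cross-validation fold.

    Every patch inherits the split of the slide it came from, so a slide never
    straddles train/val/test within a fold (no slide-level data leakage).
    """
    ids = list(slide_ids)
    random.Random(seed).shuffle(ids)
    # Partition the shuffled slides into n_folds nearly equal test folds.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        rest = [s for j, f in enumerate(folds) if j != k for s in f]
        n_val = round(val_frac * len(rest))  # 10% of the training portion
        val, train = rest[:n_val], rest[n_val:]
        yield train, val, test


if __name__ == "__main__":
    slides = [f"WSI_{i:03d}" for i in range(111)]  # hypothetical slide IDs
    for train, val, test in subject_wise_folds(slides):
        print(len(train), len(val), len(test))
```

Running the usage block prints `79 9 23` for the first fold and `80 9 22` for the other four. Each slide appears in exactly one test fold.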

For patient-level cancer prediction, patch-level predictions were aggregated by a voting mechanism with a threshold below 1%. A slide is therefore flagged as malignant even when only a small fraction of its patches are predicted malignant, which keeps sensitivity to malignancy high. For subtype classification, a 50% threshold (a simple majority) was used instead, allowing balanced identification of ductal and lobular carcinomas. The models were pretrained on ImageNet, and patch images were resized to 224 × 224 pixels and normalized. Training ran for 150 epochs with a batch size of 16, using the Adam optimizer with a learning rate of 0.0001. The hardware setup included two GPUs, and all experiments were run in Visual Studio Code.
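The two slide-level decision rules can be sketched as follows. This sketch assumes that each threshold applies to the fraction of a slide's patches assigned to the positive class. A slide is called malignant if more than `cancer_threshold` of its patches are predicted malignant. Its subtype is then the label that wins a strict majority of the subtype votes. The function name, the default value 0.005 (the summary only says the threshold is below 1%), voting over malignant patches only, and returning "uncertain" on a tie are all illustrative assumptions.

```python
from collections import Counter


def classify_slide(patch_is_malignant, patch_subtypes,
                   cancer_threshold=0.005,  # hypothetical; source only says "below 1%"
                   subtype_threshold=0.5):
    """Two-stage slide-level decision from patch-level predictions.

    patch_is_malignant: list of bools, one per patch (patch classifier output).
    patch_subtypes: "IDC"/"ILC" labels for the malignant patches (subtype classifier output).
    Returns "benign", "IDC", "ILC", or "uncertain".
    """
    n = len(patch_is_malignant)
    if n == 0:
        raise ValueError("slide has no patches")
    malignant_frac = sum(patch_is_malignant) / n
    # Low threshold: a small fraction of malignant patches flags the slide (high sensitivity).
    if malignant_frac <= cancer_threshold:
        return "benign"
    counts = Counter(patch_subtypes)
    total = sum(counts.values())
    if total == 0:
        return "uncertain"
    label, votes = counts.most_common(1)[0]
    # 50% threshold: the subtype must win a strict majority of the subtype votes.
    return label if votes / total > subtype_threshold else "uncertain"


if __name__ == "__main__":
    # Hypothetical slide: 6 of 1000 patches malignant, 4 of them predicted ILC.
    print(classify_slide([True] * 6 + [False] * 994, ["ILC"] * 4 + ["IDC"] * 2))  # ILC
```

The asymmetry is deliberate: missing a malignant slide is costlier than a false alarm, so the cancer rule is permissive. The subtype rule simply follows the majority.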

Results

The Results section reports the performance of the evaluated models. For cancerous patch classification, the ViT model achieved the highest accuracy among the tested architectures, at 96.74%. For IDC versus ILC subtype classification, it reached 89.78%. Aggregating patch predictions by majority voting produced WSI-level F1-scores of 99.06% for cancer classification and 96.13% for subtype classification.

Discussion

This study uses a dataset from the Her-2/neu scoring contest, which comprises 172 WSIs of invasive breast carcinoma cases. After one corrupted WSI was excluded, the analysis focused on 111 WSIs, including 57 invasive ductal carcinoma (IDC) and 8 invasive lobular carcinoma (ILC) cases. The WSIs were divided into smaller patches, and a total of 28,428 patches were extracted and labeled as benign, malignant, or discarded. Class imbalance was addressed with data augmentation to give the deep learning models a balanced training set.
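One common way to balance classes with augmentation is to oversample the minority classes with label-preserving transforms. Histology patches have no canonical orientation, so flips and 90-degree rotations are natural choices. The NumPy sketch below pads each class up to the size of the largest class. The transform set, the random oversampling scheme, and the function names are assumptions, because this summary does not list the authors' augmentations. Square patches are assumed so that rotations preserve the patch shape.

```python
import numpy as np


def dihedral(patch, k):
    """Apply one of the 8 flip/rotation symmetries (k in 0..7) to a square (H, W, C) patch."""
    out = np.rot90(patch, k % 4, axes=(0, 1))
    return np.flip(out, axis=1) if k >= 4 else out


def balance_by_augmentation(patches_by_class, seed=0):
    """Oversample every class to the size of the largest one using random dihedral transforms.

    patches_by_class: dict mapping class label -> non-empty list of (H, W, C) arrays.
    Returns a new dict; original patches are kept and augmented copies are appended.
    """
    if any(len(v) == 0 for v in patches_by_class.values()):
        raise ValueError("every class needs at least one patch")
    rng = np.random.default_rng(seed)
    target = max(len(v) for v in patches_by_class.values())
    balanced = {}
    for label, patches in patches_by_class.items():
        extra = []
        for _ in range(target - len(patches)):
            src = patches[rng.integers(len(patches))]
            # k >= 1 skips the identity transform.
            extra.append(dihedral(src, int(rng.integers(1, 8))))
        balanced[label] = list(patches) + extra
    return balanced
```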

The proposed framework for cancer type prediction has three stages: a cancerous patch classifier, a cancer subtype classifier, and a WSI-level classifier. Score-CAM is added for interpretability. The patch classifier distinguishes benign from malignant patches, and the subtype classifier differentiates IDC from ILC. Several models, including DenseNet-201, MobileNetV2, and Vision Transformers (ViTs), were compared across multiple performance metrics. The ViT model outperformed the others with the highest patch-level cancer detection accuracy of 96.74%, indicating that it captures complex patterns in histopathology images effectively. The study emphasizes interpretability in deep learning models. Score-CAM shows which regions drive each decision, which can increase trust in automated diagnostic systems.