Optimized deep learning for brain tumor detection: a hybrid approach with attention mechanisms and clinical explainability

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-04591-3
PMID: https://pubmed.ncbi.nlm.nih.gov/40858650
Publication Date: 2025-08-26
Author(s): Aditya Jayesh Aiya et al.
Primary Topic: Brain Tumor Detection and Classification

Overview

The study presents a hybrid deep learning model for brain tumor classification (BTC) from Magnetic Resonance Imaging (MRI) that combines VGG16, an attention mechanism, and optimized hyperparameters. The model assigns each scan to one of four classes (glioma, meningioma, pituitary tumor, or no tumor) and reaches 99% accuracy on a dataset of 7023 MRI images. The approach uses advanced preprocessing, transfer learning, and Gradient-weighted Class Activation Mapping (Grad-CAM) to improve performance and interpretability. The model outperforms traditional methods such as Support Vector Machines (SVM) combined with Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), and Principal Component Analysis (PCA), and its end-to-end learning removes the need for manual labeling.

The conclusion emphasizes the model’s effectiveness and interpretability: it achieves high diagnostic accuracy despite mild class imbalance, with an average precision, recall, and F1-score of 0.99 for each tumor class. The confusion matrices showed few misclassifications, and the ROC curves showed an AUC of 1.00. Regularization techniques, including data augmentation, dropout, and early stopping, were used to improve generalization and reduce overfitting. Grad-CAM heatmaps supported the model’s interpretability by consistently highlighting tumor-relevant areas, which matters for building clinician trust and for integrating the model into clinical decision-making.
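The per-class precision, recall, and F1-score mentioned above are all derived from a confusion matrix. The following is a minimal sketch of that arithmetic in plain Python. The function names, the class label strings, and the example counts are illustrative assumptions; the counts are made up and are not the paper's confusion matrix.

```python
from typing import Dict, List

# Hypothetical label names for the four classes described in the summary.
CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]


def per_class_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(labels)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # column minus diagonal
        fn = sum(confusion[k]) - tp                       # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(confusion: List[List[int]]) -> float:
    """Overall accuracy: correctly classified samples over all samples."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


# Made-up counts for illustration only; NOT the results reported in the paper.
EXAMPLE_CONFUSION = [
    [98, 2, 0, 0],
    [1, 97, 1, 1],
    [0, 1, 99, 0],
    [0, 0, 0, 100],
]
```

Calling `per_class_metrics(EXAMPLE_CONFUSION, CLASSES)` returns one dictionary per class, which makes it easy to see whether a high overall accuracy hides a weaker class when the classes are imbalanced.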

Methods

The methods section describes the full pipeline for multiclass brain tumor classification (BTC) from MRI scans: dataset preparation, preprocessing, exploratory data analysis (EDA), model architecture design, training procedures, hyperparameter tuning, and evaluation metrics.

To improve both accuracy and interpretability for clinical diagnostics, the study combines a pre-trained convolutional neural network (CNN) with an attention mechanism and visualization tools. The framework is designed for rigor, reproducibility, and clinical relevance, so that the model can be applied effectively in a diagnostic context.
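The summary does not say which form of attention the authors use. As one possible illustration, the sketch below shows a simple spatial attention pooling step over CNN features: each spatial position gets a score, the scores are normalized with a softmax, and the feature vectors are averaged with those weights. The function names and the `score_weights` vector (standing in for learned parameters) are assumptions for this example, not details taken from the paper.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    features: List[List[float]], score_weights: List[float]
) -> Tuple[List[float], List[float]]:
    """Pool a flattened feature map with softmax attention.

    features: one C-dimensional vector per spatial position (H*W entries).
    score_weights: hypothetical learned vector of length C used to score
        each position with a dot product.
    Returns (attention weights per position, pooled C-dimensional vector).
    """
    scores = [sum(w * x for w, x in zip(score_weights, vec)) for vec in features]
    weights = softmax(scores)
    channels = len(features[0])
    pooled = [
        sum(a * vec[c] for a, vec in zip(weights, features))
        for c in range(channels)
    ]
    return weights, pooled
```

With all-zero `score_weights` every position gets the same weight, so the result reduces to plain global average pooling. Positions with larger scores contribute more, which is the sense in which attention can prioritize diagnostically relevant regions.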

Results

This section analyzes the performance of the proposed hybrid deep learning (DL) model for classifying brain tumors from MRI images. The evaluation covers the model’s training history and its classification accuracy on the test dataset. Grad-CAM visualizations make the model’s predictions easier to interpret by showing which image regions drive each decision.
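For readers unfamiliar with Grad-CAM, the core computation is short: each feature-map channel is weighted by the spatial average of the class score's gradient with respect to that channel, the weighted channels are summed, and negative values are clipped by a ReLU. The sketch below shows only that arithmetic on small nested lists. It assumes the activations and gradients have already been extracted from a trained network, and the function name and max-normalization step are choices made for this example rather than details from the paper.

```python
from typing import List

Grid = List[List[float]]  # one H x W channel


def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Compute a Grad-CAM heatmap from one convolutional layer.

    activations: K channels of shape H x W from the chosen layer.
    gradients: gradients of the target class score with respect to those
        activations, same shape.
    Returns an H x W heatmap scaled to [0, 1] (all zeros if nothing is positive).
    """
    height = len(activations[0])
    width = len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over all positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * act[i][j]
    # ReLU keeps only regions that push the class score up.
    heatmap = [[max(0.0, v) for v in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0.0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap
```

The heatmap has the resolution of the chosen feature map, not of the input image. For a standard VGG16 with a 224 x 224 input, the last convolutional block produces 14 x 14 maps, so the heatmap must be upsampled before it is overlaid on the scan.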

The results indicate higher classification accuracy than previous models, which supports the proposed model’s potential use in medical settings for brain tumor classification (BTC). The comparative analysis favors the hybrid DL model, suggesting that it can strengthen diagnostic capabilities in clinical practice.

Discussion

The discussion reviews recent work on brain tumor detection with deep learning (DL), in which several studies demonstrate the efficacy of various models and architectures. Mathivanan et al. proposed a secure Brain-Tumor Detection Network (BTDN) that achieved accuracies of 99.68%, 98.81%, and 95.33% across three datasets by enhancing MRI image quality and ensuring secure data transmission. Similarly, Hun et al. used an ensemble of fine-tuned models, achieving accuracies of 99.67% and 99.39% on different datasets. Other notable contributions include the GATransformer model, which integrates attention mechanisms to improve interpretability and classification accuracy, and a hybrid Vision Transformer (ViT) and Gated Recurrent Unit (GRU) model, which achieved an F1-score of 97% on a dataset from Bangladesh.

The literature also emphasizes the importance of model architecture and hyperparameter tuning in optimizing performance. For instance, the VGG16 model, adapted for brain tumor classification, incorporates an attention mechanism to prioritize diagnostic features, thereby enhancing interpretability. Techniques such as dropout regularization and data augmentation have been employed to improve generalization and robustness against overfitting. Overall, these studies collectively underscore the potential of DL models in enhancing diagnostic accuracy and clinical utility in brain tumor detection, while also addressing challenges related to data imbalance and model interpretability.
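As a reminder of what dropout regularization does, here is a minimal sketch of inverted dropout in plain Python. During training each activation is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)` so the expected value is unchanged; at inference the values pass through untouched. The function name and signature are assumptions for this example, and the summary does not state the dropout rate used in the paper.

```python
import random
from typing import List


def dropout(
    values: List[float], rate: float, rng: random.Random, training: bool = True
) -> List[float]:
    """Apply inverted dropout to a list of activations.

    rate: probability of dropping each value, in [0, 1).
    rng: random number generator, passed in so results are reproducible.
    training: when False, the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    # Survivors are rescaled so the expected activation stays the same.
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

Because of the rescaling, no correction is needed at inference time. Deep learning frameworks generally implement their dropout layers in this inverted form.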

Limitations

The limitations of the proposed hybrid deep learning model point to several areas for future research and clinical application. First, the model was developed on a single, well-annotated public dataset from Kaggle, which may not capture the full clinical diversity found across multicenter institutions. Variations in imaging protocols, equipment, and patient demographics could affect the model’s generalizability, so external validation across diverse cohorts is needed.

Additionally, while the model integrates enhancements such as attention mechanisms and Grad-CAM visualization, these components have not been analyzed independently through dedicated ablation studies. Without that analysis it is difficult to tell how much each component contributes, which is information that future work on optimizing the model’s structure would need. Furthermore, the computational demands of the interpretability features may limit scalability in resource-limited settings. Although Grad-CAM helps identify diagnostically significant regions, its low spatial resolution may limit its usefulness in applications that require precise anatomical localization, such as surgical planning. Lastly, a comparison with emerging transformer-based architectures and hybrid models that incorporate large language models could provide additional insights. Despite these limitations, the model’s contributions remain significant and offer a foundation for further development and practical application in clinical settings.