MRI-based brain tumor detection through an explainable EfficientNetV2 and MLP-mixer attention architecture

Journal: Physical and Engineering Sciences in Medicine
DOI: https://doi.org/10.1007/s13246-026-01728-0
PMID: https://pubmed.ncbi.nlm.nih.gov/41945251
Publication Date: 2026-04-07
Author(s): Mustafa Yurdakul et al.
Primary Topic: Brain Tumor Detection and Classification

Overview

The paper presents a deep learning model for classifying brain tumors, motivated by the need for automated diagnostic systems given the high mortality associated with these tumors. Using a dataset of 3,064 T1-weighted contrast-enhanced MRI images, the study evaluates nine established convolutional neural network (CNN) architectures to identify the most effective backbone for tumor classification. EfficientNetV2 was selected for its superior performance, and an attention-based MLP-Mixer architecture was integrated with it to improve classification. The model was validated with five-fold cross-validation, achieving an accuracy of 99.50%, with precision, recall, and F1-score all above 99%.

The findings indicate that the proposed model outperforms existing methods in the literature and also improves interpretability through Grad-CAM visualizations, which highlight regions of the MRI images that correspond to actual tumor areas. According to the authors, this supports the model’s clinical reliability and makes it a useful decision support tool for radiologists. The authors emphasize the potential of artificial intelligence in healthcare and suggest that future research with larger, multi-center, and multimodal datasets could further improve the model’s generalization and clinical applicability.

Introduction

The introduction describes the impact of brain tumors, which arise from the uncontrolled growth of cells in brain tissue and affect hundreds of thousands of people worldwide each year. With approximately 25,000 new cases diagnosed in the USA in 2024, brain tumors, particularly malignant types, are associated with high mortality rates. The most prevalent types are gliomas, meningiomas, and pituitary tumors, each with distinct characteristics and clinical implications. Gliomas are aggressive and heterogeneous; meningiomas typically grow slowly and are often benign; pituitary tumors can be functional or nonfunctional, affecting hormone levels and causing a range of symptoms.

The paper highlights the challenges of diagnosing brain tumors with MRI: although MRI offers high resolution, image analysis is complicated by the morphological diversity of tumors and the risk of missed lesions. This underscores the growing need for computer-aided diagnosis (CAD) systems. The introduction reviews deep learning approaches to brain tumor classification, noting the high accuracy achieved by various convolutional neural network (CNN) models, but it also points out limitations such as high computational cost and limited clinical interpretability. To address these issues, the study proposes a hybrid model that combines EfficientNetV2 with an attention-based MLP-Mixer architecture, aiming to improve performance while maintaining clinical interpretability. The paper lists its contributions as a comprehensive literature review, an evaluation of multiple CNN architectures, and the application of Gradient-weighted Class Activation Mapping (Grad-CAM) for model interpretability.

Methods

In the Methods section, the authors outline the materials and methodologies employed in their study on brain tumor classification. They introduce the dataset, detailing its imaging characteristics and scope, followed by a description of the EfficientNetV2 architecture, which serves as the backbone of their model. The integration of an attention-based MLP-Mixer structure is also highlighted, alongside the use of a Grad-CAM-based approach for model explainability. MRI slices from the dataset illustrate the different tumor types, including glioma, meningioma, and pituitary tumors, presented in various anatomical views.

The authors selected EfficientNetV2 for its favorable balance between accuracy and computational efficiency, opting for a smaller-scale variant in their experiments. The experiments were run on a workstation whose hardware and software configuration is detailed in Table 1. The models were evaluated with five-fold cross-validation, in line with common practice in the literature, so that the results can be compared fairly with other methods.
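
The five-fold protocol can be illustrated with a minimal sketch. It assumes a stratified split (the summary does not say whether the folds were stratified), and the function name `stratified_k_fold` and the per-class counts are hypothetical, chosen only so that the total matches the 3,064 images mentioned above.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs, keeping class proportions similar in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a similar share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Hypothetical class counts (assumption): only the total of 3,064 comes from the summary.
labels = ["glioma"] * 1400 + ["meningioma"] * 700 + ["pituitary"] * 964
splits = list(stratified_k_fold(labels, k=5))

for fold_number, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train_idx)} val={len(val_idx)}")
```

Each image is used for validation exactly once, and the reported score would then be an aggregate over the five validation folds.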

Results

In the results section, the study evaluates the performance of various convolutional neural network (CNN) architectures for brain tumor classification using five-fold cross-validation. Among the nine tested models, EfficientNetV2 emerged as the most effective, achieving an accuracy of 97.54%, with precision, recall, and F1-score values of 97.93%, 96.49%, and 97.11%, respectively. Other notable models included VGG16 and DenseNet121, which also demonstrated high accuracy and balanced performance. In contrast, architectures such as ResNet50 and ConvNeXt exhibited lower performance, particularly in recall, indicating their limitations in detecting complex tumor patterns.
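
To make the four reported metrics concrete, the sketch below computes accuracy and macro-averaged precision, recall, and F1-score from a confusion matrix. Macro averaging is an assumption (the summary does not state how the per-class scores were aggregated), and the function name `macro_metrics` and the example matrix are made up for illustration.

```python
def macro_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up confusion matrix for three classes (glioma, meningioma, pituitary).
example = [
    [95, 4, 1],
    [6, 90, 4],
    [0, 2, 98],
]
scores = macro_metrics(example)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under this kind of averaging, the F1-score is the mean of the per-class F1 values rather than the harmonic mean of the averaged precision and recall. That may be why the reported 97.11% differs slightly from the harmonic mean of 97.93% and 96.49%, which is about 97.20%.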

The proposed hybrid model, which integrates EfficientNetV2 with MLP-Mixer Attention blocks, improved all classification metrics, achieving over 98% accuracy across the tested configurations. The optimal configuration (Token = 128, Channel = 512) yielded 99.50% accuracy, 99.47% precision, 99.52% recall, and 99.49% F1-score, a gain of 1.96 percentage points in accuracy over the EfficientNetV2 baseline (97.54%). Additionally, Grad-CAM visualizations confirmed that the model focuses on clinically relevant tumor regions, which supports its explainability and reliability for healthcare applications. These findings underscore the potential of advanced CNN architectures in improving AI-based decision support systems in medical imaging.
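
The Grad-CAM computation behind such heatmaps can be sketched in a few lines. This follows the standard Grad-CAM recipe (gradient-averaged channel weights, weighted sum of feature maps, then ReLU), not code from the paper; the function name `grad_cam` and all numbers are illustrative, and in a real pipeline the activations and gradients would come from a chosen convolutional layer of the trained network.

```python
def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM map.

    activations, gradients: lists of K feature maps, each an H x W nested list.
    gradients[k][i][j] is d(class score) / d(activations[k][i][j]).
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weights: global average of the gradients over spatial positions.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]

    # Weighted sum of the feature maps.
    cam = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]

    # ReLU keeps only the evidence that increases the class score.
    cam = [[max(v, 0.0) for v in row] for row in cam]

    # Scale to [0, 1] so the map can be shown as a heatmap.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: 2 channels of size 2 x 2, with made-up numbers.
acts = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
grads = [
    [[0.5, 0.5], [0.5, 0.5]],      # mean 0.5: this channel supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # mean -1.0: this channel opposes the class
]
heatmap = grad_cam(acts, grads)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

In practice the low-resolution map is upsampled to the input size and overlaid on the MRI slice, which is how one can check whether the highlighted area coincides with the tumor.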

Discussion

The discussion section covers the development and evaluation of the hybrid model for brain tumor classification, which integrates EfficientNetV2 with a linear attention-based MLP-Mixer architecture. The linear attention mechanism improves the model’s ability to capture long-range dependencies in the data while maintaining computational efficiency. The proposed MLP-Mixer Attention block places attention layers before both the Token-Mixing and Channel-Mixing MLPs, combining spatial and channel information to improve classification performance. The model achieved an accuracy of 99.50%, surpassing existing literature benchmarks and showing clear improvements in precision, recall (sensitivity), and F1-score.
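
The efficiency argument for linear attention can be shown with a small sketch. It assumes the common kernel formulation with the feature map elu(x) + 1; the summary does not specify which linear attention variant the authors use, so the function names and the feature map are assumptions. The point is that regrouping the matrix products avoids building an N x N attention matrix while giving the same output.

```python
import math


def feature_map(x):
    # elu(x) + 1: a positive feature map often used for linear attention (assumption).
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """queries, keys: N x D lists; values: N x E lists. Cost grows linearly with N."""
    q = [[feature_map(x) for x in row] for row in queries]
    k = [[feature_map(x) for x in row] for row in keys]
    n, d, e = len(k), len(k[0]), len(values[0])

    # Summaries shared by every query:
    # kv[a][b] = sum_j k[j][a] * values[j][b], k_sum[a] = sum_j k[j][a].
    kv = [[sum(k[j][a] * values[j][b] for j in range(n)) for b in range(e)]
          for a in range(d)]
    k_sum = [sum(k[j][a] for j in range(n)) for a in range(d)]

    out = []
    for qi in q:
        norm = sum(qi[a] * k_sum[a] for a in range(d))
        out.append([sum(qi[a] * kv[a][b] for a in range(d)) / norm
                    for b in range(e)])
    return out


def quadratic_attention(queries, keys, values):
    """Same result computed the expensive way, via explicit N x N weights."""
    q = [[feature_map(x) for x in row] for row in queries]
    k = [[feature_map(x) for x in row] for row in keys]
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) for kj in k]
        total = sum(scores)
        out.append([sum(s * v[b] for s, v in zip(scores, values)) / total
                    for b in range(len(values[0]))])
    return out


tokens = [[0.2, -1.0], [1.5, 0.3], [-0.7, 0.8]]  # 3 made-up tokens, 2 channels each
fast = linear_attention(tokens, tokens, tokens)
slow = quadratic_attention(tokens, tokens, tokens)
print(fast)
print(slow)
```

In a block like the one described, a layer of this kind would run before the token-mixing MLP and again before the channel-mixing MLP. Details such as normalization, residual connections, and learned projections are not given in the summary and are omitted here.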

Additionally, the model’s explainability was assessed using Grad-CAM visualizations, which indicated that the model focuses on clinically relevant tumor regions, supporting its reliability as a diagnostic tool. The study also acknowledges limitations: the dataset covers only three tumor types, and the model was not tested across different imaging modalities. The authors plan to address these limitations by collaborating with healthcare institutions to gather more diverse datasets and by evaluating the model across various imaging techniques, with the goal of improving its applicability in clinical settings.