Hybrid model integration with explainable AI for brain tumor diagnosis: a unified approach to MRI analysis and prediction

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-06455-2
PMID: https://pubmed.ncbi.nlm.nih.gov/40596288
Publication Date: 2025-07-01
Author(s): D Vamsidhar et al.
Primary Topic: Brain Tumor Detection and Classification

Overview

The paper discusses the critical role of accurate detection in the effective treatment of brain tumors and emphasizes the importance of medical imaging for early diagnosis. The study introduces two approaches to tumor detection. The first combines image processing, vision transformers (ViT), and machine learning algorithms to analyze medical images, achieving an accuracy of 98.17%. The second uses a parallel model integration technique that merges two pre-trained deep learning models, ResNet101 and Xception, and applies local interpretable model-agnostic explanations (LIME) to improve model interpretability, reaching an accuracy of 99.67%. The authors recommend parallel model integration as the better of the two methods for tumor detection.

In conclusion, the proposed classification system uses feature fusion, a self-attention mechanism, and parallel model integration to improve brain tumor classification. Parallel model integration outperformed traditional approaches by capturing complex spatial and semantic patterns in medical images, which improved feature discrimination and classification robustness. Explainable AI techniques such as LIME address a known weakness of deep learning models by making predictions more transparent and easier to trust, which is crucial for clinical applications. Future work will focus on extending the model to multi-class classification of tumor types and improving its generalization for broader use.

Methods

The proposed methodology uses several Convolutional Neural Networks (CNNs), specifically ResNet variants and Xception, for model selection and integration in medical imaging tasks such as brain tumor segmentation. The workflow begins with an exploration of these models and then applies Explainable Artificial Intelligence (XAI) techniques, particularly LIME, to make the model's predictions more interpretable and transparent. The results section presents a comparative analysis of the ResNet variants, Xception, and the hybrid models.
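The idea behind LIME can be illustrated with a toy sketch: hide subsets of image regions (superpixels), query the classifier on each perturbed input, and fit a locally weighted linear model whose coefficients indicate how much each region contributes. Everything here is an assumption made for illustration: `toy_black_box` is a hypothetical stand-in for the trained network, all masks are enumerated instead of randomly sampled, and the exponential proximity kernel is a simplification. This is not the paper's implementation or the `lime` library's API.

```python
import itertools
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def toy_black_box(mask):
    """Hypothetical classifier: the tumor score depends on superpixels 0 and 2 only."""
    return 0.1 + 0.6 * mask[0] + 0.3 * mask[2]


def lime_style_weights(predict, n_segments, kernel_width=0.75):
    """Fit a proximity-weighted linear surrogate over on/off superpixel masks."""
    rows, targets, weights = [], [], []
    for mask in itertools.product([0, 1], repeat=n_segments):
        # Distance from the original image = fraction of superpixels hidden.
        hidden_fraction = 1.0 - sum(mask) / n_segments
        weights.append(math.exp(-(hidden_fraction ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(v) for v in mask])  # leading 1.0 = intercept
        targets.append(predict(mask))
    dim = n_segments + 1
    # Normal equations of weighted least squares: (X^T W X) beta = X^T W y.
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(dim)] for i in range(dim)]
    xtwy = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets))
            for i in range(dim)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]


intercept, importances = lime_style_weights(toy_black_box, n_segments=4)
print([round(v, 3) for v in importances])
```

Because the toy classifier is exactly additive, the surrogate recovers its per-superpixel contributions; with a real network the coefficients are only a local approximation around the image being explained.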

ResNet101 addresses the vanishing-gradient problem in deep networks through its residual learning structure, which adds identity mappings via skip connections so that gradients flow effectively during backpropagation. With a depth of 101 layers, ResNet101 captures complex hierarchical features, making it particularly effective for nuanced tasks like tumor segmentation. Its design also emphasizes computational efficiency through bottleneck structures, and it can start from weights pre-trained on datasets such as ImageNet. In contrast, the Xception model uses depthwise separable convolutions, which allow efficient feature extraction and the ability to learn complex patterns from high-dimensional datasets. This model is also pre-trained on large datasets, making it suitable for detailed image analysis tasks, including brain tumor detection, by balancing computational efficiency with model interpretability.
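To see why depthwise separable convolutions are considered efficient, it helps to count weights. The sketch below compares a standard convolution with a depthwise separable one for a single layer. The layer sizes are hypothetical values chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(kernel, channels_in, channels_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * channels_in * channels_out


def depthwise_separable_params(kernel, channels_in, channels_out):
    """Weights in a depthwise k x k step plus a pointwise 1 x 1 step (bias ignored)."""
    depthwise = kernel * kernel * channels_in  # one k x k filter per input channel
    pointwise = channels_in * channels_out     # 1 x 1 convolution mixing channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)          # 294912
separable = depthwise_separable_params(3, 128, 256)   # 1152 + 32768 = 33920
print(standard, separable, round(standard / separable, 2))
```

For this layer the separable form needs roughly 8.7 times fewer weights, which is the kind of saving that lets a network of this design stay deep at a moderate parameter count.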

The section concludes with a discussion of the mathematical concepts underpinning the proposed methodology, focusing on the fusion of complementary feature representations and the classification layer, which are critical for validating the effectiveness of the approach.
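One common way to fuse complementary feature representations is to concatenate the feature vectors from the two backbones and pass the result to a dense classification layer with a softmax. The sketch below assumes that concatenation scheme; the summary does not spell out the paper's exact formulation. The feature values, weights, and biases are hypothetical placeholders, not learned parameters.

```python
import math


def concatenate_features(resnet_features, xception_features):
    """Fuse two feature vectors by simple concatenation."""
    return list(resnet_features) + list(xception_features)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Dense layer followed by softmax: p = softmax(W f + b)."""
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Hypothetical pooled features from each backbone (real ones have thousands of entries).
resnet_features = [0.2, 0.8]
xception_features = [0.5, 0.1, 0.9]
fused = concatenate_features(resnet_features, xception_features)

# Hypothetical weights for two classes (no tumor, tumor): one row per class.
weights = [[0.4, -0.3, 0.1, 0.2, -0.5],
           [-0.2, 0.6, 0.3, -0.1, 0.7]]
biases = [0.0, 0.1]
probabilities = classify(fused, weights, biases)
print(len(fused), [round(p, 3) for p in probabilities])
```

Concatenation keeps both representations intact and leaves it to the classification layer to weigh them, at the cost of a wider input to that layer.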

Results

The research presents a hybrid machine learning approach for brain tumor classification that integrates Vision Transformers (ViTs) with classical algorithms such as Support Vector Machines (SVM) and Random Forests (RF). The models were evaluated on both a standard dataset and self-collected external data, revealing a significant drop in accuracy for ViT-based models when tested on unseen images, particularly models trained with engineered image variants such as Histogram of Oriented Gradients (HOG) features and masked regions. While some models achieved over 98% accuracy on training data, performance on external datasets often fell below 50%, indicating overfitting to specific features rather than generalizable patterns. In contrast, models trained on CLAHE-enhanced images performed more consistently, underscoring the importance of realistic data augmentations for improving generalization in clinical applications.

Additionally, the study explored three integration techniques (Parallel Model Integration, Self-Attention Mechanism, and Feature Fusion Technique) using ResNet101 and Xception architectures. The Parallel Model Integration yielded the highest accuracy of 99.67%, along with superior metrics in precision, recall, and F1-score for tumor classification. The Self-Attention Mechanism also performed well, achieving an accuracy of 98.16%, while the Feature Fusion Technique lagged behind at 95.50%. The results emphasize that while parallel integration provides robust feature representation, the other methods, although effective, do not fully exploit the advantages of the underlying models. The findings highlight the critical need for models to maintain performance across diverse datasets to ensure their applicability in real-world clinical settings.
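For reference, the metrics named above follow directly from a binary confusion matrix. The sketch below computes them from hypothetical counts chosen only to illustrate the formulas; they are not the paper's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted tumors that are real
    recall = tp / (tp + fn)     # share of real tumors that were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 90 true positives, 5 false positives,
# 10 false negatives, 95 true negatives.
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=95)
print({name: round(value, 4) for name, value in metrics.items()})
```

In a screening setting, recall deserves particular attention because a false negative corresponds to a missed tumor.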

Discussion

The discussion section of the paper highlights significant advances in deep learning, particularly in medical image analysis for brain tumor detection. It emphasizes the role of deep neural networks, supported by powerful hardware and frameworks like TensorFlow and PyTorch, in automating feature extraction from MRI scans. Notably, models such as VGG, ResNet, and EfficientNet have demonstrated high accuracy rates, with the study achieving 99.02% accuracy using deep transfer learning on a dataset of 3,000 MRI scans. The integration of Vision Transformers (ViTs) with traditional machine learning classifiers like Support Vector Machines (SVM) and Random Forests is presented as a promising approach to improving classification performance, particularly in scenarios with limited data.

The paper also addresses the challenges faced in deploying deep learning models in clinical settings, such as the need for large annotated datasets and the importance of model explainability. It discusses various preprocessing techniques, including Contrast Limited Adaptive Histogram Equalization (CLAHE) and Histogram of Oriented Gradients (HOG), which improve image quality and enhance model robustness against overfitting. The findings underscore the potential of hybrid models that combine the feature extraction capabilities of ViTs with the classification strengths of SVMs and Random Forests, ultimately leading to improved diagnostic accuracy and generalizability in brain tumor detection.
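The core step of CLAHE can be sketched for a single tile: build the tile's histogram, clip each bin at a limit, redistribute the clipped excess evenly, and remap intensities through the resulting cumulative distribution. This is a deliberate simplification and an assumption about how to illustrate the idea. Full CLAHE divides the image into many tiles and interpolates between their mappings, which is omitted here. The function name, the 8-level tile, and the clip limit are hypothetical.

```python
def clipped_equalize_tile(tile, clip_limit, levels=256):
    """Contrast-limited histogram equalization of one tile (no tile interpolation)."""
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1

    # Clip each bin and collect the excess so contrast is not over-amplified.
    excess = 0
    for level in range(levels):
        if hist[level] > clip_limit:
            excess += hist[level] - clip_limit
            hist[level] = clip_limit

    # Redistribute the clipped excess uniformly over all bins.
    bonus = excess / levels
    hist = [count + bonus for count in hist]

    # Build the lookup table from the cumulative distribution.
    total = sum(hist)
    lookup, running = [], 0.0
    for count in hist:
        running += count
        lookup.append(round(running / total * (levels - 1)))

    return [[lookup[value] for value in row] for row in tile]


# Hypothetical low-contrast 4 x 4 tile with 8 intensity levels.
tile = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 3, 3],
        [2, 2, 7, 7]]
enhanced = clipped_equalize_tile(tile, clip_limit=3, levels=8)
for row in enhanced:
    print(row)
```

The clip limit is the knob that distinguishes this from plain histogram equalization: a lower limit flattens the mapping and limits how much noise in uniform regions is amplified.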

Limitations

The section on limitations highlights several challenges associated with the Parallel Model Integration technique, despite its high predictive performance. The simultaneous operation of two deep networks on a single image significantly increases computational costs, GPU memory usage, training duration, and inference time, which can be prohibitive for smaller clinics and mobile health systems with limited resources. Additionally, the dual-model setup may lead to feature redundancy and overfitting without appropriate regularization, while the complexity of managing multiple synchronously trained models and hyperparameter optimization further complicates the implementation.
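A rough sense of the memory cost can be had from parameter counts alone. The sketch below assumes approximate counts for the full ImageNet versions of the two networks, as listed in the Keras Applications documentation (about 44.7 million for ResNet101 and 22.9 million for Xception), and 32-bit floating point weights. These figures are assumptions used for illustration, not values reported in the paper, and they cover weights only.

```python
BYTES_PER_FLOAT32 = 4

# Approximate parameter counts (assumed, taken from public model listings).
APPROX_PARAMS = {"ResNet101": 44_700_000, "Xception": 22_900_000}


def weight_memory_mb(param_count, bytes_per_param=BYTES_PER_FLOAT32):
    """Memory needed just to store the weights, in megabytes (10^6 bytes)."""
    return param_count * bytes_per_param / 1e6


per_model_mb = {name: weight_memory_mb(count)
                for name, count in APPROX_PARAMS.items()}
parallel_total_mb = sum(per_model_mb.values())
print(per_model_mb, round(parallel_total_mb, 1))
```

Under these assumptions the parallel pair stores roughly 270 MB of weights, about three times the Xception footprint alone. Activations, gradients, and optimizer state during training add substantially to this figure.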

Moreover, while self-attention mechanisms offer a balance between interpretability and efficiency, they generally underperform in image classification compared to feature fusion models, which provide faster processing at lower costs. The paper also notes specific inaccuracies in tumor detection, as illustrated in the figures, where low-contrast tumor areas or anatomical features can be misclassified due to limitations in the shared feature space of the parallel models. These issues may stem from dataset size, noise, and the absence of outlier instances during training, compounded by the lack of localized attention mechanisms that could enhance the model’s ability to capture critical textures necessary for accurate classification.