A fine-tuned vision transformer based enhanced multi-class brain tumor classification using MRI scan imagery

Journal: Frontiers in Oncology, Volume: 14
DOI: https://doi.org/10.3389/fonc.2024.1400341
PMID: https://pubmed.ncbi.nlm.nih.gov/39091923
Publication Date: 2024-07-18
Author(s): C. Kishor Kumar Reddy et al.
Primary Topic: Brain Tumor Detection and Classification

Overview

The study investigates Fine-Tuned Vision Transformer models (FTVTs) for classifying brain tumors from MRI scans and compares them with established deep learning models: ResNet-50, MobileNet-V2, and EfficientNet-B0. The dataset comprises 7,023 MRI images in four classes: glioma, meningioma, pituitary tumor, and no tumor. The FTVT-l16 model achieved the highest accuracy, 98.70%, ahead of ResNet-50 (96.5%), EfficientNet-B0 (95.1%), and MobileNet-V2 (94.9%). The other FTVT models also performed strongly.

In conclusion, the study provides a comparative analysis of several deep learning architectures and reports that FTVT models reach higher accuracy than the CNN baselines for brain tumor classification. It combines several preprocessing techniques with a consistent training and evaluation procedure. Notably, the “L” variants of FTVT are reported to outperform the “B” variants, which the study attributes to their greater capacity to capture fine details. The authors present these results as evidence that FTVT models can support early detection and classification of brain tumors.

Introduction

The introduction of the research paper discusses the complexities surrounding brain tumors, which can be either malignant or benign, and highlights the challenges in their early diagnosis due to their irregular morphology and heterogeneous locations. The study focuses on classifying glioma, meningioma, and pituitary tumors, which present distinct diagnostic challenges and treatment requirements. The increasing incidence and mortality rates of brain tumors underscore the necessity for improved diagnostic methods, particularly through advanced machine learning (ML) and deep learning (DL) techniques applied to MRI images.

The paper reviews classical ML algorithms for tumor classification, such as K-Nearest Neighbors (KNN), Random Forest, and Naive Bayes, and acknowledges their limitations, particularly on high-dimensional data. In contrast, convolutional neural networks (CNNs) and Vision Transformers (ViTs) are highlighted for their ability to extract relevant features automatically from raw MRI data, which improves classification accuracy. The study introduces Fine-Tuned Vision Transformer models (FTVTs) with updated classifier heads and optimized hyperparameters, aiming for higher accuracy, precision, recall, and F1 score in brain tumor classification. The research builds on previous studies that show the potential of these models for brain tumor diagnosis.

Methods

The methods section of the research paper outlines the methodologies employed for brain tumor classification, focusing on deep learning (DL) models and Fine-Tuned Vision Transformer (FTVT) models. It begins with a detailed examination of the architectural principles of each model, followed by their implementation for brain tumor classification. A comparative analysis then assesses four FTVT variants (FTVT-b16, FTVT-b32, FTVT-l16, and FTVT-l32) against established DL models: ResNet-50, MobileNet-V2, and EfficientNet-B0.
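
The variant names can be read with a short worked calculation. The sketch below assumes, following the usual ViT naming convention, that the suffix 16 or 32 is the patch size in pixels and that "b" and "l" denote the Base and Large backbones; the summary does not state this explicitly. It also assumes one extra class token per sequence, as in the standard ViT design. The function and dictionary names are illustrative only.

```python
def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


# Assumed mapping from variant name to patch size (not stated in the summary).
ASSUMED_PATCH_SIZES = {
    "FTVT-b16": 16,
    "FTVT-b32": 32,
    "FTVT-l16": 16,
    "FTVT-l32": 32,
}


def sequence_lengths(image_size):
    """Token count per variant: one token per patch plus one class token."""
    return {
        name: num_patches(image_size, patch) + 1
        for name, patch in ASSUMED_PATCH_SIZES.items()
    }


if __name__ == "__main__":
    # 256 is the image size this summary reports for preprocessing.
    for name, length in sequence_lengths(256).items():
        print(name, length)
```

Under these assumptions, a patch size of 16 yields four times as many tokens as a patch size of 32 for the same image, so the /16 variants see the image at a finer granularity at a higher compute cost.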

The experimental procedure involved several key data preprocessing steps, starting with resizing all brain MRI images to a uniform 256 × 256 pixels to balance image resolution against training efficiency. Feature extraction was performed using the aforementioned models, and a separate test set was used to evaluate them on accuracy, precision, recall, and F1 score. The classification pipeline included converting hexadecimal files to images, applying Gaussian blur, and using ImageDataGenerator to load and normalize the dataset. The model was trained with computed class weights to address class imbalance, and its effectiveness was evaluated with a confusion matrix and classification report. The trained model outputs probability scores for each tumor class on new MRI scans.
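
The class weighting and the evaluation metrics mentioned above can be illustrated with a minimal pure-Python sketch. The summary does not say how the class weights were computed; the formula used here, n_samples / (n_classes * class_count), is the common "balanced" heuristic and is an assumption. The function names and the toy numbers are hypothetical and are not results from the study.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    This heuristic is an assumption, not the paper's stated method.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def per_class_metrics(confusion):
    """Precision, recall and F1 per class from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy labels and a toy 2x2 confusion matrix, for illustration only.
    print(balanced_class_weights(["glioma"] * 6 + ["no_tumor"] * 2))
    print(per_class_metrics([[8, 2], [1, 9]]))
    print(accuracy([[8, 2], [1, 9]]))
```

The same functions apply unchanged to a 4 × 4 confusion matrix over the four classes described in this summary.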

Results

The experiment was conducted on Kaggle, a Google-owned platform, using a Kaggle Kernel, which functions as a free Jupyter notebook server with integrated GPU support. This setup provided cloud computing resources and avoided the processing limits of a local machine. Two T4 GPUs were used, each with 15 GB of GPU memory, together with 29 GB of RAM, which made it practical to handle large image datasets.

The experiment was implemented in Python, using the PyTorch library to build the deep learning and FTVT models and the Matplotlib library to visualize the results.

Discussion

This section describes the dataset and methodologies employed for multi-class brain tumor classification using MRI scans. The dataset comprises 7,023 images from three sources (figshare, SARTAJ, and Br35H), categorized into four classes: glioma, meningioma, pituitary tumor, and no tumor. A Pearson correlation matrix was calculated to analyze the relationships between pixel values across images, giving a measure of how similar the images are. Preprocessing and augmentation steps (resizing, flipping, rotating, and normalization) were applied to make the models more robust and to improve generalization to unseen data.
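
As a rough illustration of the Pearson correlation analysis, the sketch below flattens each image into a vector of pixel values and computes the pairwise correlation coefficient. The summary does not describe the exact procedure, so treating each whole image as one variable is an assumption, and the function names and the tiny 2 × 2 "images" are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        # A constant image has zero variance, so the coefficient is undefined.
        return float("nan")
    return numerator / denominator


def correlation_matrix(images):
    """Pairwise Pearson correlation between images given as 2D lists of pixels."""
    flat = [[pixel for row in image for pixel in row] for image in images]
    n = len(flat)
    return [[pearson(flat[i], flat[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    toy_images = [
        [[0, 1], [2, 3]],
        [[0, 2], [4, 6]],  # brighter copy of the first image
        [[3, 2], [1, 0]],  # intensity-reversed copy of the first image
    ]
    for row in correlation_matrix(toy_images):
        print([round(value, 3) for value in row])
```

Values near 1 indicate images whose intensity patterns rise and fall together, and values near -1 indicate reversed patterns. All images must share the same dimensions, which the resizing step described above would provide.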

The study employs various deep learning architectures, including Convolutional Neural Networks (CNNs) such as ResNet-50, EfficientNet-B0, and MobileNet-V2, as well as Vision Transformers (ViTs). Each model is fine-tuned for the specific task of classifying brain tumors, using techniques such as dropout and batch normalization to reduce overfitting. The models are trained using the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001 and the Cross Entropy Loss function. A custom classifier head in the fine-tuned Vision Transformer (FTVT) model adapts the pretrained backbone to the four-class task, which the study credits with helping the model learn intricate patterns and improving overall accuracy in identifying brain tumor types.
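
The training rule described here, cross-entropy loss minimized by SGD with a learning rate of 0.001, can be sketched for a single linear classifier head over four classes. This is a pure-Python illustration of the arithmetic, not the authors' PyTorch code. The feature size, the zero initialization, and all function names are assumptions, and details such as batching, momentum, dropout, and the backbone itself are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, features):
    """Logits of a linear head: one weight row and one bias per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def cross_entropy(weights, bias, features, target):
    """Cross-entropy loss for one sample with integer class label `target`."""
    return -math.log(softmax(linear(weights, bias, features))[target])


def sgd_step(weights, bias, features, target, lr=0.001):
    """Apply one SGD update in place and return the loss before the update.

    For softmax cross-entropy, the gradient with respect to logit k is
    p_k - 1 for the target class and p_k otherwise.
    """
    probs = softmax(linear(weights, bias, features))
    loss = -math.log(probs[target])
    for k in range(len(weights)):
        grad_logit = probs[k] - (1.0 if k == target else 0.0)
        for j in range(len(features)):
            weights[k][j] -= lr * grad_logit * features[j]
        bias[k] -= lr * grad_logit
    return loss


if __name__ == "__main__":
    # Four classes, three toy features; real heads take backbone embeddings.
    weights = [[0.0, 0.0, 0.0] for _ in range(4)]
    bias = [0.0] * 4
    sample, label = [1.0, 2.0, -1.0], 2
    for step in range(3):
        print(step, sgd_step(weights, bias, sample, label))
```

With a learning rate as small as 0.001, each step lowers the loss only slightly, which is why fine-tuning of this kind typically runs for many iterations.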