Joint transformer architecture in brain 3D MRI classification: its application in Alzheimer’s disease classification

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-59578-3
PMID: https://pubmed.ncbi.nlm.nih.gov/38637671
Publication Date: 2024-04-18
Author(s): Sait Alp et al.
Primary Topic: Brain Tumor Detection and Classification

Overview

This study investigates the application of the Vision Transformer (ViT) to Magnetic Resonance Images (MRIs) for the diagnosis of Alzheimer’s disease (AD), a neurodegenerative disorder primarily affecting the elderly. The approach uses a ViT to extract features from MRI slices, arranges those features into a sequence, and then models the interdependencies among them and classifies the sequence with a time series transformer (TST). The model was evaluated on T1-weighted MRIs from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using a random train/test split, and it reached higher diagnostic accuracy (99.048% for binary and 99.014% for multiclass classification) than the other deep learning architectures it was compared against.

The findings indicate that the ViT-based models (ViT-TST and ViT-Bi-LSTM) outperform the CNN-based models (CNN-TST and CNN-Bi-LSTM) because the self-attention mechanism captures long-term dependencies and global context, which improves classification performance and reduces overfitting. The study also highlights the efficacy of transfer learning with ViT for handling large 3D MRI volumes: converting each volume into 2D slices allows pre-trained models to be reused and reduces the need for extensive training datasets. This approach improves feature extraction and makes it possible to treat the slice features as a time series for classification, offering insight into the medical conditions associated with AD.
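
To make the self-attention idea above concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration only, not the authors' implementation: it assumes identity query/key/value projections and a single head, whereas a real ViT or time series transformer learns those projections and uses several heads. The names `softmax`, `self_attention`, and `tokens` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (an assumption).

    tokens: list of equal-length feature vectors, one per patch or slice.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights = []
    for query in tokens:
        # Similarity of this token to every token in the sequence, near or far.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights.append(softmax(scores))
    # Each output is a weighted mix of all tokens, which is what gives global context.
    outputs = [
        [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Toy sequence: the first and last tokens are similar, the middle one is not.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
outputs, weights = self_attention(tokens)
```

In this toy sequence the first token attends more to the third token than to the second, even though the third is farther away in the sequence. Position does not limit which tokens can influence each other, and that is the property the summary credits for capturing long-term dependencies.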

Methods

The authors employed a Vision Transformer (ViT) to extract features from T1-weighted MRI slices and a transformer neural network to classify the resulting feature sequence while preserving the associations among slices. The specific architectures of the transformer neural network and of the ViT used for sequential feature classification are detailed in Supplementary Sections 1 and 1.1.
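
The slice-to-sequence step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the volume is a small nested list rather than a real MRI scan, and `toy_slice_features` is a hypothetical stand-in for the ViT encoder that returns two summary statistics instead of a learned embedding. The slicing axis and all function names are assumptions, not details taken from the paper.

```python
def slices_along_axis(volume, axis=0):
    """Return the ordered 2D slices of a depth x height x width nested list."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[row[:] for row in volume[d]] for d in range(depth)]
    if axis == 1:
        return [[volume[d][h][:] for d in range(depth)] for h in range(height)]
    if axis == 2:
        return [
            [[volume[d][h][w] for h in range(height)] for d in range(depth)]
            for w in range(width)
        ]
    raise ValueError("axis must be 0, 1, or 2")


def toy_slice_features(slice_2d):
    """Stand-in for the ViT encoder: [mean, max] intensity of one slice."""
    values = [v for row in slice_2d for v in row]
    return [sum(values) / len(values), max(values)]


def volume_to_feature_sequence(volume, axis=0):
    """One feature vector per slice, kept in slice order."""
    return [toy_slice_features(s) for s in slices_along_axis(volume, axis)]


# A 2 x 2 x 3 toy volume standing in for a 3D scan.
volume = [
    [[0, 1, 2], [3, 4, 5]],
    [[6, 7, 8], [9, 10, 11]],
]
sequence = volume_to_feature_sequence(volume, axis=0)
```

Keeping the feature vectors in slice order is the point of the design: a sequence model can then relate each slice to its neighbours instead of classifying slices in isolation.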

The methodology also includes an overview of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset and the procedural steps of the proposed method, which are outlined in Sections 2.1-2.4. The overall architecture of the proposed method is shown in Figure 1.

Results

The results were derived from a two-step analysis, feature extraction followed by sequence modeling, using MRI data from ADNI1: Complete 3Yr 3T and ADNI1: Complete 1Yr 1.5T. Classification performance was evaluated by comparing the proposed ViT with Transformer (ViT-TST) against three other model combinations: Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM), CNN with Transformer (CNN-TST), and ViT with Bi-LSTM.

The performance metrics assessed were accuracy, precision, recall, and F-score. Detailed results for the ADNI1: Complete 1Yr 1.5T dataset are provided in Supplementary Section 2.1, which compares the effectiveness of the different model architectures on the classification tasks.
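
For reference, the four metrics named above can be computed from predicted and true labels as in the sketch below. It covers the binary case only and uses made-up labels; how the paper averages these metrics in the multiclass setting is not stated in this summary, so that is left out. The function name `binary_metrics` and the label encoding (1 = AD, 0 = normal control) are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-score for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Precision falls as false positives rise; recall falls as false negatives rise.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Made-up labels for illustration: 1 = AD, 0 = normal control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

With one false positive and one false negative among eight scans, all four metrics come out to 0.75 in this toy example.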

Discussion

In this study, the authors investigated the application of a Vision Transformer combined with a sequential transformer architecture for the classification of MRI scans in Alzheimer’s disease (AD) and mild cognitive impairment (MCI). Using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), they performed both binary (normal control vs. AD) and multiclass (normal control, MCI, and AD) classification tasks. The proposed ViT-TST model outperformed the CNN-based and Bi-LSTM-based architectures, achieving the highest accuracy and precision scores. High precision means few false positives, which is critical in medical diagnostics. The authors emphasize the model’s ability to capture long-term dependencies and spatial-temporal patterns in MRI slice sequences, and its robustness in identifying MCI patients and differentiating them from healthy individuals.

The findings indicate that the ViT-TST architecture is particularly advantageous for medical applications because it balances sensitivity and specificity, making it a suitable choice for early detection of AD. The study highlights the importance of transfer learning, which allows the model to leverage pre-training on large datasets to improve performance on smaller, potentially imbalanced datasets. By converting 3D MRI data into 2D slices and employing a time series classification approach, the authors addressed a limitation of conventional slice-based methods and ensured that inter-slice dependencies were preserved. Overall, the results suggest that the ViT-TST model is a promising tool for improving the accuracy of AD classification, thereby facilitating timely medical interventions.