Explainable hybrid transformer for multi-classification of lung disease using chest X-rays

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-90607-x
PMID: https://pubmed.ncbi.nlm.nih.gov/39994381
Publication Date: 2025-02-24
Author(s): Xiaoyang Fu et al.
Primary Topic: COVID-19 diagnosis using AI

Overview

This research paper addresses lung disease, a leading cause of mortality worldwide, by proposing a deep learning model named LungMaxViT. The model integrates a convolutional neural network (CNN) with a multi-axis transformer architecture to improve feature recognition and the classification of lung diseases from chest X-ray images. The study evaluates LungMaxViT against four established pre-trained models (ResNet50, MobileNetV2, ViT, and MaxViT) on two public datasets: ChestX-ray14 and the COVID-19 dataset COVID-QU-Ex. LungMaxViT achieves the best reported performance: a classification accuracy of 96.8%, an AUC of 98.3%, and an F1-score of 96.7% on the COVID-19 dataset, and an AUC of 93.2% and an F1-score of 70.7% on ChestX-ray14.

The paper highlights the challenges faced by clinicians in diagnosing lung diseases due to the ambiguous nature of lesions in X-ray images and the shortage of skilled radiologists, particularly in rural areas. The proposed LungMaxViT model not only demonstrates improved accuracy and robustness in detecting multiple lung lesions, including COVID-19, but also provides explainable results through Grad-CAM visualizations. The findings suggest that this hybrid model can significantly enhance the early identification of lung diseases, thereby contributing to better clinical outcomes. Future research directions include exploring fine-tuned medical foundation models and multimodal self-supervised approaches to further advance the classification of various medical imaging datasets.

Methods

The proposed framework, LungMaxViT, integrates deep convolutional neural networks (CNNs) with attention mechanisms to improve the prediction of lung diseases from chest X-ray images. This hybrid architecture addresses a limitation of traditional CNNs, which primarily analyze spatial correlations among neighboring pixels without considering their directional relationships. By adding transformer components, LungMaxViT captures both the spatial correlations between pixels and the distances between them, which improves visual recognition. The methodology has four main steps: dataset preparation; data processing (augmentation, resizing, normalization); model selection through a comparative analysis of pre-trained models (ResNet50, MobileNetV2, ViT, and MaxViT); and development of a hybrid model that combines CNN and multi-axis transformer components. The framework is pre-trained on ImageNet-1K and fine-tuned on the chest X-ray datasets, and it ends in a classification layer that produces multi-class predictions. Grad-CAM is then used to visualize where the model localizes disease, which makes the predictions explainable.
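The multi-axis idea can be illustrated without any deep learning library. The sketch below follows the general MaxViT design, in which attention is applied first within local windows ("block" attention) and then across a sparse, evenly spaced grid ("grid" attention). It only shows which token positions are grouped together; the function names and the 4 × 4 example are illustrative assumptions, not the authors' configuration.

```python
def block_partition(height, width, p):
    """Group token coordinates into non-overlapping p x p local windows."""
    assert height % p == 0 and width % p == 0, "feature map must divide evenly"
    windows = []
    for by in range(0, height, p):
        for bx in range(0, width, p):
            windows.append(
                [(y, x) for y in range(by, by + p) for x in range(bx, bx + p)]
            )
    return windows


def grid_partition(height, width, g):
    """Group token coordinates into sets of g x g tokens spread over the whole map."""
    assert height % g == 0 and width % g == 0, "feature map must divide evenly"
    stride_y, stride_x = height // g, width // g
    groups = []
    for oy in range(stride_y):
        for ox in range(stride_x):
            # Tokens in one group are stride_y / stride_x apart, so the group
            # spans the full feature map instead of one neighborhood.
            groups.append(
                [(oy + i * stride_y, ox + j * stride_x)
                 for i in range(g) for j in range(g)]
            )
    return groups


if __name__ == "__main__":
    print(block_partition(4, 4, 2)[0])  # local window: neighboring tokens
    print(grid_partition(4, 4, 2)[0])   # dilated group: distant tokens
```

Attention restricted to each block mixes nearby tokens, while attention restricted to each grid group mixes tokens that are far apart, which is how the two axes together cover both local and global context at modest cost.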

Experiments were conducted on a high-performance NVIDIA A100 server with an Intel Xeon Gold 5218 CPU and multiple GPUs. LungMaxViT was evaluated with Accuracy (ACC), Specificity, Sensitivity/Recall, Precision, and F1-score, each calculated from the confusion matrix of the model under test. Accuracy is the proportion of all predictions that are correct; precision is the proportion of predicted positives that are truly positive; specificity is the proportion of actual negatives that are correctly identified; and recall is the proportion of actual positives that are correctly identified. The F1-score is the harmonic mean of precision and recall, so it balances the two in a single number.
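As a concrete reading of these definitions, the following sketch derives the per-class metrics from a multi-class confusion matrix by treating each class one-vs-rest. The matrix values are invented for illustration and are not results from the paper; the function names are likewise hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the five metrics from one-vs-rest counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # class k, predicted as other
        fp = sum(confusion[i][k] for i in range(n)) - tp    # other, predicted as k
        tn = total - tp - fn - fp
        results.append(binary_metrics(tp, fp, fn, tn))
    return results


if __name__ == "__main__":
    # Hypothetical 3-class counts (rows: true class, columns: predicted class).
    example = [[8, 1, 1],
               [2, 7, 1],
               [0, 1, 9]]
    for index, metrics in enumerate(per_class_metrics(example)):
        print(index, metrics)
```

Averaging the per-class F1 values gives a macro F1-score; whether the paper reports macro or weighted averages is not stated in this summary.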

Results

The study reports experiments on two datasets, ChestX-ray14 and COVID-QU-Ex, covering four classic models and the improved model, LungMaxViT. Each model was trained by transfer learning from pre-trained weights in the official PyTorch repository. Performance was assessed with the evaluation metrics described in the Methods section to measure how effectively each model detects lung lesions.
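The AUC figures quoted in the Overview can be understood through the rank interpretation of AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly and averages it one-vs-rest across classes. The macro averaging is an assumption, since this summary does not say how the authors aggregated AUC, and the pairwise loop is written for clarity rather than speed.

```python
def binary_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def macro_ovr_auc(prob_rows, true_classes, num_classes):
    """Average the one-vs-rest AUC over all classes (assumed aggregation)."""
    aucs = []
    for k in range(num_classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if t == k else 0 for t in true_classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / num_classes


if __name__ == "__main__":
    # Two positives (0.9, 0.3) and two negatives (0.8, 0.1): 3 of 4 pairs are ordered correctly.
    print(binary_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```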

To make the results more interpretable, the authors generated heat maps that highlight the lung lesion areas identified by the models. These visualizations illustrate model performance and also give insight into the decision-making process of the improved model, contributing to explainable AI in medical imaging.
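The Methods section names Grad-CAM as the visualization technique. Its core arithmetic is short: each feature-map channel is weighted by the spatial average of the class-score gradient for that channel, the weighted channels are summed, and negative values are clipped. The sketch below performs only that step on nested lists. In practice the activations and gradients would come from a chosen convolutional layer of the trained network; which layer the authors used is not stated here, and the final scaling to [0, 1] is a common display convention rather than part of the definition.

```python
def grad_cam(activations, gradients):
    """Return a heat map from activations and gradients shaped [channels][height][width]."""
    num_channels = len(activations)
    height, width = len(activations[0]), len(activations[0][0])

    # Channel importance: global average of the gradients over all positions.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(num_channels)
    ]

    # Weighted sum of channels followed by ReLU, so only evidence
    # that increases the class score is kept.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][y][x]
                         for k in range(num_channels)))
            for x in range(width)
        ]
        for y in range(height)
    ]

    # Scale to [0, 1] for display when the map is not all zeros.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


if __name__ == "__main__":
    # Toy input: channel 0 supports the class, channel 1 opposes it.
    acts = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
    grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
    print(grad_cam(acts, grads))
```

The resulting low-resolution map is normally upsampled to the input size and overlaid on the X-ray to produce the heat maps described above.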

Discussion

In this section, the authors discuss the datasets used to evaluate the proposed deep learning framework: ChestX-ray14 and COVID-QU-Ex. COVID-QU-Ex groups its images into three classes: COVID-19 positive, non-COVID-19 infection, and normal. ChestX-ray14 was expanded from eight to fifteen categories (fourteen pathology labels plus a "No Finding" label), covering various lung diseases, and contains standardized digital X-ray images at a resolution of 1024 × 1024 pixels. The authors stress that the data should be analyzed for biases and limitations, since these can affect model performance. To increase the variability and robustness of the training data, they apply augmentation techniques such as Gaussian blur and Contrast Limited Adaptive Histogram Equalization (CLAHE), together with flipping and denoising, to improve image quality and feature extraction.
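Two of the augmentations named above, flipping and Gaussian blur, are simple enough to sketch on a grayscale image stored as a list of rows. The 3 × 3 kernel and the edge handling (repeating border pixels) are assumptions chosen for brevity; the summary does not give the authors' kernel size or parameters. CLAHE is omitted because it requires tiled histogram equalization that is usually taken from an image-processing library.

```python
# 3 x 3 binomial approximation of a Gaussian; the weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def horizontal_flip(image):
    """Mirror a grayscale image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def gaussian_blur_3x3(image):
    """Blur a grayscale image, repeating edge pixels at the border."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)  # clamp to valid rows
                    xx = min(max(x + dx, 0), width - 1)   # clamp to valid columns
                    total += KERNEL[dy + 1][dx + 1] * image[yy][xx]
            blurred[y][x] = total / 16
    return blurred


if __name__ == "__main__":
    impulse = [[0, 0, 0],
               [0, 16, 0],
               [0, 0, 0]]
    print(horizontal_flip([[1, 2, 3]]))
    print(gaussian_blur_3x3(impulse))
```

Flipping changes the geometry of a sample while blur changes its sharpness, so together they expose the model to variation in both layout and image quality.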

The authors introduce their model, LungMaxViT, which integrates convolutional neural networks (CNNs) with attention mechanisms to improve feature extraction for lung disease classification. They emphasize preprocessing, including resizing images to 224 × 224 pixels and applying several augmentation strategies to counter data imbalance. The architecture consists of an initial state block and a squeeze-and-excitation (SE) block that strengthen feature extraction, followed by a MaxViT block that uses multi-axis attention to improve spatial feature representation. LungMaxViT is evaluated with accuracy, Area Under the Curve (AUC), and F1-score, and it outperforms the other pre-trained models, particularly in detecting COVID-19 and other lung diseases, despite the class imbalance in the datasets.
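The squeeze-and-excitation step mentioned above can be sketched in a few lines. It pools each channel to a single number (squeeze), passes those numbers through two small fully connected layers to produce a gate between 0 and 1 per channel (excitation), and then rescales each channel by its gate. The weights `w1` and `w2` below are hypothetical placeholders that would be learned in a real network, biases are omitted for brevity, and nothing here reflects the authors' layer sizes.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """Apply SE gating.

    feature_map: [channels][height][width]
    w1: [reduced][channels] weights of the first fully connected layer
    w2: [channels][reduced] weights of the second fully connected layer
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling gives one value per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gates[c] * value for value in row] for row in feature_map[c]]
        for c in range(len(feature_map))
    ]
    return scaled, gates


if __name__ == "__main__":
    features = [[[2.0]], [[4.0]]]   # two channels, 1 x 1 spatial size
    zero_w1 = [[0.0, 0.0]]          # one hidden unit, all-zero weights
    zero_w2 = [[0.0], [0.0]]
    print(squeeze_excite(features, zero_w1, zero_w2))
```

With all-zero weights every gate is sigmoid(0) = 0.5, so each channel is halved; trained weights would instead raise the gates of informative channels and lower the rest.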