Alternate encoder and dual decoder CNN-Transformer networks for medical image segmentation

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-93353-2
PMID: https://pubmed.ncbi.nlm.nih.gov/40087352
Publication Date: 2025-03-14
Author(s): Lin Zhang et al.
Primary Topic: Brain Tumor Detection and Classification

Overview

This section presents an overview of the challenges and advancements in medical image segmentation, particularly in accurately extracting lesions from medical images. Recent developments using convolutional neural networks (CNNs) and Transformers have shown promise, yet existing models often struggle to exploit local and global features together because of the distinctive characteristics of lesion tissue.

To address these limitations, the authors introduce a novel architecture named AD2Former, which is based on an encoder-decoder framework. This architecture features two key innovations: an alternating learning encoder that facilitates real-time interaction between local and global information, enhancing mutual guidance during the learning process, and a dual decoder architecture that employs independent decoding and fusion through a channel attention module. This approach aims to minimize redundant features while maximizing the capture of target regions and fuzzy boundaries. Experimental results on multi-organ and skin lesion segmentation datasets indicate that AD2Former outperforms existing methods, showcasing its effectiveness in medical image analysis.

Methods

The section outlines the evolution of medical image segmentation techniques, highlighting the transition from traditional machine learning methods to advanced deep learning approaches, particularly Convolutional Neural Networks (CNNs) and Transformers. Initially, CNNs, especially U-Net and its variants like U-Net++ and V-Net, demonstrated significant improvements in segmentation accuracy and robustness through various architectural enhancements, such as multi-resolution structures and dense connections. However, these CNN-based methods often overlooked the importance of global contextual information.

To address this limitation, researchers have begun integrating Transformer architectures, which excel at capturing long-range dependencies. Notable advancements include the Vision Transformer (ViT) and various adaptations that combine local feature extraction from CNNs with the global context capabilities of Transformers. For instance, the proposed AD2Former framework merges CNN and Transformer layers in its encoder, utilizing a pre-trained ResNet for initial feature extraction and enhancing global information processing through Transformer layers. The decoding process involves two sub-decoders that independently process multi-scale feature maps and fuse them using a channel attention module, ultimately producing high-resolution segmentation predictions. This hybrid approach aims to leverage the strengths of both architectures, improving segmentation performance in medical imaging tasks.
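The fusion step described above can be illustrated with a small sketch. The summary does not specify how the channel attention module is built, so the code below assumes a squeeze-and-excitation style gate: the two sub-decoder outputs are concatenated along the channel axis, each channel is averaged, and a sigmoid gate rescales the channels before the two halves are summed. The function names, the `weights` matrix, and the `bias` vector are hypothetical and stand in for learned parameters; this is not the authors' implementation.

```python
import math


def global_avg_pool(feature_map):
    """Squeeze step: mean of every channel in a [C][H][W] nested list."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fuse(feat_a, feat_b, weights, bias):
    """Fuse two decoder outputs of shape [C][H][W] with per-channel gates.

    weights ([2C][2C]) and bias ([2C]) are hypothetical learned parameters.
    Returns the fused [C][H][W] map and the 2C gate values.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("both decoder outputs need the same channel count")
    stacked = feat_a + feat_b  # concatenate along the channel axis
    pooled = global_avg_pool(stacked)
    # Excitation step: one linear layer followed by a sigmoid gate per channel.
    gates = [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(weights, bias)
    ]
    # Rescale every channel by its gate.
    reweighted = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, stacked)
    ]
    c = len(feat_a)
    # Sum the two reweighted halves to return to C channels.
    fused = [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(reweighted[:c], reweighted[c:])
    ]
    return fused, gates
```

With all-zero weights and biases every gate equals 0.5, so the fused map is simply the average of the two inputs; trained parameters would instead emphasize whichever decoder branch carries the more useful channels.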

Results

In the results section, the authors demonstrate the effectiveness of their proposed AD2Former model for multi-organ segmentation tasks, comparing it against state-of-the-art CNN and Transformer-based methods. AD2Former achieved a Dice Similarity Coefficient (DSC) of 83.18% and a Hausdorff Distance (HD) of 20.89, outperforming competitors in liver, pancreas, spleen, and stomach segmentation. Notably, it surpassed the second-best methods, MISSFormer and HiFormer, by 4.04% and 3.15% in pancreas and stomach segmentation, respectively. Statistical analysis using a paired Student’s t-test yielded a p-value of less than 0.05, indicating significant performance differences. Visual comparisons further illustrated AD2Former’s superior ability to delineate organ boundaries, particularly in challenging cases with blurred edges.
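As a reminder of what a paired Student's t-test measures, the sketch below computes the t statistic from per-case scores of two models evaluated on the same cases. The function name and the sample scores in the usage note are illustrative only and are not taken from the paper.

```python
import math


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-case scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 cases")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the per-case differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / math.sqrt(variance / n)
    return t, n - 1
```

For example, `paired_t_statistic([0.85, 0.80, 0.90, 0.75], [0.80, 0.78, 0.84, 0.74])` pairs made-up Dice scores for four cases. Converting the statistic into a p-value requires the t distribution with the returned degrees of freedom; `scipy.stats.ttest_rel` performs both steps if SciPy is available.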

Additionally, AD2Former was evaluated on the ISIC 2018 skin lesion dataset, where it again outperformed existing methods across multiple metrics, achieving a DSC of 91.28%, sensitivity of 92.00%, specificity of 98.82%, and accuracy of 96.49%. The model demonstrated an inference speed of 36 images per second and a memory utilization rate of 99% during training. The statistical significance of these results was confirmed with a p-value of less than 0.05. Visual comparisons with models such as U-Net and Swin-Unet revealed that AD2Former effectively managed complex boundaries and low-contrast scenarios, showcasing its robustness and generalization capabilities in image segmentation tasks.
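The four overlap metrics quoted above all derive from the pixel-level confusion counts of a binary mask. The sketch below uses the standard definitions on flattened 0/1 masks; the function names are illustrative, and the paper's own evaluation code may differ in details such as per-image averaging.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over flattened binary masks."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Division that yields NaN instead of raising when the denominator is 0."""
    return numerator / denominator if denominator else float("nan")


def segmentation_metrics(pred, truth):
    """Return Dice, sensitivity, specificity and accuracy as fractions."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of lesion pixels found
        "specificity": ratio(tn, tn + fp),  # share of background kept clean
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

Because skin lesion images are dominated by background pixels, specificity and accuracy tend to sit higher than Dice and sensitivity, which is consistent with the ordering of the figures reported here.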

Discussion

In this section, the authors discuss their proposed multi-scale information fusion network, AD2Former, which integrates CNN and Transformer architectures for improved medical image segmentation. The method was validated using two datasets: the Synapse multi-organ segmentation dataset, comprising 30 abdominal CT scans, and the ISIC 2018 dataset, containing 2,594 RGB images of skin lesions. The results demonstrated that AD2Former achieved a Dice coefficient of 83.18% and a Hausdorff Distance (HD) of 20.89 on the Synapse dataset, while on the ISIC 2018 dataset, it reached a Dice coefficient of 91.28%, specificity of 98.82%, sensitivity of 92.00%, and accuracy of 96.49%. These results indicate that the proposed architecture effectively captures both local and global features, enhancing segmentation performance.
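Unlike the Dice coefficient, the Hausdorff Distance is a distance between two boundary point sets, so it is expressed in length units (pixels or millimetres) rather than as a percentage, and lower is better. The sketch below computes the plain symmetric Hausdorff distance between two sets of boundary coordinates. Segmentation benchmarks often report a 95th-percentile variant instead, and this summary does not state which one the paper uses, so treat the code as an illustration of the concept only.

```python
import math


def directed_hausdorff(points_a, points_b):
    """Largest distance from any point in A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in points_b) for a in points_a)


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    return max(
        directed_hausdorff(points_a, points_b),
        directed_hausdorff(points_b, points_a),
    )
```

A single stray predicted pixel far from the true boundary is enough to inflate this value, which is why the metric complements Dice when judging boundary quality.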

The authors also acknowledge limitations, particularly in scenarios with severely imbalanced samples, where the model’s performance may decline. They propose future work to address this issue by integrating Generative Adversarial Networks (GANs) with the existing architecture. Overall, the findings underscore the potential of AD2Former in clinical applications, while also highlighting areas for further improvement.