Automatic X-ray teeth segmentation with grouped attention

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-84629-0
PMID: https://pubmed.ncbi.nlm.nih.gov/39747360
Publication Date: 2025-01-02
Author(s): Wenjin Zhong et al.
Primary Topic: Dental Radiography and Imaging

Overview

The paper introduces the Grouped Attention and Cross-Layer Fusion Network (GCNet), a model for segmenting teeth in dental X-rays that addresses three challenges: small datasets, patient privacy concerns, and noise interference. The model has two main components: Grouped Global Attention (GGA) modules, which capture and organize texture and contour features, and Cross-Layer Fusion (CLF) modules, which integrate these features with deep semantic information to improve segmentation accuracy. On the Children’s Dental Panoramic Radiographs dataset, GCNet outperforms existing models, achieving a Dice coefficient of 0.9338, a sensitivity of 0.9426, and a specificity of 0.9821, while producing clearer segmentation boundaries.
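For readers unfamiliar with the three reported metrics, the following minimal sketch shows how Dice, sensitivity, and specificity are derived from the confusion counts of a binary mask. It uses the standard textbook definitions; the function names and the toy masks are illustrative assumptions and are not taken from the paper's evaluation code.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for two flattened binary masks (1 = tooth, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def dice(tp: int, fp: int, fn: int) -> float:
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true tooth pixels that were predicted as tooth."""
    return tp / (tp + fn) if (tp + fn) else 1.0


def specificity(tn: int, fp: int) -> float:
    """Fraction of true background pixels that were predicted as background."""
    return tn / (tn + fp) if (tn + fp) else 1.0


# Toy 8-pixel masks (hypothetical data, only to exercise the formulas).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 1, 1, 0, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(pred, target)
print(dice(tp, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
```

On the toy masks the counts are TP = 2, FP = 1, FN = 1, TN = 4, so Dice and sensitivity are both 2/3 and specificity is 0.8. In practice the masks would be full-resolution arrays flattened to one dimension.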

The findings highlight how the GGA mechanism extracts relevant features from low-level data and builds a Grouped Global Map that directs high-level feature processing toward critical areas. The CLF module, in turn, merges features from multiple encoders, which allows precise extraction of positional and texture information. The study emphasizes that the Grouped Uni-directional Attention (GUA) mechanism is particularly advantageous for small datasets because it mitigates the effects of noise and reduces computational complexity. This makes GCNet a promising tool for dental segmentation, and potentially for other medical imaging tasks with noisy conditions and indistinct boundaries. Overall, GCNet is positioned to improve diagnostic efficiency and reduce the workload of healthcare professionals in dental imaging.

Methods

This section describes the network architecture and its implementation details. The model was implemented in PyTorch 1.10.0 with CUDA 11.8 and run on an RTX 4090 GPU with 24 GB of VRAM. Training ran for 200 epochs with a batch size of 2, using the Adam optimizer with a learning rate of \(1 \times 10^{-3}\). To preserve the original image quality, the X-ray images, originally \(1991 \times 1227\) pixels, were preprocessed in three steps: the width was mirror-padded to 2048 pixels, the height was cropped to 1024 pixels, and the resulting \(2048 \times 1024\) image was downsampled to \(1024 \times 512\) pixels with bilinear interpolation.
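The preprocessing steps can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the summary does not say how the padding is split between the left and right sides, which rows are cropped, or which mirror convention is used, so the sketch assumes near-symmetric padding, a center crop, and NumPy's `reflect` mode. It also relies on the fact that, under the half-pixel-center convention, bilinear downsampling by exactly 2 is equivalent to averaging each 2 × 2 block. The function and constant names are hypothetical.

```python
import numpy as np

PAD_W, CROP_H = 2048, 1024   # intermediate size stated in the text (width, height)
OUT_W, OUT_H = 1024, 512     # final size stated in the text (width, height)


def preprocess(image: np.ndarray) -> np.ndarray:
    """Mirror-pad the width, crop the height, then downsample by 2.

    `image` is a 2-D grayscale array of shape (height, width), e.g. (1227, 1991).
    """
    h, w = image.shape
    if w > PAD_W or h < CROP_H:
        raise ValueError("expected width <= 2048 and height >= 1024")

    # Assumption: split the padding as evenly as possible between left and right.
    pad_total = PAD_W - w
    left = pad_total // 2
    right = pad_total - left
    padded = np.pad(image, ((0, 0), (left, right)), mode="reflect")

    # Assumption: keep the vertically centered 1024 rows.
    top = (h - CROP_H) // 2
    cropped = padded[top:top + CROP_H, :]

    # 2x bilinear downsampling with half-pixel centers == mean of each 2x2 block.
    blocks = cropped.astype(np.float32).reshape(OUT_H, 2, OUT_W, 2)
    return blocks.mean(axis=(1, 3))


# Usage with a synthetic image of the size quoted in the text.
dummy = np.zeros((1227, 1991), dtype=np.float32)
print(preprocess(dummy).shape)  # (512, 1024)
```

Padding before downsampling keeps both dimensions powers of two, which is convenient for a U-shaped network that halves the resolution several times; that motivation is an inference and is not stated in the summary.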

This systematic image preparation is meant to improve the performance of the proposed model while preserving essential image characteristics, and it sets the stage for the experimental results and analysis that follow.

Discussion

In recent years, convolutional neural networks (CNNs) have significantly advanced medical imaging, particularly in image segmentation tasks. The introduction of architectures like U-Net has led to various adaptations, such as VNet and Attention U-Net, which enhance feature extraction capabilities. Attention mechanisms, initially developed for sequence modeling, have been integrated into CNNs to improve focus on critical image features while mitigating noise interference. Notably, models like SENet and GT U-Net have demonstrated the effectiveness of combining attention with convolutional layers to enhance performance, especially in challenging contexts like dental X-ray segmentation, where noise and limited data can lead to overfitting.

The proposed model features a U-shaped architecture with five encoders and three decoders, including a Dual-output Decoder (DOD) that merges high-level features with a Grouped Global Map from the Grouped Global Attention Module (GGA). This design aims to capture essential texture and contour information while maintaining computational efficiency through techniques like depthwise separable convolutions and dilated convolutions. The GGA and its components, such as the Grouped Uni-directional Attention Module (GUA) and Global Feature Fusion Module (GFF), are specifically tailored to enhance edge prediction and segmentation accuracy by effectively fusing information across different resolutions and layers. The model’s performance is further bolstered by a loss function that emphasizes difficult-to-predict pixels, ensuring improved segmentation outcomes, particularly in low-contrast scenarios.
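To make the efficiency argument concrete, the sketch below compares the weight count of a standard convolution with that of a depthwise separable one, and computes the effective kernel size of a dilated convolution. The formulas are the standard ones (biases ignored); the channel counts and dilation rates in the example are hypothetical and are not taken from GCNet.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer: 64 -> 128 channels with a 3 x 3 kernel.
std = standard_conv_params(64, 128, 3)        # 73728
sep = depthwise_separable_params(64, 128, 3)  # 8768
print(std, sep, round(std / sep, 2))          # 73728 8768 8.41
print(effective_kernel_size(3, 2), effective_kernel_size(3, 4))  # 5 9
```

For this example layer the separable form needs roughly 8.4 times fewer weights, while dilation widens the area a 3 × 3 kernel covers without adding any weights.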

Experimental results indicate that the proposed model outperforms traditional segmentation architectures in accuracy, mean Intersection over Union (mIoU), and E-measure. Edge supervision signals proved crucial for boundary prediction and led to more reliable segmentation results. Visual comparisons show that the model delineates edges and contours well, which addresses the blurred boundaries typical of dental X-ray images, and that it is robust to noise and variability in the dataset. Overall, the findings underscore the model’s effectiveness in medical image segmentation, particularly in dental imaging.
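As a rough illustration of the mean IoU metric mentioned above, the sketch below computes per-class IoU on flattened label masks and averages it over the classes. Averaging over the background and tooth classes is an assumption; the summary does not say which classes the paper averages over. The function names and toy masks are hypothetical.

```python
from typing import Sequence


def iou(pred: Sequence[int], target: Sequence[int], cls: int) -> float:
    """Intersection over Union for one class label on flattened masks."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else 1.0


def mean_iou(pred: Sequence[int], target: Sequence[int], classes: Sequence[int] = (0, 1)) -> float:
    """Average the per-class IoU over the given class labels."""
    return sum(iou(pred, target, c) for c in classes) / len(classes)


# Toy 8-pixel masks (hypothetical data): 1 = tooth, 0 = background.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 1, 1, 0, 0, 0, 0, 0]
print(iou(pred, target, 1))    # 0.5
print(mean_iou(pred, target))
```

For a single class, IoU and Dice are monotonically related by Dice = 2·IoU / (1 + IoU), so the tooth-class IoU of 0.5 here corresponds to a Dice of 2/3.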