Tooth segmentation on multimodal images using adapted segment anything model

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-96301-2
PMID: https://pubmed.ncbi.nlm.nih.gov/40263461
Publication Date: 2025-04-22
Author(s): Peijuan Wang et al.
Primary Topic: Dental Radiography and Imaging

Overview

The paper discusses the growing importance of tooth segmentation in dental practice, driven by rising patient numbers and the digital transformation of dental hospitals. It introduces a tooth segmentation method called Tooth-ASAM, which adapts the Segment Anything Model (SAM) to improve segmentation performance. The method uses an adapter-based image encoder and a mask decoder tailored to tooth images, and it was evaluated on multimodal datasets, including Cone Beam Computed Tomography (CBCT) images, panoramic X-rays, and micro-camera images. The results indicate that Tooth-ASAM outperformed state-of-the-art methods on key metrics such as the Dice coefficient, Intersection over Union (IoU), 95% Hausdorff Distance (HD95), and Average Symmetric Surface Distance (ASSD), demonstrating its potential for clinical applications in orthodontics, oral implant surgery, and prosthodontics.

In conclusion, the study highlights the critical role of precise tooth segmentation in computer-assisted diagnosis and treatment planning across various dental specialties. The challenges posed by irregular dentition patterns and indistinct boundaries necessitate advanced segmentation techniques. The SAM-based approach presented in this research shows significant promise, as evidenced by both quantitative and qualitative assessments. Future research should explore semi-supervised or unsupervised learning algorithms to enhance data annotation and address the limitations of existing datasets. Additionally, integrating multi-modal imaging and addressing ethical considerations such as data privacy and bias will be essential for improving diagnostic accuracy and patient care in dentistry.

Methods

The methodology section outlines the implementation of the Segment Anything Model (SAM) and its adaptation through a fine-tuning approach known as adapter tuning. SAM consists of three primary components: an image encoder that generates embeddings for the input image, a prompt encoder that maps user-defined prompts (both sparse and dense) into the feature space, and a mask decoder that integrates these embeddings to produce the final segmentation mask. Adapter tuning enhances the model’s performance by adding trainable modules to each transformer block, allowing for efficient task-specific training while preserving the integrity of the original model.
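As a rough illustration of the adapter idea described above, the following sketch shows a residual bottleneck adapter applied to the output of a frozen block. It is not the Tooth-ASAM implementation: the bottleneck design, the zero-initialised up-projection, the placement after the block, and the names `BottleneckAdapter` and `frozen_block` are all assumptions made for this example. Plain Python lists are used only to keep the sketch dependency-free; a real implementation would use PyTorch tensors, as in the reported setup.

```python
import math
import random


def gelu(x):
    """Gaussian error linear unit, a common transformer activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class BottleneckAdapter:
    """Hypothetical adapter: token + W_up @ gelu(W_down @ token + b_down) + b_up."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.dim, self.bottleneck = dim, bottleneck
        # Down-projection to a narrow bottleneck (small random weights).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection starts at zero (an assumption), so the adapter is
        # initially the identity and the pretrained behaviour is unchanged.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, token):
        hidden = [gelu(sum(w * t for w, t in zip(row, token)) + b)
                  for row, b in zip(self.w_down, self.b_down)]
        delta = [sum(w * h for w, h in zip(row, hidden)) + b
                 for row, b in zip(self.w_up, self.b_up)]
        # Residual connection: the adapter only adds a learned correction.
        return [t + d for t, d in zip(token, delta)]

    def num_parameters(self):
        """Trainable parameters: two weight matrices plus two bias vectors."""
        return 2 * self.dim * self.bottleneck + self.dim + self.bottleneck


def frozen_block(token):
    """Stand-in for a frozen pretrained transformer block (not SAM's real block)."""
    return [2.0 * t for t in token]


adapter = BottleneckAdapter(dim=8, bottleneck=2)
token = [0.1 * i for i in range(8)]
output = adapter(frozen_block(token))
```

In this sketch only the adapter's weights would be updated during fine-tuning. With `dim=8` and `bottleneck=2` that is 42 parameters, which illustrates why adapter tuning can be much cheaper than retraining the whole encoder.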

The experimental setup used PyTorch 2.0 on an NVIDIA GeForce RTX 4090 GPU, with all images standardized to a size of 1×512×512 and scaled to a range of 0-255. The Adam optimizer was employed with a learning rate of 1e-4 and a decay of 0.05, across 100 epochs with a batch size of 2. Performance metrics included the Dice coefficient, Intersection over Union (IoU), 95% Hausdorff Distance (HD95), and Average Symmetric Surface Distance (ASSD), which collectively assess segmentation accuracy and boundary precision. Results indicated that the proposed methods, particularly Tooth-ASAM, achieved superior performance across various datasets, as evidenced by higher Dice and IoU scores and lower HD95 and ASSD values.
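To make the four metrics concrete, here is a minimal sketch that computes them for two small binary masks represented as sets of (row, column) pixels. Several choices are assumptions of this example rather than details from the paper: the boundary is taken as foreground pixels with a 4-neighbour outside the mask, HD95 is the larger of the two directed 95th-percentile distances (other conventions exist), and distances are in pixels rather than physical units.

```python
import math


def dice_and_iou(pred, truth):
    """Overlap metrics for two masks given as sets of (row, col) pixels."""
    inter = len(pred & truth)
    union = len(pred | truth)
    total = len(pred) + len(truth)
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


def boundary(mask):
    """Foreground pixels with at least one 4-neighbour outside the mask."""
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(r, c) for (r, c) in mask
            if any((r + dr, c + dc) not in mask for dr, dc in offsets)}


def nearest_distances(src, dst):
    """Euclidean distance from each pixel in src to its nearest pixel in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def hd95_and_assd(pred, truth):
    """Boundary metrics in pixel units; assumes both masks are non-empty."""
    b_pred, b_truth = boundary(pred), boundary(truth)
    d_pt = nearest_distances(b_pred, b_truth)
    d_tp = nearest_distances(b_truth, b_pred)
    hd95 = max(percentile(d_pt, 95), percentile(d_tp, 95))
    assd = (sum(d_pt) + sum(d_tp)) / (len(d_pt) + len(d_tp))
    return hd95, assd


# Toy masks: a 4x4 square and the same square shifted one column to the right.
truth = {(r, c) for r in range(2, 6) for c in range(2, 6)}
pred = {(r, c) for r in range(2, 6) for c in range(3, 7)}

dice, iou = dice_and_iou(pred, truth)   # 0.75 and 0.6 for these masks
hd95, assd = hd95_and_assd(pred, truth)  # 1.0 and 0.5 pixels
```

Higher Dice and IoU mean better overlap, while lower HD95 and ASSD mean the predicted boundary lies closer to the reference boundary. This matches the direction of improvement reported for Tooth-ASAM.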

Results

In the results section, the proposed Tooth-ASAM method was compared quantitatively against five existing segmentation methods: CTA_UNet, SAM, MedSAM2D, WeSAM, and MSA. The evaluation was conducted on several datasets, including NC, Tooth, Vident-lab, and MICCAI-Tooth, with specific image sizes and patch distributions outlined. Statistical significance was assessed using the Friedman test, yielding p-values of 0.881, 0.213, 0.399, and 0.582 for the respective datasets, all above the 0.05 threshold. The test therefore did not detect statistically significant differences in performance among the models, suggesting that their effectiveness is relatively comparable.
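The Friedman test mentioned above can be sketched as follows. The scores in the example are invented for illustration and are not taken from the paper, and the choice of what forms a "block" (here, one row per hypothetical test case) is also an assumption. The sketch uses the basic statistic without tie correction and the chi-square approximation for the p-value, which is only rough for very small samples.

```python
import math


def average_ranks(values):
    """Rank values ascending (1 = lowest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # closed form for df = 1
    # Recurrence Q(a + 1, x) = Q(a, x) + x^a * exp(-x) / Gamma(a + 1).
    while 2 * a < df:
        q += math.exp(-half) * half ** a / math.gamma(a + 1.0)
        a += 1.0
    return q


def friedman_test(scores):
    """scores[i][j] is the metric of model j on block i; no tie correction."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    stat = (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))
    return stat, chi2_sf(stat, k - 1)


# Made-up Dice scores: 4 hypothetical test cases (rows) x 3 models (columns).
example_scores = [
    [0.90, 0.85, 0.88],
    [0.86, 0.88, 0.84],
    [0.91, 0.89, 0.90],
    [0.87, 0.90, 0.89],
]
stat, p_value = friedman_test(example_scores)
```

For these invented scores the rank sums are close to each other, so the p-value is well above 0.05 and the test would not reject the hypothesis that the models perform alike. This is the same reading applied to the reported p-values of 0.881, 0.213, 0.399, and 0.582.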

Qualitative results demonstrated that Tooth-ASAM produced more accurate boundary delineations in complex tooth morphologies within CBCT images compared to other methods, as illustrated in the provided figures. Notably, while CTA_UNet, SAM, and MedSAM2D exhibited perceptible artifacts, both MSA and Tooth-ASAM achieved superior segmentation results, particularly in challenging areas of X-ray images. The visual comparisons further confirmed that Tooth-ASAM could accurately localize segmentation boundaries, even in low-contrast scenarios, as evidenced by the overlaid segmented images across various datasets.

Discussion

The discussion section presents Tooth-ASAM, a segmentation model based on the Segment Anything Model (SAM) and tailored for multimodal dental images, and reviews its performance. The approach addresses the complexities of dental structures, particularly the irregular patterns and the indistinct boundaries between tooth roots and alveolar bone, which must be resolved for accurate clinical segmentation. By leveraging the generalization and zero-shot capabilities of SAM, Tooth-ASAM demonstrates superior performance across various datasets, outperforming existing methods in terms of segmentation accuracy, especially in challenging low-contrast scenarios.

Despite its promising results, Tooth-ASAM faces limitations related to the resolution of imaging equipment, variability in shooting angles, and the inherent complexity of dental morphology, which can lead to segmentation challenges such as tooth adhesion and edge blurring. The study suggests that future research should explore semi-supervised learning and the integration of generative adversarial networks (GANs) for data augmentation to enhance model performance. Additionally, optimizing the model’s architecture to reduce computational costs and ensuring dataset diversity to mitigate biases are essential for improving practical implementation in clinical settings. Overall, Tooth-ASAM represents a significant advancement in dental image segmentation, offering valuable insights for clinical applications in orthodontics and oral surgery.