An open deep learning-based framework and model for tooth instance segmentation in dental CBCT

Journal: Clinical Oral Investigations, Volume: 29, Issue: 10
DOI: https://doi.org/10.1007/s00784-025-06578-w
PMID: https://pubmed.ncbi.nlm.nih.gov/40996470
Publication Date: 2025-09-25
Author(s): You Zhou et al.
Primary Topic: Dental Radiography and Imaging

Overview

The research presents OralSeg, a deep learning model aimed at improving the accuracy and accessibility of tooth-level instance segmentation in dental cone-beam computed tomography (CBCT) images. The authors created a densely annotated dataset covering 35 key oral anatomical structures. They built the model on the UNETR architecture, integrating a Swin Transformer with a spatial Mamba module for enhanced multi-scale residual feature fusion. The trained model is packaged with a user-friendly interface in the 3D Slicer platform, which enables one-click segmentation for users without AI expertise.
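
The Swin Transformer named above restricts self-attention to small local windows. It alternates a regular window layout with a shifted one so that information can cross window borders. The sketch below shows only the 3D window assignment, that is, which tokens share a window and may attend to each other. It assumes a grid whose size is divisible by the window size. Real implementations also mask attention between tokens that end up together only because of the cyclic wrap-around; that step is omitted here. The function name and sizes are hypothetical and do not reflect OralSeg's actual configuration.

```python
def window_index(coord, shape, window, shift=0):
    """Index of the local attention window that a token falls into.

    coord and shape are (z, y, x) token position and grid size; every
    dimension of shape must be divisible by `window`. With shift > 0 the
    grid is first rolled cyclically by -shift, as in Swin's shifted windows.
    """
    per_axis = []
    for c, size in zip(coord, shape):
        if size % window:
            raise ValueError("grid size must be a multiple of the window size")
        # After rolling by -shift, the token at c sits at (c - shift) % size.
        per_axis.append(((c - shift) % size) // window)
    _, ny, nx = (size // window for size in shape)
    wz, wy, wx = per_axis
    # Flatten the (wz, wy, wx) window coordinates into a single id.
    return (wz * ny + wy) * nx + wx
```

On a 4 × 4 × 4 grid with window 2, the tokens (1, 1, 1) and (2, 2, 2) fall into different windows without a shift. With `shift=1` they share a window. This is how the shifted layout links neighbors that a regular partition keeps apart.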

Results indicate that OralSeg achieved a Dice similarity coefficient (DSC) of 0.8316 ± 0.0305, outperforming existing models such as SwinUNETR and 3D U-Net. The advantage was clearest in complex anatomical regions such as the apical areas and the mandibular nerve canals. The findings suggest that OralSeg not only improves segmentation performance but also serves as a practical tool for clinical dentists and researchers, aiding diagnosis and educational training and advancing the integration of digital dentistry within precision medicine.
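
The Dice similarity coefficient compares a predicted mask P with a reference mask R as 2|P ∩ R| / (|P| + |R|). It ranges from 0 (no overlap) to 1 (perfect overlap). The summary does not state whether the reported ± spread is taken across scans or across structures. The pure-Python sketch below therefore assumes a two-step aggregation: first average over structure labels within each scan, then take the mean and standard deviation across scans. It also treats a structure that is absent from both masks as a score of 1.0. That convention is an assumption, and it matters for missing teeth such as absent third molars. The function names are hypothetical.

```python
from statistics import mean, stdev

def dice(pred, ref, label):
    """Dice 2|P∩R| / (|P|+|R|) for one structure label.

    pred and ref are flattened label volumes of equal length (0 = background).
    """
    if len(pred) != len(ref):
        raise ValueError("volumes must have the same number of voxels")
    p = {i for i, v in enumerate(pred) if v == label}
    r = {i for i, v in enumerate(ref) if v == label}
    if not p and not r:
        return 1.0  # assumption: structure absent in both masks counts as agreement
    return 2 * len(p & r) / (len(p) + len(r))

def case_dice(pred, ref, labels):
    """Mean Dice over the structure labels of a single scan."""
    return mean(dice(pred, ref, lab) for lab in labels)

def summarize(case_scores):
    """Mean and sample standard deviation of per-scan Dice scores."""
    return mean(case_scores), stdev(case_scores)

if __name__ == "__main__":
    pred = [0, 1, 1, 2, 2, 0]
    ref = [0, 1, 2, 2, 2, 0]
    print(dice(pred, ref, 1))  # 2*1/(2+1), prints 0.666...
    print(dice(pred, ref, 2))  # 2*2/(2+3), prints 0.8
```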

Introduction

The introduction of the paper discusses the evolution and significance of digital dentistry, highlighting the role of cone-beam computed tomography (CBCT) as a pivotal imaging tool. CBCT offers a lower radiation dose than conventional CT together with high spatial resolution, enabling detailed visualization of complex anatomical structures in the oral and maxillofacial regions. This imaging technology enhances clinical diagnosis, aids in the precise localization of pathologies, and supports personalized treatment planning across dental applications, including orthodontics and implantology.

Despite these advancements, segmenting individual teeth from CBCT images remains challenging. Teeth are small relative to the overall image volume and anatomically complex, and restorative materials introduce artifacts. Traditional segmentation methods, whether manual or semi-automatic, are often time-consuming and prone to inaccuracies. Recent developments in artificial intelligence, particularly deep learning, have shown promise in improving segmentation accuracy. However, existing deep learning approaches face significant challenges. Errors accumulate across multi-stage segmentation pipelines. Many methods also rely on semantic rather than instance segmentation: they label voxels by class without separating and identifying each individual tooth, which leads to high misidentification rates. The paper underscores the need for more robust automated solutions to improve the precision of tooth segmentation in CBCT imaging.
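
To see why semantic segmentation alone falls short, consider a mask that only marks "tooth" versus background. Splitting it into individual teeth requires a post-processing step such as connected-component labeling. The sketch below does this in pure Python with 6-connectivity on a flattened volume. It is an illustration, not the paper's method. Wherever two crowns touch, their components merge into one, which is exactly the kind of misidentification that instance-level models aim to avoid.

```python
from collections import deque

# The six face-adjacent neighbor offsets (dz, dy, dx) used for 6-connectivity.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def label_components(mask, shape):
    """6-connected component labeling of a binary 3D mask.

    mask is a flat list of 0/1 values in C order for a volume of shape (D, H, W).
    Returns (labels, n): labels is a flat list with 0 for background and
    1..n for each connected component; n is the number of components.
    """
    d, h, w = shape
    labels = [0] * (d * h * w)
    n = 0
    for start in range(d * h * w):
        if not mask[start] or labels[start]:
            continue
        n += 1  # found an unlabeled foreground voxel: start a new component
        labels[start] = n
        queue = deque([start])
        while queue:  # breadth-first flood fill of the current component
            idx = queue.popleft()
            z, rem = divmod(idx, h * w)
            y, x = divmod(rem, w)
            for dz, dy, dx in NEIGHBORS:
                nz, ny, nx = z + dz, y + dy, x + dx
                if 0 <= nz < d and 0 <= ny < h and 0 <= nx < w:
                    j = (nz * h + ny) * w + nx
                    if mask[j] and not labels[j]:
                        labels[j] = n
                        queue.append(j)
    return labels, n
```

For example, `label_components([1, 1, 0, 1, 1], (1, 1, 5))` returns `([1, 1, 0, 2, 2], 2)`. A fully filled row `[1] * 5` yields a single component, even if it actually spans two touching teeth.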

Methods

The Methods section describes a quantitative study design with four parts. The first is the collection and dense annotation of a CBCT dataset of 100 scans covering 35 oral anatomical structures. The second is the design of the OralSeg network. The third is the training protocol, and the fourth is the evaluation against baseline models. The annotation procedure and inter-annotator agreement checks were defined to support the reliability and validity of the reference labels.

Agreement between annotators was quantified with the Dice similarity coefficient (DSC) and the intraclass correlation coefficient (ICC). Segmentation performance was measured on an independent test set. Ablation experiments isolated the contribution of individual architectural components, and comparisons with SwinUNETR and 3D U-Net placed the results in context. Together, these choices are intended to make the findings credible and replicable.

Results

In this study, 100 cone-beam computed tomography (CBCT) images were meticulously annotated to analyze dental characteristics, including dentition status and third-molar prevalence. Inter-annotator consistency was assessed with the Dice similarity coefficient (DSC) and the intraclass correlation coefficient (ICC). The mean ICC was 0.96, indicating excellent agreement among annotators. Peak GPU memory usage was 21.85 GB during training and 7.58 GB during inference. Three deep learning architectures (SwinViT, SMamba, and OralSeg) were compared on training and validation metrics. OralSeg reached a training loss below 0.1 after approximately 800 steps, indicating faster convergence.
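
The summary reports a mean ICC of 0.96 but does not say which ICC form was used or which quantity the annotators were compared on. The function below is a sketch of one common choice: ICC(2,1) in the Shrout and Fleiss convention (two-way random effects, absolute agreement, single rater). It works on a subjects-by-raters table. Both the choice of form and the idea of rating a per-structure scalar, such as annotated volume, are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the value rater j assigned to subject i
    (n subjects x k raters, n >= 2, k >= 2, no missing values).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]                      # per subject
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]  # per rater
    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Identical ratings such as `[[1, 1], [2, 2], [3, 3]]` give 1.0. A systematic offset between raters lowers the score, because the absolute-agreement form penalizes variance between raters (columns).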

To improve model robustness, various data augmentation techniques were applied, and CBCT voxel intensities were normalized to the range 0 to 1. Training used random 3D crops of 64 × 64 × 64 voxels with a batch size of 1. The training objective was DiceCELoss, a combined Dice and cross-entropy loss. The AdamW optimizer, paired with a cosine annealing learning-rate scheduler, provided smoother updates and helped limit overfitting. Systematic ablation experiments assessed the contribution of each component of the OralSeg architecture, and OralSeg was compared against SwinUNETR and the classic 3D U-Net. All evaluations were performed on an independent test dataset, supporting the objectivity and generalizability of the results.
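
The preprocessing and schedule steps above can be sketched without a deep learning framework. The block below shows three pieces:

- per-volume min-max intensity normalization to [0, 1];
- the choice of a random 64 × 64 × 64 crop that stays inside the volume;
- the closed-form cosine annealing schedule that a scheduler such as PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` follows.

The summary does not state whether intensities were normalized per volume or with fixed bounds, or which learning rate was used, so those details are assumptions. In a real pipeline, these steps would typically come from library components, for example MONAI transforms and `DiceCELoss` together with `torch.optim.AdamW`.

```python
import math
import random

def minmax_normalize(voxels):
    """Rescale intensities linearly to [0, 1] (assumption: per-volume min and max)."""
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        return [0.0] * len(voxels)  # constant volume: map everything to 0
    return [(v - lo) / (hi - lo) for v in voxels]

def random_crop_origin(shape, patch=(64, 64, 64), rng=None):
    """Corner (z, y, x) of a random patch that lies fully inside the volume."""
    rng = rng or random.Random()
    if any(s < p for s, p in zip(shape, patch)):
        raise ValueError("volume smaller than patch; pad it first")
    # randint is inclusive, so origin + patch never exceeds the volume size.
    return tuple(rng.randint(0, s - p) for s, p in zip(shape, patch))

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step`, decaying from lr_max to lr_min along a half cosine."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

With `lr_max = 1e-4` (an illustrative value), the rate is 1e-4 at step 0, about 5e-5 halfway through, and 0 at the final step.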

Discussion

The study addresses the challenges of tooth segmentation in dental cone-beam computed tomography (CBCT) images, highlighting the lack of publicly available datasets and models, which hinders reproducibility in research. To overcome these limitations, the authors developed a densely annotated dataset of 35 anatomical structures. They then trained a deep learning model named OralSeg, which integrates the Swin Transformer and spatial Mamba modules for enhanced feature extraction and segmentation accuracy. OralSeg performed better at recognizing and segmenting complex oral anatomical structures, achieving a Dice similarity coefficient of 0.8972 and showing high accuracy and stability compared with other models such as SwinUNETR and 3D U-Net.

The study emphasizes the importance of capturing both global semantic context and local spatial relationships in segmentation tasks, particularly for intricate structures such as teeth and jawbones. The hybrid architecture of OralSeg effectively balances model efficiency and segmentation accuracy, making it suitable for clinical applications. The model is publicly available, promoting open research and facilitating its use by dental professionals without prior AI experience. However, the authors acknowledge limitations, including increased GPU memory consumption and the need for further validation across diverse datasets to ensure consistent performance. Future work will focus on optimizing the model and enhancing its generalizability and interpretability in clinical settings.