Multi-Regional deep learning models for identifying dental restorations and prosthesis in panoramic radiographs

Journal: BMC Oral Health, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s12903-025-07138-0
PMID: https://pubmed.ncbi.nlm.nih.gov/41194095
Publication Date: 2025-11-05
Author(s): Zohaib Khurshid et al.
Primary Topic: Dental Radiography and Imaging

Overview

This study presents a deep learning approach for the automated detection of dental prostheses and treatments, using a diverse dataset of 2,235 panoramic radiographs from three dental colleges. The workflow covered dataset preparation, model selection, and the training and evaluation of three deep learning architectures: You Only Look Once (YOLO)v11, Faster Region-based Convolutional Neural Network (Faster R-CNN), and Vision Transformer (ViT). All three models were assessed under a standardized protocol.

The results indicate that the ViT model outperformed the others on accuracy-based metrics, achieving a detection accuracy of 94.15%, a precision of 94.64%, and an F1-score of 93.52%. Faster R-CNN achieved the highest mean average precision (mAP): 82.0% at an Intersection over Union (IoU) threshold of 0.50. These findings underscore the potential of AI-driven automated detection to enhance diagnostic accuracy and streamline workflows in dental practice. The study advocates integrating such AI tools to support clinicians in real-time decision-making, thereby improving diagnostic reliability and patient outcomes across dental specialties.
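
As a reminder of how accuracy, precision, and F1-score relate to one another, the following minimal Python sketch computes them from confusion-matrix counts. The function name `classification_metrics` and the counts in the usage note are hypothetical illustrations. They are not values from the study, and the summary does not state how the authors aggregated these metrics across classes.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives for a single class (hypothetical inputs).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `classification_metrics(tp=8, fp=2, fn=2, tn=8)` gives 0.8 for all four metrics, because precision and recall coincide in that case.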

Introduction

The integration of artificial intelligence (AI) into dentistry has significantly enhanced diagnostic capabilities, particularly in the analysis of radiographic images. Current commercial software for dental radiography and caries detection relies predominantly on AI-based algorithms, which require large and diverse datasets covering various types of radiographic and clinical images. Recent work by Uribe et al. highlighted the availability of public dental image datasets, which are crucial for improving data diversity and reducing algorithmic bias. Manual interpretation of dental radiographs, including panoramic images, is often time-consuming and subject to variability and human error, which underscores the need for automated computer-aided detection (CAD) systems to improve diagnostic accuracy.

This study focuses on developing deep learning models for the automated identification of dental restorations, such as fillings, root canal materials, crowns, and implants, in panoramic radiographs. The proposed framework employs three advanced deep learning models for object detection, trained on a multicenter dataset from Pakistan, Thailand, and the United States. This diverse dataset enhances the models’ adaptability to different patient populations and clinical practices, thereby improving their generalizability. However, variations in image contrast, anatomical visibility, and the presence of artifacts remain challenges, and they call for a robust AI model capable of performing reliably in real-world clinical settings.

Methods

The methodology section details the pipeline used to train three distinct models, YOLOv11, Faster R-CNN, and Vision Transformer (ViT), to detect dental materials in panoramic radiographs, including fillings, root canal materials, endodontic posts, and dental prostheses (crowns and bridges). The workflow comprises five stages: dataset preparation, model selection, training, evaluation, and deployment.

Additionally, the study adheres to the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) 2024, ensuring that the research meets established standards for reporting in this domain. This structured approach aims to enhance the accuracy and reliability of dental material detection in radiographic images.

Results

A dataset of 2,235 high-resolution panoramic radiographs was used to assess the performance of three models, YOLOv11, Vision Transformer (ViT), and Faster R-CNN, in detecting dental treatments. The images were annotated using Roboflow to produce the ground truth data. Dataset preparation involved resizing images to 640 pixels and applying computer vision techniques intended to improve model generalization and class balance.
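
The summary does not say whether "640 pixels" refers to a square 640 × 640 input or to the longer image side. The sketch below assumes the latter and preserves the aspect ratio, which is only one plausible reading. It also shows that annotated bounding boxes have to be scaled by the same factor as the image. The helper names `resize_plan` and `scale_box` are hypothetical.

```python
def resize_plan(width: int, height: int, target: int = 640):
    """Return (new_width, new_height, scale) so the longer side equals `target`.

    Assumption: aspect ratio is preserved; the study may instead have
    resized to a fixed square.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height, scale


def scale_box(box, scale: float):
    """Scale an (x_min, y_min, x_max, y_max) box by the image scale factor."""
    return tuple(coord * scale for coord in box)
```

For a hypothetical 2560 × 1280 radiograph, `resize_plan(2560, 1280)` returns `(640, 320, 0.25)`, and a box `(100, 200, 300, 400)` becomes `(25.0, 50.0, 75.0, 100.0)`.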

The models were trained for different numbers of epochs (YOLOv11 for 400, Faster R-CNN for 500, and ViT for 200), reflecting their distinct convergence behaviors and architectural characteristics. Notably, ViT, a transformer-based model, reached strong performance earlier in training owing to its effective global feature extraction via self-attention mechanisms. Conversely, the region-based Faster R-CNN required more epochs to optimize both the region proposal and classification stages. Epoch counts were chosen based on early validation performance and loss stabilization, to ensure effective learning while mitigating the risk of overfitting.
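
One common way to turn "loss stabilization" into a concrete epoch budget is a patience rule on the validation loss. The sketch below is an illustration of that general idea, not the authors' procedure: the function name `stopping_epoch` and the `patience` and `min_delta` defaults are assumptions chosen for the example.

```python
def stopping_epoch(val_losses, patience: int = 20, min_delta: float = 1e-3) -> int:
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has not improved by more than
    `min_delta` for `patience` consecutive epochs. If that never happens,
    the full length of `val_losses` is returned.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Meaningful improvement: remember it and reset the wait.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses)
```

With the hypothetical loss curve `[1.0, 0.8, 0.7, 0.7, 0.7, 0.7]` and `patience=3`, the last improvement occurs at epoch 3, so the rule stops at epoch 6.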

Discussion

The discussion section details the dataset used to train the models, which aim to detect various dental conditions in panoramic radiographs. The dataset comprises 2,235 images sourced from Pakistan, Thailand, and the United States, annotated into six classes: dental fillings, crowns, root canal-treated teeth, implants, endodontic posts, and bridges. Its diversity in imaging techniques and patient demographics enhances the models’ adaptability to different clinical scenarios. Notably, three experienced dental specialists carried out the annotation, and the reported Cohen’s κ of 0.89 reflects strong inter-rater agreement on the ground truth.
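
Cohen's κ corrects the raw agreement between two raters for the agreement expected by chance. The summary does not describe how the statistic was obtained for three annotators, so the following sketch only shows the standard two-rater computation; the function name `cohens_kappa` and the labels in the usage note are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For instance, with rater A labelling four items as `["crown", "crown", "filling", "filling"]` and rater B as `["crown", "crown", "filling", "crown"]`, the observed agreement is 0.75, the chance agreement is 0.5, and κ is 0.5.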

Data preprocessing techniques, including image resizing and augmentation, were employed to standardize the dataset and improve model generalization. Images were resized to dimensions suitable for the YOLOv11 and Faster R-CNN architectures, while augmentation increased the dataset’s diversity, particularly for underrepresented classes. The models, YOLOv11, Faster R-CNN, and Vision Transformer (ViT), were evaluated on their precision and accuracy in detecting dental conditions. YOLOv11 achieved a mean Average Precision (mAP) of 70.7%, while Faster R-CNN reached an mAP of 82.0% at IoU = 0.50. The ViT model achieved an accuracy of 94.15%, indicating its effectiveness in classifying dental conditions. Overall, the study highlights the importance of a well-annotated and diverse dataset, alongside robust preprocessing and model training strategies, in advancing automated dental image analysis.
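
The IoU = 0.50 threshold quoted above decides whether a predicted box counts as a correct detection. The following minimal sketch shows that check for axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form. The function names are hypothetical, and a full mAP computation would additionally require matching class labels and ranking predictions by confidence, which is not shown here.

```python
from typing import Sequence

Box = Sequence[float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.50) -> bool:
    """A prediction counts as a hit when its IoU with the ground truth
    reaches the threshold (class matching is assumed to be done elsewhere)."""
    return iou(pred, truth) >= threshold
```

Two 10 × 10 boxes offset horizontally by 5 pixels overlap with an IoU of 1/3 and would be rejected at the 0.50 threshold, whereas an offset of 2 pixels gives an IoU of 2/3 and would be accepted.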