Deep Learning–Based Automated Diagnostic Charting on Panoramic Radiography: Comparison of YOLOv11 and YOLOv12

Journal: Odontology
DOI: https://doi.org/10.1007/s10266-026-01333-3
PMID: https://pubmed.ncbi.nlm.nih.gov/41699374
Publication Date: 2026-02-16
Author(s): Onur Mutlu et al.
Primary Topic: Dental Radiography and Imaging

Overview

The study investigates the efficacy of two advanced deep learning architectures, YOLOv11 and YOLOv12, for the automated detection of 13 dental conditions in panoramic radiographs, aiming to enhance clinical workflows and reduce diagnostic variability. A hybrid dataset of 2,297 images was utilized, with performance metrics evaluated through mean Average Precision (mAP@0.5), Precision, Recall, and F1-score on both internal and external test sets. Results indicated that YOLOv11 outperformed YOLOv12, achieving an mAP@0.5 of 0.857 on the internal dataset and 0.806 on the external dataset, demonstrating robust generalization capabilities. While the model excelled in detecting well-defined conditions like crowns and implants, it showed lower performance for subtle pathologies such as bone loss and caries.
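To make the "@0.5" in mAP@0.5 concrete, the sketch below computes Intersection over Union (IoU) for two axis-aligned boxes and applies the 0.5 threshold that decides whether a predicted box counts as a match. The box coordinates are hypothetical illustration values, not data from the study, and the helper names are assumptions rather than the authors' code.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical annotation and two hypothetical predictions.
truth = (100.0, 100.0, 200.0, 200.0)
pred_good = (110.0, 110.0, 210.0, 210.0)  # mostly overlapping
pred_poor = (150.0, 150.0, 250.0, 250.0)  # overlaps only one corner

for name, pred in [("pred_good", pred_good), ("pred_poor", pred_poor)]:
    print(f"{name}: IoU = {iou(pred, truth):.3f}, match = {is_match(pred, truth)}")
```

A full mAP computation additionally ranks predictions by confidence and averages precision over recall levels and classes; this sketch covers only the matching criterion.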

The findings position YOLOv11 as a superior architecture for multi-class dental object detection, suggesting its potential as a clinical decision support tool that can provide a reliable “second opinion” to enhance diagnostic accuracy and efficiency. However, the study emphasizes that the varying performance across different conditions indicates the necessity of maintaining clinical expertise alongside technological advancements. Future research should aim to validate the model’s robustness with larger, multi-center datasets to further facilitate its integration into clinical practice.

Introduction

The introduction of this research paper highlights the critical role of dental radiography in modern dentistry, emphasizing its importance for diagnosis, treatment planning, and monitoring of dental diseases. Two-dimensional radiographs are widely used, but they present challenges such as anatomical superimposition and image distortion. To address these limitations, artificial intelligence (AI), particularly deep learning techniques like Convolutional Neural Networks (CNNs), has emerged as a transformative solution in dental radiology. These systems enhance diagnostic accuracy, reduce variability among clinicians, and improve workflow efficiency by analyzing large datasets of radiographic images.

The paper specifically focuses on the application of the “You Only Look Once” (YOLO) family of architectures, which has shown high performance in detecting various dental conditions in panoramic images. Previous studies have demonstrated that these architectures achieve sensitivity, precision, and F1-scores exceeding 0.90 for detecting dental restorations and above 0.80 for identifying dental caries. Despite the effectiveness of AI in detecting numerous dental pathologies, there is a notable gap in comprehensive diagnostic charting based on panoramic images. This study aims to fill that gap by comparing the performance of two advanced deep learning models, YOLOv11 and YOLOv12, in the automated detection of 13 different dental conditions, ultimately seeking to establish a reliable tool for diagnostic charting in clinical practice.

Methods

In this study, all deep learning experiments were executed on a high-performance computing workstation equipped with an NVIDIA GeForce RTX 3090 GPU, which features 24 GB of GDDR6X VRAM, an AMD Ryzen 9 5900X CPU, and 64 GB of DDR4 3600MHz system memory. The software environment utilized for the experiments included the Ubuntu 22.04 operating system, Python version 3.11, the PyTorch v2.5.1 deep learning library, and the CUDA v12.6 parallel computing platform.

This hardware and software configuration was used for both training and running the deep learning models throughout the experiments.
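When reproducing a setup like the one described above, it can help to record the software environment alongside the results. The following is a minimal sketch, not the authors' tooling: the function name `describe_environment` is hypothetical, and PyTorch is treated as optional so the script still runs where it is not installed.

```python
import platform


def describe_environment() -> dict:
    """Collect version information for the experiment log."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
    }
    try:
        import torch  # optional; only reported when installed
    except ImportError:
        return info

    info["torch"] = torch.__version__
    info["cuda_runtime"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu"] = torch.cuda.get_device_name(0)
        info["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

On a machine matching the description, the reported values would be expected to correspond to Python 3.11, PyTorch 2.5.1, CUDA 12.6, and an RTX 3090 with roughly 24 GB of VRAM.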

Results

The final YOLOv11 model’s generalization capability was assessed using a hybrid test set that included both internal and external data. On the internal test set, the model achieved a mean precision of 0.824, recall of 0.867, F1-score of 0.842, and mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) of 0.857. Results on the external test set were somewhat lower: a mean precision of 0.786, recall of 0.767, F1-score of 0.771, and mAP@0.5 of 0.806.
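As a worked check on these figures, the sketch below recomputes F1 as the harmonic mean of the reported precision and recall and measures the internal-to-external drop in mAP@0.5. The recomputed values (about 0.845 and 0.776) come out slightly above the reported F1-scores. That would be consistent with F1 having been averaged per class rather than derived from the averaged precision and recall, but this is an assumption; the summary does not state how the averages were formed.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall metrics as reported in the text above.
reported = {
    "internal": {"precision": 0.824, "recall": 0.867, "f1": 0.842, "map50": 0.857},
    "external": {"precision": 0.786, "recall": 0.767, "f1": 0.771, "map50": 0.806},
}

for name, m in reported.items():
    harmonic = f1_score(m["precision"], m["recall"])
    print(f"{name}: harmonic mean of P and R = {harmonic:.3f}, "
          f"reported F1 = {m['f1']:.3f}")

# Generalization gap: how much mAP@0.5 falls on unseen external data.
map_drop = reported["internal"]["map50"] - reported["external"]["map50"]
relative_drop = map_drop / reported["internal"]["map50"]
print(f"mAP@0.5 drop: {map_drop:.3f} absolute, {relative_drop:.1%} relative")
```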

Class-specific performance analysis revealed that the model excelled in detecting “Impacted” (F1-score: 0.959), “Root-canal treated” (F1-score: 0.954), and “Unerupted” (F1-score: 0.948) teeth within the internal dataset. Conversely, it struggled with “Furcation” (F1-score: 0.659), “Bone loss” (F1-score: 0.703), and “Fractured tooth” (F1-score: 0.732). For the external dataset, strong performance was noted for “Crowns” (F1-score: 0.932) and “Braces” (F1-score: 0.907), while “Bone loss” was again the weakest category (F1-score: 0.576).
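The per-class figures above can be organized to show where a clinician's review matters most. The sketch below ranks the reported F1-scores, flags classes under a review threshold, and computes the internal-to-external drop for any class reported in both sets. Only the classes named in the text are included (the study covers 13), and the 0.75 threshold is a hypothetical value chosen for illustration, not one taken from the paper.

```python
# Per-class F1-scores as reported in the text (subset of the 13 classes).
internal_f1 = {
    "Impacted": 0.959,
    "Root-canal treated": 0.954,
    "Unerupted": 0.948,
    "Fractured tooth": 0.732,
    "Bone loss": 0.703,
    "Furcation": 0.659,
}
external_f1 = {"Crowns": 0.932, "Braces": 0.907, "Bone loss": 0.576}


def rank(scores: dict) -> list:
    """Classes sorted from highest to lowest F1."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def needs_review(scores: dict, threshold: float) -> list:
    """Classes whose F1 falls below the (hypothetical) review threshold."""
    return sorted(cls for cls, f1 in scores.items() if f1 < threshold)


REVIEW_THRESHOLD = 0.75  # assumption for illustration only

print("Internal ranking:", rank(internal_f1))
print("Flag for review (internal):", needs_review(internal_f1, REVIEW_THRESHOLD))

# Classes reported on both test sets, and how much F1 falls externally.
shared = sorted(set(internal_f1) & set(external_f1))
drops = {cls: internal_f1[cls] - external_f1[cls] for cls in shared}
for cls, drop in drops.items():
    print(f"{cls}: F1 drops by {drop:.3f} on the external set")
```

With the values in the text, bone loss is the only class reported for both sets, and its F1-score falls from 0.703 to 0.576 on external data.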

Discussion

The discussion section of the research paper highlights the ethical approval and the systematic methodological framework employed in the study, which aimed to compare the performance of two deep learning architectures, YOLOv11 and YOLOv12, for automated detection of dental conditions in panoramic radiographs. The study used a curated dataset of 2,297 images, with a rigorous annotation protocol for 13 dental conditions, and implemented a comprehensive data management process to enhance input quality. The findings revealed that YOLOv11 outperformed YOLOv12, achieving an mAP@0.5 of 0.857 (85.7%) on the internal test set, which is interpreted as evidence of its robustness in detecting subtle lesions. In contrast, YOLOv12 demonstrated lower performance metrics, particularly for conditions with subtle features like bone loss and caries.

The results also emphasized the challenges faced by AI models in accurately detecting dental conditions characterized by low radiographic contrast and morphological variability. Despite the promising performance of the YOLOv11 model, limitations such as reliance on a single institution’s dataset and potential biases in annotations were acknowledged. The study suggests that while AI-based systems can serve as valuable diagnostic support tools, they should complement rather than replace clinical judgment, particularly in complex cases. Future research directions include enhancing model architectures and exploring task-specific training to improve detection capabilities across diverse dental conditions.