Automated assessment of periapical health based on the radiographic periapical index using YOLOv8, YOLOv11, and YOLOv12 one-stage object detection algorithms

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-21761-5
PMID: https://pubmed.ncbi.nlm.nih.gov/41115912
Publication Date: 2025-10-20
Author(s): Shehabeldin Saber et al.
Primary Topic: Dental Radiography and Imaging

Overview

This study evaluates recent YOLO (You Only Look Once) algorithms for the automated detection and classification of apical periodontitis using the Periapical Index (PAI) scoring system, which ranges from 1 to 5. A dataset of 699 de-identified digital periapical radiographs was annotated by calibrated experts and divided into training, validation, and testing sets. Three deep learning models (YOLOv8m, YOLOv11m, and YOLOv12m) were evaluated using Precision, Recall, F1 score, mean average precision at an IoU threshold of 0.50 (mAP50), Intersection over Union (IoU), and confusion matrices. YOLOv11m and YOLOv12m achieved higher Precision (88.5% and 89.1%, respectively) than YOLOv8m (86.8%), and YOLOv11m also achieved the highest Recall (86.2%) and F1 score (87.1%).
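
As a rough illustration of the IoU metric named above, the following Python sketch computes the overlap between a predicted and an annotated bounding box and applies the 0.50 threshold implied by mAP50. The box format, the function names, and the matching rule are assumptions made for illustration; they are not taken from the study's code.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, pred_cls: int, truth_cls: int,
                     thr: float = 0.5) -> bool:
    """Hypothetical matching rule: same PAI class and IoU at or above thr."""
    return pred_cls == truth_cls and iou(pred, truth) >= thr


if __name__ == "__main__":
    truth = (10.0, 10.0, 50.0, 50.0)   # made-up annotation
    pred = (20.0, 20.0, 60.0, 60.0)    # made-up prediction
    print(round(iou(pred, truth), 4))  # 900 / 2300, about 0.3913
    print(is_true_positive(pred, truth, pred_cls=3, truth_cls=3))
```

With these made-up boxes the overlap falls below 0.50, so the prediction would not count as a match under this rule even though the class is correct.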

The findings underscore the potential of YOLO algorithms to automate the detection and classification of apical periodontitis and suggest that they could be integrated into clinical workflows. YOLOv11m emerged as the most balanced model, offering a favorable trade-off between detection accuracy and processing speed, which makes it particularly suitable for clinical application. YOLOv12m provided higher precision but at a slower inference rate, while YOLOv8m, despite its complexity, allowed rapid inference. To improve the generalizability and reliability of YOLOv11m for clinical use, future research should focus on expanding training datasets, obtaining external validation, and implementing continuous performance monitoring. The study also advocates exploring LLM-based AI agents and workflows to further improve clinical integration and outcomes in the detection of periapical lesions.

Methods

All methods were conducted in compliance with established guidelines and regulations. The research protocol was approved by the research ethics committee of the Faculty of Dentistry at The British University in Egypt (approval number 24-004, issued on January 10, 2024). Informed consent for the use of data was obtained from all participants and their legal guardians.

Results

The results section evaluates the three trained models (YOLOv8m, YOLOv11m, and YOLOv12m), as detailed in Table 1. The models were compared on the number of convolutional layers, trainable parameters, training epochs, total training time, GFLOPS, and processing time per image. They achieved similar mean average precision (mAP50) scores: 86.4% for YOLOv8m and 86.6% for both YOLOv11m and YOLOv12m. YOLOv11m achieved the highest recall (86.2%) and F1 score (87.1%), with a precision of 88.5%, slightly below the 89.1% reported for YOLOv12m in the Overview. The confusion matrix analysis indicated that all models performed better on PAI classes 3 to 5 than on classes 1 and 2, with YOLOv11m performing best among the three on classes 1 and 2.
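
To make the link between a confusion matrix and per-class precision, recall, and F1 concrete, here is a minimal Python sketch. The 3x3 matrix is a made-up toy example (the study uses five PAI classes, and its actual counts are not reproduced here), and the row/column convention is an assumption.

```python
from typing import List, Tuple


def per_class_prf(cm: List[List[int]]) -> List[Tuple[float, float, float]]:
    """Per-class (precision, recall, F1) from a square confusion matrix.

    Assumed convention: cm[i][j] counts items of true class i predicted as j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[r][k] for r in range(n))  # column sum: TP + FP
        actual_k = sum(cm[k])                          # row sum: TP + FN
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


if __name__ == "__main__":
    # Hypothetical counts for three classes, for illustration only.
    toy_cm = [
        [8, 2, 0],
        [1, 7, 2],
        [0, 1, 9],
    ]
    for k, (p, r, f1) in enumerate(per_class_prf(toy_cm)):
        print(f"class {k}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

A detection confusion matrix usually also carries a background row and column for missed and spurious boxes; that is omitted here to keep the sketch short.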

Qualitative assessments, illustrated in Figure 5, showed that the models’ predictions aligned closely with the ground truth on the internal testing dataset, with precise localization and few errors. Performance on the external test set, however, revealed under-detection of small lesions and confusion between PAI grades, suggesting a need for more diverse training data and stronger augmentation strategies. Statistical comparisons using McNemar’s exact test indicated that YOLOv11m significantly outperformed both YOLOv8m and YOLOv12m in lesion-level outcomes (p = 0.0015 and p = 0.0161, respectively). At the image level, however, YOLOv11m showed only a slight advantage in performance metrics, and the overlapping confidence intervals suggest that the differences may not be statistically significant.
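
For readers unfamiliar with McNemar's exact test, the sketch below shows one common way to compute its two-sided p-value from paired outcomes of two models on the same lesions. Only the discordant pairs matter: `b` counts lesions that model A got right and model B got wrong, and `c` counts the reverse. The counts in the example are hypothetical and are not the study's data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    Under the null hypothesis each discordant pair is equally likely to
    favour either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Lower-tail probability P(X <= k) for X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical counts: A right and B wrong on 10 lesions, the reverse on 2.
    print(round(mcnemar_exact(10, 2), 4))  # 158 / 4096, about 0.0386
```

Lesions on which both models agree (both right or both wrong) do not enter the calculation, which is why two models with similar overall metrics can still differ significantly at the lesion level.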

Discussion

The study retrospectively collected 699 digital periapical radiographs from various clinics to develop a deep learning (DL) model for assessing periapical tissue conditions using the periapical index (PAI). The dataset covered diverse demographic factors and was annotated by experienced professionals, with high intra- and inter-observer agreement (kappa values of 0.97 and 0.96). A post-hoc power analysis confirmed that model performance (F1 scores between 0.83 and 0.86) was significantly above chance, indicating that the dataset was adequate for model training. The data underwent extensive preprocessing, including augmentation that increased the training set tenfold to improve model generalizability.
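
The agreement figures above are kappa statistics. As a hedged illustration, the sketch below computes unweighted Cohen's kappa for two sets of PAI scores; whether the study used the unweighted or a weighted variant is not stated in this summary, so that choice is an assumption, and the scores are made up.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two equal-length rating sequences."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    # Observed agreement: fraction of items given the same score.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal score frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    if p_e == 1.0:
        return 1.0  # both raters used a single identical score throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical PAI scores from two readings of ten radiographs.
    first = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
    second = [1, 2, 3, 4, 5, 1, 2, 3, 4, 4]
    print(round(cohens_kappa(first, second), 3))  # 0.875
```

Because PAI is an ordinal scale, a weighted kappa that penalizes a 1-versus-5 disagreement more than a 4-versus-5 disagreement is a common alternative.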

The study used the medium variants of the YOLO architecture (YOLOv8m, YOLOv11m, and YOLOv12m) for object detection and classification. YOLOv11m emerged as the most effective model, balancing accuracy and processing efficiency and performing particularly well on lower PAI scores. Although the models performed well, the authors noted limitations such as potential overfitting due to the dataset size and class imbalance. They recommend that future research expand the dataset and explore advanced AI integration in clinical workflows to improve periapical lesion detection and, ultimately, clinical outcomes in dental practice.