Comparison of AI-assisted cephalometric analysis and orthodontist-performed digital tracing analysis

Journal: Progress in Orthodontics, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s40510-024-00539-x
PMID: https://pubmed.ncbi.nlm.nih.gov/39428414
Publication Date: 2024-10-21
Author(s): Sabahattin Bor et al.
Primary Topic: Dental Radiography and Imaging

Overview

The study aimed to evaluate the performance of three AI-assisted cephalometric analysis platforms (CephX, WeDoCeph, and WebCeph) against the traditional digital tracing method using NemoCeph software. A total of 1,500 lateral cephalometric films were classified into Class I, Class II, and Class III, and 40 films were randomly selected from each class for analysis. The selected films were processed by the AI platforms without manual intervention and were also traced by an orthodontist using NemoCeph.

The findings indicated statistically significant differences in angular measurements (ANB, FMA, IMPA, and NLA) and in certain linear measurements (U1-NA and Co-A) across the different methods, with variability most apparent in the Class II group analyzed with NemoCeph. Statistical significance was noted for angular measurements (ANB: $p < 0.05$, FMA: $p < 0.001$, IMPA: $p < 0.001$, NLA: $p < 0.001$) and linear measurements (U1-NA: $p = 0.002$, Co-A: $p = 0.002$). The results underscore the need for careful selection of analysis methods in orthodontic diagnostics: AI-assisted platforms are efficient and can reduce human error, but they still require oversight by trained orthodontists to ensure accuracy and reliability in treatment planning.

Introduction

The introduction of the research paper outlines the foundational concepts of artificial intelligence (AI) and its subsets, particularly machine learning and deep learning. AI encompasses systems designed to replicate human cognitive functions, while machine learning focuses on statistical models that allow computers to learn from data without explicit programming. Deep learning, a subset of machine learning, employs artificial neural networks to automatically extract hierarchical features from data, with convolutional neural networks being a prominent model for visual data analysis. These technologies have found applications in various fields, including orthodontics, where AI aids in dental monitoring, maturation stage identification, jaw and teeth segmentation, treatment planning, and cephalometric analysis.

Cephalometric analysis is crucial for orthodontics and orthognathic surgery, providing insights into craniofacial growth and development. Traditional methods include manual and digital tracing, but recent advances have introduced AI-assisted analysis through dedicated platforms. While computer-aided cephalometric analysis offers numerous advantages, it still relies on a human operator, which can introduce errors due to factors such as fatigue and level of expertise. In contrast, AI systems are not subject to such limitations, enabling quicker and potentially more accurate analyses. This study specifically aims to evaluate the accuracy and consistency of three AI-based cephalometric analysis platforms (WebCeph, WeDoCeph, and CephX) against digital tracing performed by an experienced orthodontist using NemoCeph, highlighting the need for comparative studies in this emerging field.

Methods

In this retrospective study, the authors reviewed 1,890 lateral cephalometric radiographs obtained from patients aged 12 to 18 years who received orthodontic treatment at the Department of Orthodontics, Yüzüncü Yıl University (YYU). The study was conducted following ethical approval from the YYU ethical board. Of the 1,890 films initially reviewed, 1,500 were deemed eligible for inclusion based on predefined criteria. These eligible films formed the pool from which 40 films per skeletal class (120 in total) were randomly selected for analysis.
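
The per-class random selection (40 films from each skeletal class) can be illustrated with a minimal Python sketch. The film identifiers, the class sizes of 500 each, and the fixed seed are hypothetical placeholders; the source does not report how many eligible films fell into each class or how the randomization was carried out.

```python
import random


def sample_per_class(films_by_class, n_per_class=40, seed=0):
    """Draw n_per_class film IDs from each class without replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return {
        skeletal_class: rng.sample(film_ids, n_per_class)
        for skeletal_class, film_ids in films_by_class.items()
    }


# Hypothetical pool: 1,500 eligible films, split evenly for illustration only.
eligible = {
    "Class I": [f"I-{i:04d}" for i in range(500)],
    "Class II": [f"II-{i:04d}" for i in range(500)],
    "Class III": [f"III-{i:04d}" for i in range(500)],
}

selected = sample_per_class(eligible)
total_selected = sum(len(ids) for ids in selected.values())
print(total_selected)  # 3 classes x 40 films
```

Sampling within each class, rather than from the pooled 1,500 films, is what guarantees equal group sizes across Class I, II, and III.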

Results

The results of the intra-rater assessment indicated that the reliability of angular measurements varied across different classes, with the FMA in Class I, NLA in Class II, and both ANB and FMA in Class III exhibiting Intraclass Correlation Coefficients (ICCs) below 0.75, suggesting lower reliability. Conversely, most other parameters demonstrated good to excellent agreement. In the inter-rater analysis, similar trends were observed, with the FMA and NLA angles in Class II and the U-E line in Class III also showing ICC values below 0.75, while other measurements maintained good to excellent reliability.
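
To make the 0.75 threshold concrete, the following Python sketch computes one common form of the intraclass correlation coefficient, the two-way random-effects, absolute-agreement, single-measure ICC(2,1). This choice is an assumption: the summary does not state which ICC form the authors used, and the readings below are invented for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per film and one column per rating session."""
    n = len(ratings)       # number of films (subjects)
    k = len(ratings[0])    # number of raters or repeated sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between films
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical FMA readings (degrees): first and second tracing of five films.
fma_repeated = [[25.0, 26.0], [30.5, 29.5], [22.0, 23.5], [28.0, 27.0], [33.0, 34.0]]
icc = icc_2_1(fma_repeated)
print("below 0.75" if icc < 0.75 else "0.75 or above", round(icc, 3))
```

Because this form measures absolute agreement, a constant offset between two sessions lowers the ICC even when the readings are perfectly correlated.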

Statistically significant differences were identified in angular measurements among the cephalometric analysis groups across all classes. In Class I, significant differences were noted for ANB (p < 0.05), FMA (p < 0.001), IMPA (p < 0.001), and NLA (p < 0.001). The WebCeph group had the highest mean ANB (3.83°) and IMPA (92.97°), while the NemoCeph group had the highest mean FMA (28.8°). In Class II, significant differences were again observed for ANB (p < 0.05), FMA (p < 0.05), IMPA (p < 0.05), and NLA (p < 0.001), with the WebCeph group showing the highest ANB (6.63°) and IMPA (99.01°). Class III also showed significant differences in angular measurements (ANB, FMA, IMPA, NLA; p < 0.05 or p < 0.001), with the NemoCeph group exhibiting the lowest ANB (-1.95°) and the highest FMA (28.88°) and IMPA (85.65°). Linear measurements also differed significantly among groups, particularly U1-NA, Co-A, and Co-Gn across all classes, with notable values recorded for the NemoCeph and WeDoCeph groups. Where significant differences were detected, Tukey's post hoc test was used for pairwise comparisons.
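
The group comparison that precedes a Tukey post hoc test is typically a one-way analysis of variance. The sketch below computes the ANOVA F statistic in plain Python for four methods; it assumes a one-way design, which the summary implies but does not state, and the ANB readings are invented for illustration rather than taken from the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual readings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ANB readings (degrees) for the same class under each method.
anb_by_method = {
    "NemoCeph": [2.9, 3.1, 2.7, 3.4, 3.0],
    "WebCeph": [3.8, 4.1, 3.6, 3.9, 3.7],
    "WeDoCeph": [3.2, 3.0, 3.3, 2.9, 3.1],
    "CephX": [3.4, 3.6, 3.1, 3.5, 3.3],
}
f_stat, df_b, df_w = one_way_anova_f(list(anb_by_method.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

This sketch stops at the F statistic. Obtaining the p-value and the pairwise Tukey comparisons requires the F and studentized range distributions, for which a statistics library (for example SciPy's `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`) would normally be used; the summary does not say which software the authors relied on.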

Discussion

In this study, the authors investigated the accuracy of cephalometric analyses performed by three AI platforms (WebCeph, WeDoCeph, and CephX) compared with conventional digital tracing performed by an orthodontist (NemoCeph) across different skeletal classes (Class I, II, and III). The inclusion criteria ensured high-quality cephalometric images, while the exclusion criteria eliminated images with unclear landmarks or significant dental anomalies. A total of 120 images were analyzed, with angular and linear parameters measured and compared across the groups. Statistical analyses revealed significant differences in both angular and linear measurements between the AI platforms and the orthodontist's tracings, highlighting variability in results attributable to calibration methods and to potential biases inherent in the AI systems.
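
As an illustration of how an angular parameter follows from landmark positions, the sketch below computes ANB using its standard definition as the difference between the SNA and SNB angles. The landmark coordinates are hypothetical pixel positions chosen for the example; they are not data from the study, and the platforms' own landmark detection and measurement code is not described in the source.

```python
import math


def angle_at(vertex, p, q):
    """Unsigned angle in degrees at `vertex` between the rays to p and to q."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.atan2(abs(cross), dot))


def anb_angle(sella, nasion, a_point, b_point):
    """ANB = SNA - SNB; positive when A point lies ahead of B point."""
    sna = angle_at(nasion, sella, a_point)
    snb = angle_at(nasion, sella, b_point)
    return sna - snb


# Hypothetical landmark coordinates in image pixels (x to the right, y downward).
S, N, A, B = (0.0, 0.0), (70.0, 0.0), (68.0, 55.0), (62.0, 95.0)
anb = anb_angle(S, N, A, B)
print(f"ANB = {anb:.2f} degrees")
```

Because both SNA and SNB share the sella and nasion landmarks, a small shift in where a platform places A point or B point translates directly into a different ANB, which is one way landmark-level disagreement can surface as the angular differences reported here.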

The findings indicated that while the AI platforms generally provided consistent analyses, significant discrepancies were observed in specific measurements, particularly in linear parameters such as Co-A and Co-Gn. The study emphasized the importance of calibration methods, noting that manual calibration (as used by WeDoCeph) yielded results more consistent with the orthodontist group than automated calibration (CephX). The authors also acknowledged limitations, including the reliance on a single orthodontist for landmark identification and the potential for data and algorithmic biases in AI systems. They recommended that future studies incorporate a broader range of orthodontic expertise and evaluate the impact of manual intervention on the accuracy of cephalometric analyses.
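
The sensitivity of linear parameters to calibration can be sketched as follows. The example assumes that calibration means deriving a millimetre-per-pixel scale from a reference object of known length in the image; the summary does not describe how WeDoCeph or CephX actually calibrate, and all coordinates and lengths here are hypothetical.

```python
import math


def mm_per_pixel(ref_start, ref_end, ref_length_mm):
    """Scale factor from a reference segment of known physical length."""
    return ref_length_mm / math.dist(ref_start, ref_end)


def linear_measurement_mm(p, q, scale):
    """Distance between two landmarks converted from pixels to millimetres."""
    return math.dist(p, q) * scale


# Hypothetical 10 mm ruler segment spanning 100 pixels in the image.
scale = mm_per_pixel((0.0, 0.0), (0.0, 100.0), 10.0)

# Hypothetical condylion and A point positions in pixels.
co, a_point = (100.0, 200.0), (820.0, 500.0)
co_a = linear_measurement_mm(co, a_point, scale)

# The same landmarks measured with a scale that is off by 3 percent.
co_a_miscalibrated = linear_measurement_mm(co, a_point, scale * 1.03)
print(f"Co-A = {co_a:.1f} mm, with 3% scale error = {co_a_miscalibrated:.1f} mm")
```

A scale error multiplies every distance by the same factor, so it affects linear measurements such as Co-A and Co-Gn while leaving angles unchanged, which is consistent with calibration mattering most for the linear parameters.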