Comparative evaluation of commercially available AI-based cephalometric tracing programs

Journal: BMC Oral Health, Volume: 24, Issue: 1
DOI: https://doi.org/10.1186/s12903-024-05032-9
PMID: https://pubmed.ncbi.nlm.nih.gov/39425100
Publication Date: 2024-10-18
Author(s): Nida Baig et al.
Primary Topic: Dental Radiography and Imaging

Overview

The study evaluated the accuracy and diagnostic concordance of three AI-based lateral cephalometric tracing programs (WebCeph™, Cephio, and Ceppro DDH Inc.) against measurements made by human experts. A total of 63 lateral cephalometric radiographs were analyzed with both semi-automatic software and the three AI-based programs. The researchers assessed intra- and inter-observer reliability of the human measurements and used a repeated-measures one-way ANOVA to compare the AI programs with the human experts. Diagnostic performance was further evaluated through sensitivity and specificity.

Human expert reliability was excellent (ICC > 0.9) for most cephalometric parameters. The AI programs, however, differed significantly from the human experts: WebCeph™ was inaccurate in 10 of 11 measurements, and Cephio and Ceppro DDH Inc. in 7 of 11. Discrepancies of more than two units (degrees for angular, millimetres for linear measurements) were common, particularly in defining sagittal and vertical skeletal patterns and in dental and soft-tissue characteristics. The study therefore concluded that all three AI-based tracing programs showed notable inaccuracies and inconsistencies, and that clinicians should be cautious about relying solely on AI analyses for orthodontic treatment planning and assessment.
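The "more than two units" criterion can be made concrete with a small helper that, for one parameter, reports the mean signed difference between an AI program and the reference and the share of radiographs whose absolute difference exceeds a threshold. This is a minimal illustrative sketch, not the study's own analysis: the function name `discrepancy_summary`, the 2-unit default, and the SNA values are all hypothetical.

```python
import numpy as np

def discrepancy_summary(reference, ai, threshold=2.0):
    """Hypothetical helper: mean signed difference (AI minus reference) and
    share of radiographs whose absolute difference exceeds `threshold` units."""
    reference = np.asarray(reference, dtype=float)
    ai = np.asarray(ai, dtype=float)
    diff = ai - reference
    return {
        "mean_diff": float(diff.mean()),
        "share_over_threshold": float((np.abs(diff) > threshold).mean()),
    }

# Hypothetical SNA values (degrees) for five radiographs.
ref = [82.0, 80.5, 79.0, 84.0, 81.0]
ai = [84.5, 80.0, 82.0, 84.5, 85.0]
summary = discrepancy_summary(ref, ai)
print(summary)  # {'mean_diff': 1.9, 'share_over_threshold': 0.6}
```

Reporting the share over threshold alongside the mean matters because positive and negative errors can cancel in the mean while individual tracings are still clinically off.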

Introduction

The introduction traces the evolution of cephalometric analysis over the past seventy years and the growing recognition of the limitations of conventional lateral cephalometry. A pivotal study on measurement error drew considerable attention in orthodontics and underscored the need for greater accuracy. Integrating artificial intelligence (AI) into the field offers a promising way to reduce errors, automate analyses, and improve diagnostic precision, particularly when assessing treatment outcomes. Recent advances have produced fully automated AI-based applications intended to make cephalometric measurements more reliable, although their performance may be affected by factors such as dentition stage, congenital anomalies, and image quality.

The introduction also surveys commercially available AI-based cephalometric tracing software, including CephX, AudaxCeph, and others, noting that while the algorithms behind some programs, such as WebCeph and CEPPRO, are documented, details on the others remain scarce. Although deep learning algorithms have detected landmarks on cephalometric radiographs accurately, they still need to be compared directly with clinician performance. This study therefore aims to evaluate the accuracy, efficiency, and diagnostic concordance of three AI-assisted lateral cephalometric tracing programs, hypothesizing that these AI solutions can match semi-automatic methods for essential cephalometric measurements.

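Methods

The study compared three AI-based lateral cephalometric tracing programs (WebCeph™, Cephio, and Ceppro DDH Inc.) with semi-automatic digital cephalometric analysis performed by human experts on the same set of 63 lateral cephalometric radiographs. Eleven linear and angular parameters commonly used in orthodontic case planning were measured, including SNA, FMA, SN-MP, and U1-Apog.

Reliability of the human measurements was assessed as intra- and inter-observer agreement using intraclass correlation coefficients (ICC), and the median of the human measurements served as the reference standard. A repeated-measures one-way ANOVA with post-hoc pairwise comparisons tested for differences among the four entities (the human reference and the three AI programs). Diagnostic concordance in classifying sagittal and vertical skeletal patterns and dental and soft-tissue characteristics was evaluated with sensitivity and specificity.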
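A repeated-measures design treats each radiograph as a subject measured by all four entities, so between-radiograph variation is removed before the entities are compared. Below is a minimal NumPy/SciPy sketch of a one-way repeated-measures ANOVA for a single parameter. It assumes sphericity, because the source does not say whether a correction such as Greenhouse-Geisser was applied. The post-hoc step uses Bonferroni-adjusted paired t-tests against the reference; this too is an assumption, since the source does not name its post-hoc procedure. Function names and all measurement values are hypothetical.

```python
import numpy as np
from scipy import stats

def rm_anova_oneway(y):
    """One-way repeated-measures ANOVA (sketch).

    y has shape (n_radiographs, k_entities): each row is one radiograph and
    each column one entity (reference or AI program) measuring the same
    parameter. Assumes sphericity; no Greenhouse-Geisser correction.
    Returns (F, p).
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_entities = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_radiographs = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((y - grand) ** 2).sum()
    # Residual = total minus entity effect minus between-radiograph effect.
    ss_error = ss_total - ss_entities - ss_radiographs
    df_entities, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_entities / df_entities) / (ss_error / df_error)
    return f_stat, stats.f.sf(f_stat, df_entities, df_error)

def posthoc_vs_reference(y, names):
    """Paired t-tests of each AI column against column 0 (the reference),
    Bonferroni-adjusted for the number of comparisons (assumed procedure)."""
    y = np.asarray(y, dtype=float)
    m = y.shape[1] - 1
    adjusted = {}
    for j in range(1, y.shape[1]):
        p = stats.ttest_rel(y[:, 0], y[:, j]).pvalue
        adjusted[names[j]] = min(1.0, float(p) * m)
    return adjusted

# Hypothetical SNA values (degrees) for six radiographs.
ref = np.array([82.0, 80.0, 79.5, 84.0, 81.0, 83.0])
y = np.column_stack([
    ref,
    ref + [3.1, 2.4, 3.8, 2.2, 3.5, 2.9],    # "WebCeph" (hypothetical)
    ref + [0.4, -0.3, 1.1, 0.2, -0.5, 0.6],  # "Cephio" (hypothetical)
    ref + [1.2, 0.8, 1.9, 0.5, 1.4, 1.0],    # "Ceppro" (hypothetical)
])
names = ["Reference", "WebCeph", "Cephio", "Ceppro"]

f_stat, p_value = rm_anova_oneway(y)
posthoc = posthoc_vs_reference(y, names)
print(f"F = {f_stat:.1f}, p = {p_value:.2g}")
print(posthoc)
```

A useful sanity check on the implementation: with only two entities, the repeated-measures F equals the square of the paired t statistic.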

Results

The AI-based measurements produced by the three programs showed excellent reliability, with intraclass correlation coefficients (ICC) above 0.99. The human investigators also showed high intra-observer agreement (ICC > 0.9), and inter-observer agreement between the two human investigators exceeded 0.9 for every parameter except SNA (ICC = 0.89). The median of the human measurements was therefore adopted as the reference standard for further analyses. A repeated-measures one-way ANOVA found statistically significant differences among the four entities (the human reference and the three AI programs) for every parameter, and post-hoc pairwise comparisons showed significant discrepancies between the AI programs and the reference standard.
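The source does not state which ICC form was used. As an assumption, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater, in Shrout and Fleiss notation) from two-way ANOVA mean squares, then forms the reference standard as the per-radiograph median of the human measurements. The function name `icc2_1` and the observer values are hypothetical.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1), Shrout & Fleiss: two-way random effects, absolute
    agreement, single rater. ratings has shape (n_radiographs, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between radiographs
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical SNA tracings (degrees) by two observers on six radiographs.
obs1 = np.array([82.0, 80.0, 79.5, 84.0, 81.0, 83.0])
obs2 = np.array([82.3, 79.6, 79.9, 83.7, 81.4, 82.8])
human = np.column_stack([obs1, obs2])

print(f"inter-observer ICC(2,1) = {icc2_1(human):.3f}")

# Reference standard: per-radiograph median of all human measurements.
# With only two columns this equals their mean; with repeated sessions
# per observer it would generally differ from the mean.
reference = np.median(human, axis=1)
```

The absolute-agreement form penalizes a systematic offset between observers (the `msc` term), which is usually what matters when one tracing must stand in for another.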

Specifically, WebCeph™ differed significantly from the reference standard in 10 of 11 parameters, while Cephio and Ceppro DDH Inc. each differed in 7 of 11. For most parameters, the mean difference between WebCeph™ and the reference standard ranged from 2 to 9 units, with U1-Apog a notable exception. The AI programs also differed in how they defined sagittal and vertical skeletal patterns, and their dental and soft-tissue measurements varied. Although some parameters reached sensitivity and specificity values close to 100%, overall diagnostic performance was low for all three programs, with values below 80%; the only exceptions were Cephio's results for FMA and SN-MP, which approached 80%.
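Sensitivity and specificity require each entity's measurements to be turned into diagnostic categories (for example, a sagittal skeletal class) and then compared, category by category, with the reference classification. The sketch below assumes that step has already been done and computes one-vs-rest sensitivity and specificity per category. The function name and labels are hypothetical, and the clinical cut-offs that would produce such labels are not given in the source.

```python
def sensitivity_specificity(reference, predicted, positive):
    """One-vs-rest sensitivity and specificity for category `positive`."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    tn = sum(r != positive and p != positive for r, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

# Hypothetical sagittal classes from the human reference and one AI program.
ref_labels = ["I", "II", "II", "III", "I", "II", "I", "III"]
ai_labels = ["I", "II", "I", "III", "II", "II", "I", "I"]

results = {c: sensitivity_specificity(ref_labels, ai_labels, c)
           for c in ("I", "II", "III")}
for c, (sens, spec) in results.items():
    print(f"Class {c}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example Class III gets a specificity of 1.00 but a sensitivity of only 0.50, which illustrates one way a single metric can approach 100% while overall diagnostic performance remains low.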

Discussion

This study evaluated three AI-based cephalometric tracing programs (WebCeph™, Cephio, and Ceppro DDH Inc.) against semi-automatic digital cephalometric analysis performed by human experts. The analysis focused on the key linear and angular measurements commonly used in orthodontic case planning. Although the AI programs analyzed radiographs quickly, their results were significantly less accurate than the human expert measurements, particularly in locating critical landmarks such as point A, which earlier studies have reported as difficult for both human and AI methods. Human expert reliability was generally high, whereas the AI programs showed substantial discrepancies, especially in angular measurements, suggesting that their algorithms may not adequately account for anatomical variation.
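Why a misplaced landmark such as point A matters for angular measurements can be shown with SNA, the angle at nasion (N) between the lines to sella (S) and to point A. The sketch below computes that angle from 2-D landmark coordinates and repeats the calculation after moving point A 2 mm anteriorly. The coordinates are hypothetical and the source does not report this calculation; it only illustrates how a small landmark error propagates into the angle.

```python
import math

def angle_at(vertex, p1, p2):
    """Unsigned angle (degrees) at `vertex` between rays vertex->p1 and vertex->p2."""
    v1 = (p1[0] - vertex[0], p1[1] - vertex[1])
    v2 = (p2[0] - vertex[0], p2[1] - vertex[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # atan2(|cross|, dot) is numerically stabler than acos(dot / norms).
    return math.degrees(math.atan2(abs(cross), dot))

# Hypothetical landmark coordinates in mm (x anterior, y superior).
S = (0.0, 0.0)
N = (70.0, 0.0)
A = (63.0, -49.5)
A_shifted = (A[0] + 2.0, A[1])  # point A placed 2 mm further anterior

sna = angle_at(N, S, A)
sna_shifted = angle_at(N, S, A_shifted)
print(f"SNA = {sna:.1f} deg; with A shifted 2 mm: {sna_shifted:.1f} deg")
# SNA = 82.0 deg; with A shifted 2 mm: 84.2 deg
```

With these assumed coordinates, a 2 mm error in placing point A shifts SNA by more than 2 degrees, already beyond the two-unit level of discrepancy described above.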

The findings highlight the limitations of current AI-based cephalometric programs: despite their efficiency, they do not match the accuracy of human analysis. Landmark identification errors and inconsistent measurements were common, raising concerns about the clinical applicability of these tools in orthodontics. The study stresses that human involvement in cephalometric analysis remains necessary to ensure diagnostic accuracy, and suggests that future AI development focus on better landmark detection and on integrating more diverse anatomical datasets. Overall, the results challenge the notion that AI can fully replace human expertise in cephalometric analysis and underline the importance of human oversight in clinical settings.
