Reliability and accuracy of Artificial intelligence-based software for cephalometric diagnosis. A diagnostic study

Journal: BMC Oral Health, Volume: 24, Issue: 1
DOI: https://doi.org/10.1186/s12903-024-05097-6
PMID: https://pubmed.ncbi.nlm.nih.gov/39468520
Publication Date: 2024-10-28
Author(s): Jean-Philippe Mercier et al.
Primary Topic: Dental Radiography and Imaging

Overview

The study examines the use of artificial intelligence (AI) for cephalometric diagnosis in orthodontics, comparing the reliability, accuracy, and efficiency of AI-based software against a conventional digital method on 408 lateral cephalograms. Three techniques were compared: manual landmark localization, fully automatic AI localization, and semi-automatic AI localization. Fifteen variables covering skeletal, dental, and soft tissue measurements were assessed. Statistical analysis revealed significant differences in landmark positioning accuracy among the methods (p < 0.01), although most of these differences were not clinically significant; the semi-automatic AI method agreed marginally more closely with the conventional reference than the fully automatic one. In terms of time efficiency, the automatic AI method was the quickest, followed by the semi-automatic and conventional techniques (p < 0.001). The statistical differences were most notable for measurements such as SNA (°), Wits (mm), and (L1)-NB (°), but their overall clinical relevance was limited. The findings suggest that AI can enhance cephalometric analysis, particularly when automatic landmark placement is combined with operator adjustments, but further research is needed to validate its accuracy and reliability in clinical settings.

Introduction

The introduction highlights the critical role of comprehensive diagnosis in orthodontic practice, particularly cephalometric analysis based on 2D lateral radiographs. This analysis is essential for classifying dental and skeletal malocclusions, assessing craniofacial structures, and monitoring growth and treatment progress. Traditionally, cephalometric tracing was a manual, error-prone task, and digital software has since improved its efficiency. Recent AI-based software goes further by automating landmark detection and measurement generation, with the aim of making diagnoses more reliable and more consistent across operators.

Despite the advantages of AI, such as its ability to learn from extensive datasets of annotated radiographs, concerns regarding its generalizability across diverse populations and the necessity for nuanced clinical judgment in complex cases remain. Ethical and legal issues, including data privacy and clinician collaboration, are also pertinent. While AI holds significant promise for streamlining orthodontic treatment planning, the current body of research is limited, often featuring small sample sizes. Continued investigation is crucial to validate the effectiveness of AI in cephalometric analysis and to expand its clinical applications.

Methods

In this study, lateral cephalograms were acquired with a Satelec X-Mind Pano Ceph D+ X-ray machine. Different students took the radiographs, and a single operator traced the landmarks. Three diagnostic methods were compared: a digital manual method, which served as the gold standard, and two AI-based techniques, semi-automatic and automatic. In the digital manual method the operator placed every landmark by hand. In the AI methods the software localized the cephalometric points itself, and in the semi-automatic option the operator could then adjust the landmark positions.

Statistical analysis was carried out using SPSS (version 25). Normality of the quantitative variables was assessed through skewness, kurtosis, and the Kolmogorov-Smirnov test, with a significance threshold set at p < 0.01. Descriptive statistics included mean, median, standard deviation, and interquartile range. The accuracy of each AI method was evaluated as its mean absolute error relative to the conventional method. Differences in measurements were analyzed using the Wilcoxon signed-rank test for paired samples and the paired Student's t-test, while the Friedman test was used to compare measurement times across the three techniques.
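
The accuracy measure described above can be illustrated with a minimal Python sketch. It is not the authors' SPSS procedure: the function name `mean_absolute_error` and the sample SNA values are hypothetical and serve only to show how a mean absolute error against the conventional reference might be computed for one variable.

```python
from statistics import mean


def mean_absolute_error(reference, candidate):
    """Mean absolute difference between paired measurements."""
    if len(reference) != len(candidate):
        raise ValueError("paired measurements must have equal length")
    if not reference:
        raise ValueError("at least one pair is required")
    return mean(abs(r - c) for r, c in zip(reference, candidate))


# Hypothetical SNA values in degrees for four radiographs (not study data).
conventional = [82.0, 79.5, 84.0, 80.5]      # reference (gold standard)
ai_automatic = [81.0, 80.5, 82.5, 80.5]      # fully automatic AI tracing
ai_semiautomatic = [81.5, 79.5, 83.5, 81.0]  # AI tracing after operator adjustment

mae_auto = mean_absolute_error(conventional, ai_automatic)
mae_semi = mean_absolute_error(conventional, ai_semiautomatic)
print(f"automatic MAE: {mae_auto:.3f} deg, semi-automatic MAE: {mae_semi:.3f} deg")
```

Because the error is defined relative to the conventional tracing, a lower value means closer agreement with the reference rather than closeness to an independent ground truth.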

Results

This section compares digital and manual tracing techniques in cephalometry and notes that digital methods reduce landmark localization errors. The study used conventional digital tracing as the gold standard, a choice supported by previous research demonstrating its accuracy. The AI software evaluated was WebCeph™, recently approved by the FDA and KFAD; it was chosen for its machine learning capabilities and is trained on a comprehensive dataset of craniofacial images and associated clinical data.

The findings indicate that both the AI-based semi-automatic and automatic tracing options differed from the conventional digital method to a statistically significant degree; however, most of these differences were not clinically significant. This aligns with prior studies that affirm the reliability of AI in cephalometric measurements, although concerns about the robustness of deep learning-based landmark detection were noted. Earlier studies also disagree on the direction of the error, with some reporting overestimation and others underestimation of specific angles. The present study found underestimation for certain measurements, such as (U1)-NA (°) and (L1)-NB (°).

Discussion

The study aimed to evaluate the reliability, accuracy, and time efficiency of AI-based software for 2D lateral cephalometric analysis compared to conventional digital methods. Following the STARD guidelines and ethical standards, the research included 408 lateral cephalograms from orthodontic patients, excluding images of poor quality and images that the AI software could not recognize. The conventional method involved manual tracing using NemoCeph™, while the AI-based methods (fully automatic and semi-automatic) used deep learning algorithms for landmark detection and measurement calculation. The study assessed various cephalometric parameters and measured the time taken by each method, revealing that the AI-based automatic option was significantly faster than both the semi-automatic and conventional methods.
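
The time comparison across the three techniques can be sketched in Python as a Friedman test. This is an illustration under stated assumptions, not the authors' SPSS analysis: the timings are hypothetical, the helper names are invented for the example, the p-value uses the asymptotic chi-square approximation (which for three methods, i.e. two degrees of freedom, reduces to exp(-statistic / 2)), and no tie correction is applied, so results for data with tied timings would differ from those of a statistics package.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_three_methods(rows):
    """Friedman statistic and asymptotic p-value for exactly three related samples."""
    n, k = len(rows), 3
    if n == 0 or any(len(row) != k for row in rows):
        raise ValueError("expected non-empty rows of three timings")
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function with 2 degrees of freedom
    return stat, p_value


# Hypothetical tracing times in seconds per radiograph (not study data):
# (conventional, semi-automatic, automatic)
timings = [
    (310.0, 95.0, 12.0),
    (280.0, 110.0, 10.0),
    (345.0, 88.0, 14.0),
    (295.0, 102.0, 11.0),
    (330.0, 97.0, 13.0),
]
stat, p_value = friedman_three_methods(timings)
print(f"Friedman statistic = {stat:.2f}, p = {p_value:.4f}")
```

The test ranks the three timings within each radiograph, so it asks whether one method is consistently faster rather than how large the time saving is.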

Results indicated that the AI-based semi-automatic method was generally more accurate than the fully automatic option for most parameters, although both AI methods showed discrepancies compared to the conventional technique. Statistically significant differences were found for several measurements, but only a few were clinically significant, particularly at stricter thresholds (1° or 1 mm). High intra- and inter-operator reliability was achieved, suggesting that the AI methods can be reliable, albeit with some limitations in accuracy. The study concluded that while AI-based methods show promise for cephalometric analysis, further research is necessary to enhance their accuracy and reliability in clinical settings.
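
The distinction between statistical and clinical significance can be made concrete with a short Python sketch. The summary does not state exactly how the study applied its thresholds, so the rule used here (flag a measurement when its mean absolute difference from the conventional tracing exceeds the threshold) is an assumption, as are the function name, the sample angles, and the looser 2° threshold used for contrast.

```python
from statistics import mean


def clinically_relevant(reference, candidate, threshold):
    """True when the mean absolute difference exceeds the chosen threshold.

    Assumed decision rule for illustration; the study's exact rule is not given.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need non-empty paired measurements of equal length")
    mae = mean(abs(r - c) for r, c in zip(reference, candidate))
    return mae > threshold


# Hypothetical (L1)-NB angles in degrees (not study data).
conventional = [25.0, 30.5, 22.0, 28.0]
ai_automatic = [23.5, 29.0, 21.0, 26.0]  # absolute differences: 1.5, 1.5, 1.0, 2.0

strict = clinically_relevant(conventional, ai_automatic, threshold=1.0)   # stricter 1 degree threshold
lenient = clinically_relevant(conventional, ai_automatic, threshold=2.0)  # assumed looser threshold
print(f"relevant at 1 deg: {strict}, relevant at 2 deg: {lenient}")
```

The same set of differences is flagged at the stricter threshold and passes at the looser one, which is why the number of clinically significant measurements depends on the threshold chosen.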

Limitations

The limitations of AI in landmark localization within craniofacial structures are highlighted by various studies. Liu et al. [38] noted that landmarks situated on well-defined borders, such as points S and M, exhibited smaller localization errors, whereas those in less delineated areas, like points P and O, showed greater inaccuracies. This finding is corroborated by Kang et al. [39], who observed that cephalometric structures with high contrast yielded lower mean errors compared to those with low contrast in radiographs. The necessity for operators to adjust landmark localizations for points such as the basion and posterior nasal spine underscores the challenges faced in achieving accuracy.

Furthermore, Indermun et al. [40] and Jiang et al. [41] reported difficulties in accurately localizing landmarks such as the gonion and nasion, attributing these inaccuracies to poor image quality and anatomical variations, such as incisal shape. Duran et al. [34] also emphasized that AI demonstrates reduced accuracy in soft tissue measurements. Collectively, these studies illustrate the significant impact of anatomical clarity and image quality on the performance of AI in craniofacial landmark localization.