Automatic identification of hard and soft tissue landmarks in cone-beam computed tomography via deep learning with diversity datasets: a methodological study

Journal: BMC Oral Health, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s12903-025-05831-8
PMID: https://pubmed.ncbi.nlm.nih.gov/40200295
Publication Date: 2025-04-08
Author(s): Yan Jiang et al.
Primary Topic: Dental Radiography and Imaging

Overview

This study addresses the challenges of manual landmark detection in cone-beam computed tomography (CBCT) for craniofacial evaluation, a process that is both time-consuming and dependent on clinical expertise. The researchers developed a deep learning algorithm to automatically predict and locate 43 hard and soft tissue craniofacial landmarks in CBCT images from 498 patients with various malocclusions. Accuracy was assessed using the mean absolute error (MAE), the mean radial error (MRE), and the successful detection rate (SDR).

The results indicated that the algorithm achieved a mean absolute error of 0.74 mm and an overall MRE of 1.76 ± 1.13 mm, with SDRs of 60.16%, 91.05%, and 97.58% within 2-, 3-, and 4-mm error ranges, respectively. Notably, the average MRE for hard tissue landmarks was 1.73 mm, while for soft tissue landmarks it was 1.84 mm. The findings suggest that the proposed algorithm offers a clinically acceptable level of accuracy and robustness for automatic landmark detection across various skeletal malocclusions, paving the way for enhanced diagnostic capabilities in 3D craniomaxillofacial assessments and the potential for broader applications in clinical practice and research.

Introduction

The introduction of the research paper highlights the significance of cephalometric analysis in orthodontics and maxillofacial surgery, emphasizing the necessity for accurate anatomical landmark identification for effective diagnosis and treatment planning. Traditional manual methods are time-consuming and subject to variability, prompting the exploration of artificial intelligence (AI) for automating the measurement of two-dimensional (2D) lateral cephalograms. While automated systems have shown improved accuracy and repeatability, they still face challenges due to inherent limitations in 2D imaging, such as overlapping tissues and varying magnifications.

The paper further discusses the advantages of cone-beam computed tomography (CBCT) as a superior imaging modality, providing high-resolution, three-dimensional (3D) data with minimal radiation exposure. CBCT is recognized as the gold standard for diagnosing maxillary discrepancies and assessing complex craniofacial anomalies. However, the manual identification of landmarks in 3D images remains labor-intensive and error-prone. The study aims to address these challenges by developing an automated 3D landmark localization system, leveraging recent advancements in AI and expanding the dataset to include diverse cases and additional landmark categories. This approach is expected to enhance the accuracy and robustness of automated landmark identification, ultimately improving the diagnosis of dental and maxillofacial deformities.

Methods

The datasets were divided into training and test sets at a 5:1 ratio. Each image contained 43 landmarks and was stored in NIfTI (nii.gz) format. The images were categorized as low-resolution (minimum size of $350 \times 350 \times 350$ voxels with a voxel spacing of 0.4 mm) or high-resolution (minimum size of $650 \times 650 \times 650$ voxels with a voxel spacing of 0.25 mm). All images were resampled to a uniform size of $96 \times 96 \times 96$ voxels using trilinear interpolation, and the landmark coordinates were scaled proportionally according to each image's original resolution.
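To make the coordinate scaling concrete, the sketch below maps a voxel-index landmark from an original grid onto the 96-voxel grid and reports the effective voxel spacing after resampling. The function names and the simple size-ratio convention are assumptions for illustration; the summary does not state which coordinate convention (for example, corner-aligned versus center-aligned sampling) the authors used.

```python
def scale_landmark(voxel_xyz, orig_shape, target_shape=(96, 96, 96)):
    """Map a voxel-index landmark from the original grid to the resampled grid.

    Assumes a plain size-ratio mapping (target / original) on each axis.
    """
    return tuple(c * t / o for c, o, t in zip(voxel_xyz, orig_shape, target_shape))


def resampled_spacing(orig_spacing_mm, orig_shape, target_shape=(96, 96, 96)):
    """Effective voxel spacing (mm) per axis after resampling to target_shape."""
    return tuple(orig_spacing_mm * o / t for o, t in zip(orig_shape, target_shape))


# Minimum-size volumes from the two resolution categories described above.
low_spacing = resampled_spacing(0.4, (350, 350, 350))
high_spacing = resampled_spacing(0.25, (650, 650, 650))

# A landmark at the center of a 350-voxel cube lands at the center of the 96-voxel cube.
center = scale_landmark((175, 175, 175), (350, 350, 350))

print(low_spacing, high_spacing, center)
```

Under these assumptions, a minimum-size low-resolution scan ends up at roughly 1.46 mm per voxel and a minimum-size high-resolution scan at roughly 1.69 mm per voxel; larger volumes would be coarser still after resampling.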

The model’s hyperparameters included a batch size of 6, a Gaussian kernel size of 21 for the heat maps, a sigma value of 2, and a total of 200 training epochs. The AdamW optimizer was used with default settings. Training was conducted on a server with an Intel Xeon Gold 6325 CPU (2.90 GHz) and an NVIDIA A40 GPU with 48 GB of memory, using cuDNN version 8.9.0.2 for GPU acceleration, and the model was implemented in PyTorch (version 2.2.2). Each landmark prediction required approximately 4.2 seconds on the GPU, and test scans required 8.8 GB of cache memory and 2.1 GB of GPU memory. Each agent averaged 90 moves to accurately reach the landmark positions.
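As an illustration of what a kernel size of 21 and a sigma of 2 could mean for the heat-map targets, the sketch below builds a one-dimensional Gaussian profile and evaluates an isotropic three-dimensional heat-map value around a landmark. This is not the authors' code: the peak normalization to 1, the interpretation of the kernel size as a 21-voxel support window per axis, and the function names are all assumptions.

```python
import math


def gaussian_kernel_1d(size=21, sigma=2.0):
    """Unnormalized 1D Gaussian profile with its peak (value 1) at the center."""
    half = size // 2
    return [math.exp(-((i - half) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]


def heatmap_value(voxel, landmark, size=21, sigma=2.0):
    """Heat-map intensity at `voxel` for a landmark at `landmark` (voxel units).

    Values outside the size x size x size window around the landmark are zero.
    """
    half = size // 2
    offsets = [v - l for v, l in zip(voxel, landmark)]
    if any(abs(d) > half for d in offsets):
        return 0.0  # outside the kernel's support
    squared_distance = sum(d * d for d in offsets)
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


kernel = gaussian_kernel_1d()
at_landmark = heatmap_value((48, 48, 48), (48, 48, 48))
two_voxels_away = heatmap_value((50, 48, 48), (48, 48, 48))
print(len(kernel), at_landmark, round(two_voxels_away, 4))
```

With sigma equal to 2 voxels, the target falls to about 0.61 of its peak two voxels from the landmark, so the window of 21 voxels covers roughly five standard deviations on each side.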

Results

The results indicate a high level of agreement between manual and automated landmark detection, with interobserver intraclass correlation coefficients (ICCs) exceeding 0.9 for each landmark. The analysis covered 498 image datasets, for which the Euclidean distances between manually identified and automatically detected landmark coordinates were computed. As summarized in Table 4, the overall mean absolute error (MAE) along the x, y, and z axes was 0.71 mm, 0.67 mm, and 0.85 mm, respectively, and the overall mean radial error (MRE) was 1.76 ± 1.13 mm. The successful detection rates (SDRs) were 60.16%, 91.05%, and 97.58% within error thresholds of 2 mm, 3 mm, and 4 mm, respectively.
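The three metrics can be sketched as follows for a list of predicted and reference coordinates in millimeters. This is a minimal illustration using standard definitions, not the authors' evaluation script: the function name, the use of a population standard deviation, and counting an error exactly equal to a threshold as a success are assumptions.

```python
import math


def landmark_metrics(pred, truth, thresholds=(2.0, 3.0, 4.0)):
    """Return per-axis MAE, mean radial error, its SD, and SDR (%) per threshold."""
    n = len(pred)
    # Per-axis mean absolute error (x, y, z).
    mae_axis = tuple(
        sum(abs(p[k] - t[k]) for p, t in zip(pred, truth)) / n for k in range(3)
    )
    # Radial error: Euclidean distance between prediction and reference.
    radial = [math.dist(p, t) for p, t in zip(pred, truth)]
    mre = sum(radial) / n
    sd = math.sqrt(sum((r - mre) ** 2 for r in radial) / n)
    # Successful detection rate: share of radial errors within each threshold.
    sdr = {th: 100.0 * sum(r <= th for r in radial) / n for th in thresholds}
    return mae_axis, mre, sd, sdr


# Toy example with two landmarks (values are made up for illustration).
truth = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
pred = [(0.0, 0.0, 1.5), (10.0, 10.0, 13.5)]
mae_axis, mre, sd, sdr = landmark_metrics(pred, truth)
print(mae_axis, mre, sd, sdr)
```

Note that the radial error is computed per landmark from all three axes at once, so the per-axis MAE values do not simply add up to the MRE. The overall MAE of 0.74 mm quoted in the Overview is consistent with averaging the three per-axis values, since (0.71 + 0.67 + 0.85) / 3 ≈ 0.74.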

Among the 43 landmarks assessed, 32 were classified as hard tissue and 11 as soft tissue. The average MRE was 1.73 mm for hard tissue landmarks and 1.84 mm for soft tissue landmarks, both of which fall below the 2 mm clinical accuracy threshold. Notably, 78.1% of hard tissue landmarks and 72.7% of soft tissue landmarks achieved an MRE of less than 2 mm. The landmark L1i had the lowest MRE (1.24 mm), whereas Pogs had the highest (2.86 mm), highlighting the variability in detection accuracy across landmarks.
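A quick arithmetic check shows how the group figures relate to the overall numbers. The sketch assumes that each group mean is an unweighted average over that group's landmarks, which the summary does not state explicitly; the variable names are illustrative.

```python
hard_n, soft_n = 32, 11
hard_mre_mm, soft_mre_mm = 1.73, 1.84

# Landmark-weighted combination of the two group means.
pooled_mre_mm = (hard_n * hard_mre_mm + soft_n * soft_mre_mm) / (hard_n + soft_n)

# Landmark counts implied by the reported percentages below 2 mm.
hard_under_2mm = round(0.781 * hard_n)
soft_under_2mm = round(0.727 * soft_n)

print(round(pooled_mre_mm, 2), hard_under_2mm, soft_under_2mm)
```

Under that assumption the pooled value rounds to the reported overall MRE of 1.76 mm, and the percentages correspond to 25 of 32 hard tissue landmarks and 8 of 11 soft tissue landmarks.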

Discussion

In this study, the authors developed an algorithm for the automatic identification of anatomical landmarks in cone-beam computed tomography (CBCT) images, demonstrating robust performance across diverse patient demographics and malocclusion types. A comprehensive literature review was conducted to inform the study design, leading to the inclusion of 498 eligible CBCT scans from a retrospective collection of 963 scans. The algorithm, based on a U-Net architecture enhanced with an Efficient Global Attention (EGA) module, achieved a mean radial error (MRE) of 1.76 mm and a successful detection rate (SDR) of 97.58% within a 4 mm error margin, indicating clinically acceptable accuracy. This performance surpasses previous methods, particularly in challenging cases involving anatomical irregularities.

The study highlights the potential of automated landmark detection to improve diagnostic accuracy in craniomaxillofacial assessments, particularly in underserved areas where access to expert clinicians may be limited. The algorithm’s efficiency, requiring only 4.2 seconds for landmark prediction, suggests it could significantly reduce the time and cost associated with manual landmark identification. Furthermore, the inclusion of a diverse dataset, encompassing various dentition phases and anatomical anomalies, enhances the algorithm’s robustness and generalizability. The findings advocate for future research to explore the integration of high-quality CT datasets to further refine algorithm performance, ultimately aiming for fully automated diagnostic capabilities in 3D craniomaxillofacial imaging.