Automatic segmentation and landmark detection of 3D CBCT images using semi supervised learning for assisting orthognathic surgery planning

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-93317-6
PMID: https://pubmed.ncbi.nlm.nih.gov/40087502
Publication Date: 2025-03-14
Author(s): Haomin Tang et al.
Primary Topic: Dental Radiography and Imaging

Overview

This study addresses abnormal jaw positioning, a condition that requires orthognathic surgery to improve occlusal relationships and facial aesthetics. It applies a semi-supervised learning approach to 3D cone-beam computed tomography (CBCT) images, focusing on automatic segmentation of the maxilla and mandible and on detection of anatomical landmarks. The semi-supervised algorithm achieved a Dice coefficient of 93.41% for maxillary segmentation and 96.89% for mandibular segmentation, alongside an average landmark detection error of 1.908 ± 1.166 mm. These results outperform those of a fully supervised algorithm given an equivalent amount of annotated data.

The study concludes that the proposed semi-supervised learning method can significantly aid in the preoperative planning of orthognathic surgery by providing accurate assessments of maxillary and mandibular positioning and asymmetry. This advancement has the potential to enhance surgical outcomes and improve patient care in clinical settings.

Methods

Experiments were run on an NVIDIA RTX 3090 graphics card under Windows 10, using Python 3.7.11 and the deep learning framework PyTorch 1.10.2 to train and test the models on CBCT image data. The VNet and Mean Teacher models were trained with the AdamW optimizer, with the learning rate and weight decay both set to 0.0001 and the beta parameters set to 0.9 and 0.999.
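To make the reported optimizer settings concrete, the sketch below performs a single AdamW update on one scalar parameter in plain Python, using the stated learning rate, weight decay, and beta values. The function name `adamw_step`, the epsilon of 1e-8, and the toy parameter and gradient are illustrative assumptions and do not come from the study. In PyTorch, the same settings would typically be passed as `torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-4)`.

```python
import math

# Hyperparameters reported in the study.
LR = 1e-4
WEIGHT_DECAY = 1e-4
BETA1, BETA2 = 0.9, 0.999
EPS = 1e-8  # assumption: a common default, not stated in the source


def adamw_step(param, grad, m, v, t):
    """One AdamW update for a scalar parameter at step t (t starts at 1)."""
    # Decoupled weight decay: shrink the parameter directly.
    param = param * (1.0 - LR * WEIGHT_DECAY)
    # Exponential moving averages of the gradient and squared gradient.
    m = BETA1 * m + (1.0 - BETA1) * grad
    v = BETA2 * v + (1.0 - BETA2) * grad * grad
    # Bias correction for the zero-initialised moments.
    m_hat = m / (1.0 - BETA1 ** t)
    v_hat = v / (1.0 - BETA2 ** t)
    param = param - LR * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


# Toy values (hypothetical): parameter 1.0 with gradient 0.5 at the first step.
new_param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

At the first step the bias-corrected ratio has magnitude close to 1, so the parameter moves by roughly one learning rate regardless of the gradient's scale, while the decay term contributes only a tiny additional shrinkage.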

All training inputs were standardized to dimensions of $132 \times 160 \times 160$. Convergence was assessed by monitoring the accuracy and loss curves of the training set, and each model was trained for a total of 100 epochs.

Results

The segmentation performance of the semi-supervised learning method was compared with that of a supervised learning method for the maxilla and mandible, using five metrics: Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), Sensitivity (SEN), Average Surface Distance (ASD), and Hausdorff Distance (HD). The semi-supervised method outperformed the supervised method, achieving a DSC of 93.41% for maxilla segmentation (1.62% improvement) and 96.89% for mandible segmentation (1.56% improvement). Segmentation quality was better for the mandible than for the maxilla, likely because the maxilla has a more complex structure. Additionally, a 3D color map illustrated the discrepancies between the automated and manual segmentations, highlighting areas for future improvement.
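As a rough illustration of how the five reported metrics are commonly defined, the sketch below computes them for two small sets of foreground voxel coordinates. All function names and the toy masks are hypothetical. The distance metrics here treat every foreground voxel as a surface point and measure in voxel units, and the symmetric ASD shown is only one of several definitions in use, so this should not be read as the study's exact evaluation code.

```python
import math


def overlap_metrics(pred, truth):
    """DSC, PPV and sensitivity for two sets of foreground voxel coordinates."""
    tp = len(pred & truth)   # voxels labelled foreground by both
    fp = len(pred - truth)   # predicted foreground that is truly background
    fn = len(truth - pred)   # foreground voxels the prediction missed
    dsc = 2 * tp / (2 * tp + fp + fn)
    ppv = tp / (tp + fp)
    sen = tp / (tp + fn)
    return dsc, ppv, sen


def euclidean(p, q):
    """Straight-line distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def directed_distances(a, b):
    """For each point in a, the distance to its nearest point in b."""
    return [min(euclidean(p, q) for q in b) for p in a]


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(max(directed_distances(a, b)), max(directed_distances(b, a)))


def average_surface_distance(a, b):
    """Symmetric mean of nearest-neighbour distances (one common definition)."""
    d = directed_distances(a, b) + directed_distances(b, a)
    return sum(d) / len(d)


# Toy example: a 2x2x1 reference block and a prediction shifted by one voxel.
truth = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
pred = {(0, 1, 0), (1, 1, 0), (0, 2, 0), (1, 2, 0)}
dsc, ppv, sen = overlap_metrics(pred, truth)
hd = hausdorff_distance(pred, truth)
asd = average_surface_distance(pred, truth)
```

The overlap metrics reward agreement in volume, whereas HD and ASD penalise boundary errors, which is why segmentation studies usually report both kinds.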

For landmark detection, the study compared the semi-supervised learning approach (using a Mean Teacher model) against the supervised VNet method, both with and without uncertainty estimation. The average test error across 18 landmarks was 2.11 mm for the proposed method, which was 0.6 mm lower than that of the Mean Teacher and 0.9 mm lower than that of VNet, despite a slower convergence rate. Visualization of the model’s output through 3D heatmaps demonstrated the effectiveness of the semi-supervised approach, with the predicted landmark coordinates closely aligning with the actual labels, as presented in the accompanying figures and box plots.
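A landmark error of this kind is usually the Euclidean distance in millimetres between each predicted point and its reference label, averaged over landmarks. The sketch below shows one plausible way to decode a landmark from a 3D heatmap by taking its hottest voxel and then to compute the mean and standard deviation of the errors. The function names, the 0.5 mm voxel spacing, and the toy heatmap and coordinates are assumptions made for illustration and are not values from the study.

```python
import math
import statistics


def decode_heatmap(heatmap, spacing):
    """Return the physical (mm) coordinate of the hottest voxel in a 3D heatmap.

    heatmap is a nested list indexed [z][y][x]; spacing is the voxel size in mm.
    """
    best_value, best_index = -math.inf, (0, 0, 0)
    for z, plane in enumerate(heatmap):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > best_value:
                    best_value, best_index = value, (z, y, x)
    return tuple(i * s for i, s in zip(best_index, spacing))


def landmark_errors(predicted, reference):
    """Euclidean distance in mm between matching predicted and reference points."""
    return [
        math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
        for pred, ref in zip(predicted, reference)
    ]


# Toy 2x2x2 heatmap whose peak sits at index (z=1, y=0, x=1).
heatmap = [[[0.0, 0.1], [0.0, 0.0]], [[0.2, 0.9], [0.1, 0.0]]]
peak_mm = decode_heatmap(heatmap, spacing=(0.5, 0.5, 0.5))

# Two hypothetical landmarks: the decoded peak and a fixed point.
predicted = [peak_mm, (10.0, 10.0, 10.0)]
reference = [(0.5, 0.0, 2.5), (10.0, 13.0, 14.0)]
errors = landmark_errors(predicted, reference)
mean_error = statistics.mean(errors)
sd_error = statistics.stdev(errors)
```

Taking the single hottest voxel limits precision to the voxel grid; sub-voxel refinements exist, but whether the study used one is not stated here.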

Discussion

The study used preoperative 3D CBCT scans from 192 orthognathic surgery patients. The dataset consisted primarily of unlabeled images, with a subset labeled by three experienced clinical experts. Labeling involved manual segmentation of anatomical structures, including the maxilla and mandible, and placement of 18 landmarks. The intraclass correlation coefficient (ICC) for the landmark placements indicated high consistency among the experts (0.95 to 1.00), with a mean error of 0.89 ± 0.34 mm across all annotations. Data preprocessing included normalization of intensity values and resampling to keep input sizes consistent for the model, alongside data augmentation to improve generalization.
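The preprocessing described above can be pictured with a small sketch that rescales intensities and resamples a volume to a fixed grid. The summary does not say which normalization or interpolation scheme was used, so min-max scaling and nearest-neighbour resampling are assumptions chosen for simplicity, as are the function names and the toy volume.

```python
def normalize(volume):
    """Min-max scale a nested [z][y][x] volume to the range [0, 1]."""
    flat = [v for plane in volume for row in plane for v in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0  # avoid division by zero on constant volumes
    return [[[(v - lo) / scale for v in row] for row in plane] for plane in volume]


def resample_nearest(volume, target_shape):
    """Nearest-neighbour resample of a [z][y][x] volume to target_shape."""
    src_shape = (len(volume), len(volume[0]), len(volume[0][0]))

    def source_index(i, axis):
        # Map the centre of target voxel i back onto the source grid.
        pos = (i + 0.5) * src_shape[axis] / target_shape[axis]
        return min(int(pos), src_shape[axis] - 1)

    return [
        [
            [
                volume[source_index(z, 0)][source_index(y, 1)][source_index(x, 2)]
                for x in range(target_shape[2])
            ]
            for y in range(target_shape[1])
        ]
        for z in range(target_shape[0])
    ]


# Toy 1x2x2 volume upsampled to 2x4x4; the study's inputs were 132x160x160.
volume = [[[0, 100], [200, 400]]]
resized = resample_nearest(normalize(volume), target_shape=(2, 4, 4))
```

A real pipeline would more likely use an array library with trilinear or spline interpolation for images and nearest-neighbour interpolation for label masks, so that label values are never blended.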

Both segmentation and landmark detection were handled within a semi-supervised learning framework built on the Mean Teacher model. This method used both labeled and unlabeled data, applying consistency regularization to improve predictions. The final loss function combined supervised and unsupervised losses, with a ramp-up weighting coefficient balancing their contributions during training. For landmark detection, a heatmap regression approach was adopted, using Adaptive Wing Loss to address the difficulty of locating landmarks accurately. The study also introduced an uncertainty-aware mechanism to make the teacher model's predictions more reliable, with the aim of improving the accuracy of automatic landmark detection in clinical applications. Evaluation metrics included the Dice similarity coefficient, positive predictive value, sensitivity, average surface distance, and Hausdorff distance.
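The three ingredients named above (a teacher updated from the student, a consistency term, and a ramp-up weight) can be sketched in a few lines. This is a generic Mean Teacher outline under stated assumptions, not the study's implementation: the EMA decay of 0.99, the 40-epoch sigmoid-shaped ramp-up, the maximum weight of 0.1, the mean-squared-error consistency term, and all function names are illustrative choices that the summary does not specify.

```python
import math


def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight towards the matching student weight."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def rampup_weight(epoch, rampup_epochs=40, max_weight=0.1):
    """Sigmoid-shaped ramp-up from near 0 to max_weight over rampup_epochs."""
    phase = 1.0 - min(max(epoch / rampup_epochs, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * phase * phase)


def consistency_loss(student_pred, teacher_pred):
    """Mean squared difference between student and teacher predictions."""
    diffs = [(s - t) ** 2 for s, t in zip(student_pred, teacher_pred)]
    return sum(diffs) / len(diffs)


def total_loss(supervised, student_pred, teacher_pred, epoch):
    """Supervised loss plus the ramped-up consistency term."""
    weight = rampup_weight(epoch)
    return supervised + weight * consistency_loss(student_pred, teacher_pred)


# Toy values (hypothetical): two weights and two-element predictions.
teacher = ema_update(teacher=[1.0, 0.0], student=[0.0, 1.0])
loss = total_loss(supervised=0.5, student_pred=[0.8, 0.2],
                  teacher_pred=[0.6, 0.4], epoch=40)
early_weight = rampup_weight(epoch=0)
```

The ramp-up keeps the consistency term small early in training, when the teacher's predictions on unlabeled scans are still unreliable, and lets it grow once both networks have stabilised.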

Limitations

The study has several limitations that may affect the generalizability and accuracy of its findings. First, the small sample, drawn exclusively from Guiyang Stomatological Hospital, limits the robustness of the results. Deep neural networks, particularly those with high parameter counts, typically require extensive training data to perform well. Data collection was constrained by financial and time limits, difficulties in sharing data among institutions, and participant reluctance.

Second, the accuracy of the automatically located landmarks did not fully meet clinical standards, potentially due to insufficient refinement of the VNet architecture. Despite these limitations, the proposed method is useful in practice because it lets clinicians quickly identify approximate landmark locations, thereby streamlining their workflow.