Innovative tooth segmentation using hierarchical features and bidirectional sequence modeling

Journal: Pattern Recognition, Volume: 175
DOI: https://doi.org/10.1016/j.patcog.2026.113045
Publication Date: 2026-01-03
Author(s): Xinxin Zhao et al.
Primary Topic: Dental Radiography and Imaging

Overview

The paper addresses tooth image segmentation in dental digitization and the limitations of traditional image encoders that use fixed-resolution feature maps. Such encoders often produce discontinuous segmentations and separate target regions poorly from the background, because they cannot effectively model environmental and global context. In addition, Transformer-based self-attention has quadratic complexity (O(n²)) in the number of tokens, which makes it inefficient for high-resolution dental images. To address these issues, the authors propose a three-stage encoder that uses hierarchical feature representation to capture scale-adaptive information and integrates low-level details with high-level semantics through cross-scale feature fusion.
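The idea of cross-scale feature fusion can be illustrated with a minimal sketch. This is not the authors' encoder: the three levels, the 2x2 average pooling, the nearest-neighbour upsampling, and the plain averaging used as the fusion rule are all assumptions chosen to keep the example small, and the helper names are hypothetical.

```python
def downsample(fmap):
    """2x2 average pooling on a 2-D list (height and width must be even)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def upsample(fmap, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    h, w = len(fmap), len(fmap[0])
    return [
        [fmap[i // factor][j // factor] for j in range(w * factor)]
        for i in range(h * factor)
    ]


def fuse(pyramid):
    """Bring every level to the finest resolution and average them (assumed rule)."""
    target = len(pyramid[0])
    levels = [upsample(level, target // len(level)) for level in pyramid]
    return [
        [sum(level[i][j] for level in levels) / len(levels) for j in range(target)]
        for i in range(target)
    ]


# A toy 4x4 single-channel "feature map" with values 0..15.
stage1 = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
stage2 = downsample(stage1)  # 2x2: coarser, more context per cell
stage3 = downsample(stage2)  # 1x1: global summary
fused = fuse([stage1, stage2, stage3])
```

Each cell of `fused` mixes its own fine-scale value with progressively coarser summaries of its neighbourhood, which is the intuition behind combining low-level detail with high-level semantics. A learned model would replace the fixed averaging with trainable layers.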

The proposed segmentation method leverages bidirectional sequence modeling and hierarchical feature representation, demonstrating superior performance compared to existing segmentation techniques in the dental domain. Experimental results indicate that the model’s ability to extract and fuse multi-scale features significantly enhances segmentation accuracy. Furthermore, the structured state space model within the Mamba block effectively captures image features while minimizing computational complexity through parallel processing. Despite these advancements, the method encounters difficulties in segmenting poorly lit regions and anatomically similar oral tissues. Nonetheless, the high-quality segmentation results from two dental image datasets underscore its potential for clinical applications. Future research directions include exploring multimodal semantic segmentation, improving performance under low-light conditions, and optimizing the model for lightweight deployment.
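As a rough illustration of why a state space model scales linearly with sequence length, the sketch below runs a scalar discrete recurrence, h_t = a·h_(t-1) + b·x_t with output y_t = c·h_t. The fixed scalar parameters `a`, `b`, and `c` and the function name are assumptions for illustration. Mamba itself makes its parameters input-dependent and computes the recurrence with a parallel scan rather than a Python loop.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Sequential scan of a scalar linear state space model.

    One state update per token, so the cost grows linearly with len(xs).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # state update
        ys.append(c * h)    # readout
    return ys


# An impulse at the first token decays geometrically through the state.
impulse_response = ssm_scan([1.0, 0.0, 0.0], a=0.5)
```

The state `h` carries a compressed summary of everything seen so far, so each output depends on the whole prefix without any pairwise token comparison.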

Introduction

The introduction frames tooth segmentation in dental images as a critical task for applications such as disease diagnosis, treatment tracking, and image analysis. Recent segmentation models, particularly the Segment Anything Model (SAM), have shown potential for better generalization. However, existing methods struggle to capture multi-scale and global contextual information, which leads to suboptimal segmentation quality in complex and noisy dental images. The paper also highlights the quadratic computational complexity of Transformer-based frameworks, which hampers efficient processing of high-resolution images.
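The effect of quadratic complexity at high resolution can be made concrete with some back-of-the-envelope arithmetic. The 16-pixel patch size and the helper names below are assumptions, not values taken from the paper. The counts are token-pair interactions for full self-attention versus one step per token for a linear-time scan.

```python
def num_tokens(height, width, patch=16):
    """Number of non-overlapping patch tokens (patch size is an assumed value)."""
    return (height // patch) * (width // patch)


def attention_pairs(n):
    """Full self-attention compares every token with every token: n * n."""
    return n * n


def scan_steps(n):
    """A linear-time sequence model does one update per token."""
    return n


for side in (256, 512, 1024):
    n = num_tokens(side, side)
    print(side, n, attention_pairs(n), scan_steps(n))
```

Doubling the image side quadruples the token count, so the attention cost grows 16-fold while the scan cost grows 4-fold.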

To overcome these challenges, the authors propose a novel segmentation algorithm that combines high-quality segmentation with improved efficiency. Drawing inspiration from the linear-complexity sequence modeling of Mamba and the boundary-aware design of HQ-SAM, the proposed framework introduces a hierarchical representation strategy and a bidirectional sequence block (BSB) to enhance multi-scale feature extraction and contextual understanding. This approach aims to achieve precise boundary delineation and effective segmentation of fine anatomical structures while maintaining low latency, even at high resolutions. The experimental results demonstrate that the proposed method outperforms existing techniques, effectively generating high-quality segmentation masks while addressing the computational challenges associated with high-resolution dental images.
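A bidirectional sequence block can be pictured as two scans over the same token sequence, one in each direction, whose outputs are then combined. The sketch below is a simplified stand-in and not the paper's BSB: the scalar decay recurrence, the elementwise sum used to merge the two directions, and the function names are assumptions.

```python
def scan(xs, decay):
    """Causal scan: each output only sees the current and earlier tokens."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_scan(xs, decay=0.5):
    """Run the scan left-to-right and right-to-left, then sum (assumed fusion)."""
    forward = scan(xs, decay)
    backward = scan(xs[::-1], decay)[::-1]
    return [f + b for f, b in zip(forward, backward)]


tokens = [0.0, 1.0, 0.0]            # e.g. image patches flattened row by row
forward_only = scan(tokens, 0.5)
both = bidirectional_scan(tokens, 0.5)
```

With the forward scan alone, the first position receives no information from the later token. After adding the backward pass, both neighbours of the middle token see it, which is why bidirectional modeling suits images, where context has no natural left-to-right order.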

Results

In this section, the authors evaluate their method through comparative analyses against existing approaches on two distinct datasets. The results indicate that the method outperforms the compared approaches in both accuracy and efficiency. Detailed statistical analyses and visualizations support these findings and indicate that the technique is robust across scenarios.
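This summary does not name the metrics used in the comparison. As an assumption, the sketch below shows two overlap measures that are common in segmentation work, the Dice coefficient and intersection over union (IoU), computed on flattened binary masks. The function names and toy masks are hypothetical.

```python
def dice(pred, target):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|) for binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * inter / total


def iou(pred, target):
    """Intersection over union: |A and B| / |A or B| for binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return 1.0 if union == 0 else inter / union


pred_mask = [1, 1, 0, 0]    # predicted tooth pixels (flattened)
true_mask = [1, 0, 1, 0]    # ground-truth tooth pixels (flattened)
dice_score = dice(pred_mask, true_mask)
iou_score = iou(pred_mask, true_mask)
```

Both scores range from 0 (no overlap) to 1 (perfect agreement). Returning 1.0 when both masks are empty is a convention chosen here, not a universal rule.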

Discussion

In the discussion section, the paper reviews recent advances in high-quality image segmentation, zero-shot image segmentation, linear attention models, and multi-scale feature extraction. High-quality segmentation techniques, particularly those based on convolutional neural networks (CNNs), have been widely adopted but struggle with long-range dependencies. Transformer-based architectures, such as the Mask Transformer and Segmenter, address this limitation by using self-attention to improve accuracy. Models like MobileFormer have also emerged for real-time segmentation on mobile devices, although they still face computational-complexity challenges in high-resolution tasks.

The section also highlights the significance of zero-shot image segmentation, which enables models to segment categories not seen during training. Techniques that leverage large-scale image-text pairs and multimodal learning, such as SAM, have shown promise in this area. Furthermore, linear attention models like Mamba and RWKV have been developed to improve computational efficiency and memory usage compared to traditional Transformers. Lastly, multi-scale feature extraction is emphasized as crucial for dense prediction tasks, with hybrid architectures capturing both fine-grained details and global context more effectively. The approach proposed in the paper integrates these advances to improve dental image segmentation and to address the specific challenges of this domain.