Diagnosing oral and maxillofacial diseases using deep learning

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-52929-0
PMID: https://pubmed.ncbi.nlm.nih.gov/38291068
Publication Date: 2024-01-30
Author(s): Junegyu Kang et al.
Primary Topic: Dental Radiography and Imaging

Overview

The paper introduces DOLNet, a neural network designed to classify and localize odontogenic lesions in panoramic radiographs while addressing challenges such as positional bias and class imbalance. DOLNet uses a mutually influencing hierarchical attention mechanism that operates across image scales, so the model learns both a global representation of the jaw and the local differences between normal and abnormal tissue. Local attention within each patch is used to generate inter-patch global attention maps, which calibrate the patch-level representations and in turn improve the whole-image representation. In addition, a data augmentation technique called LesionMix synthesizes new abnormal cases by merging lesion crops with normal images, which mitigates class imbalance.
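The interaction between local and global attention described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the peak of each local map as a patch score, and the softmax pooling are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def image_representation(patch_features, local_attention_maps):
    """Pool patch features into one image-level vector.

    patch_features: one feature vector (list of floats) per patch.
    local_attention_maps: one flat local attention map per patch.
    """
    # Assumption: summarise each local attention map by its peak response.
    patch_scores = [max(att) for att in local_attention_maps]
    # Inter-patch (global) attention: normalise the scores across patches.
    global_weights = softmax(patch_scores)
    dim = len(patch_features[0])
    # Image-level representation: attention-weighted sum of patch features.
    pooled = [
        sum(w * feat[d] for w, feat in zip(global_weights, patch_features))
        for d in range(dim)
    ]
    return global_weights, pooled


# Three hypothetical patches; the second has the strongest local response.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local_maps = [[0.1, 0.2], [0.9, 0.3], [0.2, 0.1]]
weights, pooled = image_representation(features, local_maps)
```

In this toy setup the patch with the strongest local response receives the largest global weight, so its features dominate the pooled image-level vector.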

The experimental results show that DOLNet outperforms existing methods by a wide margin, with gains of up to 42.4% in recall and 44.2% in F1 score, and that it surpasses the classification performance of experienced human clinicians by 10.7% in recall and 10.8% in F1 score. Lesion localization remains robust to variations in lesion size and position. An ablation study confirms that both hierarchical attention and LesionMix contribute substantially to the model’s effectiveness, particularly on challenging cases. The authors acknowledge that the model still has difficulty distinguishing similar lesion types, and they suggest that multimodal analysis (including CT images) and generative models for synthesizing training samples could improve future performance.

Methods

In this section, the authors describe how they evaluate DOLNet. Classification is scored with four metrics (precision, recall, accuracy, and F1 score), and the intersection over union (IoU) is used for segmentation tasks. DOLNet is benchmarked against three recent methods, each of which the authors implemented and trained on the same dataset \( D \) to keep the comparison fair. For the method cited as reference 19, the authors augmented the training set with an additional 100 normal samples to accommodate its requirement for pre-training on a large-scale dataset.
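For reference, the standard definitions of these metrics can be sketched as follows. The counts and pixel regions are made-up illustrative values, not data from the paper, and the one-vs-rest treatment of a single class is an assumption about how per-class scores would be obtained in a multi-class setting.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts
    for one class treated as "positive" (one-vs-rest)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def region_iou(predicted_pixels, true_pixels):
    """Intersection over union of two regions given as sets of (row, col)."""
    union = predicted_pixels | true_pixels
    if not union:
        return 0.0
    return len(predicted_pixels & true_pixels) / len(union)


# Hypothetical counts for one lesion class.
m = classification_metrics(tp=8, fp=2, fn=4, tn=6)

# Two hypothetical 4x4 regions that overlap in a 2x2 corner.
pred = {(r, c) for r in range(0, 4) for c in range(0, 4)}
true = {(r, c) for r in range(2, 6) for c in range(2, 6)}
iou = region_iou(pred, true)
```

Recall and F1 are the metrics the summary emphasizes; both depend on false negatives, which is where missed lesions show up.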

The study also compares DOLNet with human clinicians: three oral and maxillofacial surgeons and two general practitioners classified lesions from a randomly selected subset of \( D \), termed \( D_{\text{tiny}} \), which consists of 45 ameloblastomas (ABs), 59 odontogenic keratocysts (OKCs), 120 dentigerous cysts (DCs), and 120 normal samples. Each clinician performed the classification independently, using JPEG images from the same dataset used for model validation. Intra- and inter-rater reliability exceeded 95% before the final assessment.
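The class counts given for \( D_{\text{tiny}} \) add up to 344 images. The short calculation below shows each class's share and the accuracy of a trivial classifier that always predicts the largest class. Only the four counts come from the text; the variable names and the majority-class baseline are illustrative additions.

```python
# Class counts for D_tiny as stated in the text.
d_tiny = {"AB": 45, "OKC": 59, "DC": 120, "normal": 120}

total = sum(d_tiny.values())
shares = {label: count / total for label, count in d_tiny.items()}

# Accuracy of always predicting the most frequent class (illustrative baseline).
majority_baseline = max(d_tiny.values()) / total

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

The two tumor-like classes (ABs and OKCs) together make up under a third of the subset, which is one reason recall and F1 are more informative here than accuracy alone.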

Results

In this section, the authors report experiments that evaluate the diagnostic performance of DOLNet against previous methods and human clinicians. DOLNet shows superior diagnostic performance, and the authors analyze in detail how each component of the model contributes to it. The loss functions used to train the model, a classification loss \( L_{\text{cls}} \) and a localization loss \( L_{\text{loc}} \), are defined mathematically, which makes their roles in optimization explicit.
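This summary names the two loss terms but does not reproduce their definitions. Purely as an illustration, and not as the paper's formula, a common way to train jointly on a classification and a localization objective is a weighted sum, where \( L_{\text{total}} \) and the weight \( \lambda \) are hypothetical symbols introduced here:

\[
L_{\text{total}} = L_{\text{cls}} + \lambda \, L_{\text{loc}}, \qquad \lambda \ge 0 .
\]

If \( L_{\text{cls}} \) were a standard cross-entropy over \( C \) classes (again an assumption), it would take the form

\[
L_{\text{cls}} = - \sum_{c=1}^{C} y_c \log p_c ,
\]

where \( y_c \) is the one-hot ground-truth label and \( p_c \) is the predicted probability of class \( c \).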

The study also examines the consistency between DOLNet’s classification and localization outputs, and the effect of lesion size and location on diagnostic outcomes. These experiments indicate that DOLNet remains robust across variations in lesion size and location. Overall, the results support the effectiveness of DOLNet for clinical diagnosis and point to areas for further research.

Discussion

In this section, the authors review neural-network-based approaches to dental image diagnosis, focusing on the feature learning techniques relevant to DOLNet. They note that several imaging modalities are used to identify odontogenic tumors, including radiography, cone-beam computed tomography (CBCT), and magnetic resonance imaging (MRI). Although CBCT and MRI offer superior 3D diagnostic capabilities, dental radiography remains advantageous because of its lower radiation exposure and cost. Recent machine learning models, including GoogLeNet and YOLO variants, have been applied to lesion classification and localization, but they handle high-resolution images poorly. DOLNet addresses this with a patch-based approach that allows tailored patch sizes and uses an attention mechanism to learn relationships across patches.
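A patch-based pipeline starts by tiling the high-resolution image. The sketch below computes patch coordinates for a configurable patch size and stride. The function name, the overlap handling at the borders, and the example image dimensions are assumptions for illustration; the summary does not state the patch sizes DOLNet uses.

```python
def patch_boxes(height, width, patch_size, stride=None):
    """Return (top, left, bottom, right) boxes that tile an image.

    The last row and column of patches are shifted inward so that every
    patch is full size and the image border is still covered.
    """
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size must not exceed the image dimensions")
    stride = stride or patch_size

    def starts(length):
        positions = list(range(0, length - patch_size + 1, stride))
        if positions[-1] != length - patch_size:
            positions.append(length - patch_size)
        return positions

    return [
        (top, left, top + patch_size, left + patch_size)
        for top in starts(height)
        for left in starts(width)
    ]


# Hypothetical 1000 x 2000 radiograph, 512-pixel patches, 50% overlap.
boxes = patch_boxes(height=1000, width=2000, patch_size=512, stride=256)
```

Overlapping patches reduce the chance that a lesion is split across a patch boundary, at the cost of more patches per image.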

The authors then discuss the role of feature learning in medical imaging, emphasizing patch-based analysis, attention mechanisms, and data augmentation. Patch-based methods are important for high-resolution medical images, particularly dental radiographs, where lesions are often localized. Attention mechanisms help the model focus on relevant image regions and so improve feature extraction. The proposed LesionMix augmentation synthesizes additional training samples by integrating lesions from minority classes into normal images, which addresses class imbalance. The section concludes with a summary of DOLNet’s architecture, which comprises two stages: patch-level representation extraction, followed by global classification and localization, a design the authors credit for its potential to outperform existing methods in both classification accuracy and lesion localization.
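The core idea of LesionMix, pasting a lesion crop into a normal image to create a new abnormal sample, can be sketched as follows. This is a simplified illustration rather than the authors' procedure: the function name, the uniform random placement, and the hard-edged paste are assumptions. A realistic version would presumably need anatomically plausible placement and some blending, which this sketch omits.

```python
import copy
import random


def lesion_mix(normal_image, lesion_crop, lesion_label, rng=None):
    """Paste a lesion crop into a copy of a normal image.

    Images are 2D lists of grayscale values. Returns the mixed image, its
    new class label, and the (top, left, bottom, right) box of the paste.
    """
    rng = rng or random.Random()
    img_h, img_w = len(normal_image), len(normal_image[0])
    crop_h, crop_w = len(lesion_crop), len(lesion_crop[0])
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("lesion crop must fit inside the image")

    # Assumption: the paste position is sampled uniformly at random.
    top = rng.randint(0, img_h - crop_h)
    left = rng.randint(0, img_w - crop_w)

    mixed = copy.deepcopy(normal_image)  # leave the original untouched
    for r in range(crop_h):
        mixed[top + r][left:left + crop_w] = lesion_crop[r]
    return mixed, lesion_label, (top, left, top + crop_h, left + crop_w)


# Toy data: a blank 6x8 "normal" image and a 2x3 crop from a minority class.
normal = [[0] * 8 for _ in range(6)]
crop = [[9, 9, 9], [9, 9, 9]]
mixed, label, box = lesion_mix(normal, crop, "AB", rng=random.Random(0))
```

Because the function also returns the paste location, each synthetic sample carries a localization target as well as a class label, so it can supervise both tasks.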