Classification of pediatric dental diseases from panoramic radiographs using natural language transformer and deep learning models

Journal: Frontiers in Artificial Intelligence, Volume: 9
DOI: https://doi.org/10.3389/frai.2026.1754498
PMID: https://pubmed.ncbi.nlm.nih.gov/41930217
Publication Date: 2026-03-03
Author(s): Tuan D. Pham et al.
Primary Topic: Dental Radiography and Imaging

Overview

The paper presents a text-driven framework for classifying pediatric dental diseases from panoramic radiographs, exploring the potential of natural language processing in medical imaging. A transformer model generates structured textual descriptions from the radiographs, and the study evaluates how well these descriptions support binary disease detection with three deep learning architectures: a one-dimensional convolutional neural network (1D-CNN), a long short-term memory (LSTM) network, and a pretrained Bidirectional Encoder Representations from Transformers (BERT) model. The 1D-CNN achieved the highest accuracy at 84%, outperforming the BERT model (77%), the LSTM model (57%), and three pretrained convolutional neural networks that analyzed the images directly.
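To make the idea of a 1D-CNN over text concrete, the following is a minimal pure-Python sketch of a single convolutional filter sliding over a sequence of token vectors, followed by ReLU and global max pooling. It illustrates the general technique only; the paper's actual architecture, filter sizes, and embeddings are not given in this summary, and the function name and toy vectors are hypothetical.

```python
def conv1d_global_max(embedded, kernel, bias=0.0):
    """Apply one 1D convolutional filter over a token sequence.

    embedded: list of token vectors, shape (seq_len, dim)
    kernel:   list of weight vectors, shape (width, dim)
    Returns the maximum ReLU activation over all window positions.
    """
    width = len(kernel)
    if len(embedded) < width:
        raise ValueError("sequence is shorter than the kernel width")
    activations = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        # Dot product between the kernel and the current window of tokens.
        total = bias
        for k_vec, x_vec in zip(kernel, window):
            total += sum(w * x for w, x in zip(k_vec, x_vec))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # global max pooling


# Toy 2-dimensional "embeddings" for a 4-token description (made up).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # one filter spanning 2 tokens
feature = conv1d_global_max(tokens, kernel)
```

A full classifier would use many such filters and feed the pooled features into a dense output layer; the filter here responds most strongly to the two-token pattern it was shaped to match, wherever it occurs in the sequence.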

The findings suggest that text-based classification may serve as a viable alternative to traditional image-based methods, offering advantages such as reduced need for complex image preprocessing and enhanced interpretability. However, challenges remain in ensuring consistent generalizability across various disease types. The study advocates for future research to enhance the quality of radiograph-to-text generation, explore hybrid models that integrate both textual and visual data, and validate these approaches on larger, more diverse datasets. This methodology not only holds promise for dental disease classification but also has broader implications for AI-driven diagnostics in other medical fields, including radiology, dermatology, and pathology.

Introduction

The introduction of this research paper emphasizes the critical need for accurate and early diagnosis of pediatric dental diseases to facilitate effective treatment and prevent long-term complications. Traditional diagnostic methods, primarily reliant on manual interpretation of panoramic radiographs, are time-consuming and prone to variability among practitioners. To address these challenges, the paper highlights the potential of artificial intelligence (AI), particularly deep learning models, in enhancing diagnostic efficiency and accuracy in pediatric dentistry. Recent advancements have led to the development of AI frameworks capable of detecting pediatric dental conditions, such as the identification of permanent tooth germs and assessment of dental development stages, thereby underscoring AI’s transformative impact on early disease detection and clinical decision-making.

Despite the promising advancements, the paper notes significant challenges in implementing AI in pediatric dentistry, including issues related to data quality, dataset size, and model generalizability. The authors stress the importance of ongoing research to refine AI methodologies and address data scarcity, which can hinder model training and diagnostic accuracy. Furthermore, the paper introduces the innovative use of large language models (LLMs), such as ChatGPT, to generate textual descriptions of dental conditions from radiographs, proposing a novel approach that integrates natural language processing with deep learning for improved diagnostic accuracy and interpretability. This study aims to explore the feasibility of a text-based framework for dental disease diagnosis, potentially offering a scalable alternative to traditional image-based classification methods.

Methods

The study follows a two-stage pipeline. In the first stage, ChatGPT, based on OpenAI’s GPT-4 architecture, generates a concise textual description of each reduced-resolution panoramic radiograph. In the second stage, the descriptions are converted into numeric sequences and used to train three text-based classifiers (1D-CNN, LSTM, and BERT) for binary discrimination between caries and periapical infections. Three pretrained image-based networks (SqueezeNet, GoogLeNet, and AlexNet) are trained on the same reduced-resolution images as a baseline.

The dataset comprises 58 annotated images from a single center. Models are evaluated on held-out data, and performance is reported as sensitivity, specificity, accuracy, F1 score, and AUC.

Results

The study evaluated the effectiveness of ChatGPT, based on OpenAI’s GPT-4 architecture, in generating concise descriptions of potential bone abnormalities from reduced-resolution panoramic radiographs, which were downsampled by a factor of 25. The generated descriptions highlighted various dental conditions, including possible bone loss, jaw asymmetry, impacted teeth, and other abnormalities. The text was converted into numeric sequences for model training, using a non-stratified holdout partition (90% training, 10% validation) repeated over multiple iterations to reduce dependence on any single split.
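The text-to-sequence step and the non-stratified 90/10 holdout can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace tokenization, the padding and unknown-token ids, the fixed sequence length, and all function names are hypothetical choices, and the example descriptions are made up rather than taken from the paper.

```python
import random


def build_vocab(texts):
    """Assign an integer id to every distinct lowercase token.

    Id 0 is reserved for padding and id 1 for unknown tokens.
    """
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Convert a description into a fixed-length list of token ids."""
    ids = [vocab.get(token, 1) for token in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))  # right-pad with zeros


def holdout_split(n_samples, val_fraction=0.1, seed=0):
    """Shuffle indices and split them without regard to class labels."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = max(1, round(n_samples * val_fraction))
    return indices[n_val:], indices[:n_val]


# Made-up descriptions, for illustration only.
descriptions = [
    "possible bone loss near lower molars",
    "impacted teeth and jaw asymmetry",
]
vocab = build_vocab(descriptions)
sequence = encode("possible bone loss", vocab, max_len=5)
train_idx, val_idx = holdout_split(58, val_fraction=0.1, seed=0)
```

Because the split ignores labels, a 10% validation set drawn from 58 images holds only about six samples and may contain very few examples of one class, which is one reason to repeat the split with different seeds.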

Additionally, the performance of text-based classifiers (1D-CNN, LSTM, BERT) was compared with image-based classifiers (SqueezeNet, GoogLeNet, AlexNet) using the same reduced-resolution images. The training process incorporated data augmentation techniques to enhance model generalization. Performance metrics were derived from 5-fold cross-validation, and the results should be read as preliminary indications of model behavior and comparative trends rather than definitive diagnostic accuracy. The findings underscore the potential of AI models in dental diagnostics while highlighting the need for further validation on larger datasets to establish clinically generalizable performance.
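As an illustration of 5-fold cross-validation on a dataset of this size, the sketch below partitions sample indices into five folds and yields one train/validation pair per fold. The shuffling seed, the round-robin fold assignment, and the function name are assumptions; the summary does not say how the folds were actually constructed.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# With 58 samples, each validation fold holds 11 or 12 images.
fold_sizes = [len(val) for _, val in k_fold_indices(58, k=5)]
```

Every image appears in exactly one validation fold, so the reported metrics average over predictions for the whole dataset instead of depending on a single split.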

Discussion

In this study, the performance of various machine learning models for classifying dental diseases from panoramic radiographs was evaluated, focusing on two conditions: caries and periapical infections. The dataset, consisting of 58 annotated images, was analyzed using three distinct models: LSTM, BERT, and 1D-CNN. The LSTM model exhibited a sensitivity of 75.00% but a low specificity of 41.67%, indicating a bias towards identifying periapical infections while struggling with caries classification. Its overall performance was moderate, with an accuracy of 56.67% and an AUC of 0.72, suggesting limitations in distinguishing between the two classes due to the small dataset size and the model’s inherent characteristics.

In contrast, the BERT model achieved a sensitivity of 83.33% and a specificity of 66.67%, resulting in an overall accuracy of 76.67% and an F1 score of 0.82. While it performed well in detecting periapical infections, its ability to differentiate between the two conditions was still challenged by the ambiguities in textual descriptions. The 1D-CNN model, however, demonstrated superior performance with both sensitivity and specificity at 86.67%, achieving an accuracy of 84.00% and an AUC of 0.93. This model’s balanced performance and robustness suggest that convolutional approaches may be more effective than sequential or contextual models for extracting relevant features from the derived textual representations of dental conditions. Overall, the findings indicate that text-based classifiers outperformed image-based models in distinguishing between caries and periapical infections, highlighting the potential of leveraging textual data derived from radiographic images in dental diagnostics.
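The metrics quoted above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts in the example are made up and are not the paper's, and which condition is treated as the positive class is an assumption.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion counts."""
    sensitivity = safe_div(tp, tp + fn)  # true positive rate (recall)
    specificity = safe_div(tn, tn + fp)  # true negative rate
    accuracy = safe_div(tp + tn, tp + fn + tn + fp)
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for one validation run (not from the paper).
metrics = binary_metrics(tp=7, fn=1, tn=6, fp=2)
```

For a single confusion matrix, accuracy is a weighted average of sensitivity and specificity, so figures averaged over several folds or repeated splits need not satisfy that relationship exactly.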

Limitations

The study presents several limitations that may affect the robustness and generalizability of its findings. Firstly, the relatively small dataset, sourced from a single center, limits statistical power and increases the potential for overfitting, despite the application of 5-fold cross-validation. This dataset may not adequately represent the diversity of pediatric dental conditions across various demographics, and inconsistencies in image acquisition, annotation quality, and clinical criteria could introduce bias. The lack of external validation cohorts further restricts the assessment of the model’s performance across different clinical settings.

Additionally, while the AI approaches show promising results, they have not been validated in large-scale, prospective clinical studies, and the interpretability of the deep learning outputs requires further improvement to foster clinician trust and facilitate clinical adoption. The absence of longitudinal data limits the evaluation of disease progression and treatment outcomes. Furthermore, although a qualified dentist reviewed the generated textual descriptions for anatomical plausibility, this review did not establish diagnostic accuracy, emphasizing the exploratory nature of the findings. The study also notes semantic concerns, as some generated terms, such as “tumors” and “ankylosis,” fell outside the intended clinical scope, highlighting the risks associated with using large language models (LLMs) in this context. Future research involving larger, multi-center datasets is necessary to address these limitations and validate the clinical applicability of the proposed framework.
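One simple way to surface the semantic concern noted above is to screen generated descriptions for terms outside the intended scope before they reach a classifier. The sketch below is a hypothetical illustration, not a step described in the paper; the term list contains only the two examples named above, and the function name is an assumption.

```python
import re


def flag_out_of_scope(description, out_of_scope_terms):
    """Return the out-of-scope single-word terms found in a description."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return sorted(term for term in out_of_scope_terms if term in words)


# The two example terms come from the text above; the description is made up.
terms = {"tumors", "ankylosis"}
flags = flag_out_of_scope("Possible ankylosis of a lower molar; no tumors visible.", terms)
```

Exact word matching of this kind misses multi-word terms, singular or plural variants, and synonyms, so flagged descriptions would still need review by a clinician.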