Customized GPT-4V(ision) for radiographic diagnosis: can large language model detect supernumerary teeth?

Journal: BMC Oral Health, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s12903-025-06163-3
PMID: https://pubmed.ncbi.nlm.nih.gov/40399904
Publication Date: 2025-05-21
Author(s): Enes Mustafa AŞAR et al.
Primary Topic: Oral and Maxillofacial Pathology

Overview

This study investigates how well large language models, specifically GPT-4 variants, detect supernumerary teeth on periapical radiographs. A customized model, CGPT-4V, was developed to assess whether domain-specific training could improve diagnostic accuracy over the standard models GPT-4V and GPT-4o. The evaluation used 180 radiographs, balanced between cases with and without supernumerary teeth. Each model was given a standardized prompt, and its responses were scored by dental experts. Statistical analyses, including chi-square tests and ROC analysis, were used to compare the models’ performance.
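
As a rough illustration of how expert-scored yes/no responses could be tallied before any statistical comparison, the sketch below counts true and false positives and negatives. The `tally` helper and the toy records are hypothetical and are not taken from the study.

```python
from collections import Counter


def tally(records):
    """Count confusion-matrix cells.

    records: iterable of (has_supernumerary, model_said_yes) booleans,
    where the first value is the expert reference label.
    """
    counts = Counter()
    for truth, predicted in records:
        if truth and predicted:
            counts["TP"] += 1
        elif truth and not predicted:
            counts["FN"] += 1
        elif not truth and predicted:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical toy records for illustration only, not data from the study.
records = [(True, True), (True, False), (False, False), (False, True), (True, True)]
counts = tally(records)
sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
specificity = counts["TN"] / (counts["TN"] + counts["FP"])
```

Running the same tally once per model would give the per-model counts that a chi-square test or ROC analysis starts from.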

The results indicated that CGPT-4V achieved the highest accuracy, correctly identifying supernumerary teeth in 91% of cases, significantly outperforming GPT-4o (77%) and GPT-4V (63%). Additionally, CGPT-4V exhibited a lower false positive rate (16%) compared to GPT-4V (42%). The study concluded that customized GPT models hold significant promise in dental radiology, emphasizing the need for multicenter validation, integration with clinical workflows, and evaluation of their economic impact to facilitate real-world application. Future research should address these areas to enhance the clinical utility of AI in dental diagnostics.
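
To make the chi-square comparison more concrete, here is a minimal sketch of a Pearson chi-square test on a 2x2 table, using only the Python standard library. The counts are assumptions: they take 90 positive radiographs per model (half of 180) and round the reported 91% and 63% detection rates to whole cases. The sketch also treats the two models’ results as independent samples, which is a simplification, and it is not necessarily the exact test the authors ran.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (df = 1, no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With one degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Assumed counts (detected, missed) out of 90 positive cases per model.
table = [
    [82, 8],   # CGPT-4V: about 91% detected
    [57, 33],  # GPT-4V: about 63% detected
]
statistic, p_value = chi_square_2x2(table)
```

Because every model reads the same images, a paired test such as McNemar’s would arguably suit the data better; the chi-square form is shown here because it is the test named in the text.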

Methods

The research was approved by the Selcuk University Faculty of Dentistry Non-Invasive Clinical Research Ethics Committee in Turkey (Decision No: 2024/68). The sample size for the study was calculated based on an effect size of 0.25, a significance level of 5% (α = 0.05), and a statistical power of 80% (1 – β = 0.80), resulting in a minimum required sample size of 157 radiographs. The analysis yielded a test power of approximately 0.8706.

The study aimed to evaluate the effectiveness of three Generative Pre-trained Transformer (GPT) models (GPT-4V, GPT-4o, and CGPT-4V) in detecting supernumerary teeth in dental radiographs. Using a GPT Plus subscription, the researchers built a customized GPT model for this specific detection task, with the goal of producing more accurate and consistent responses in the context of dental imaging.

Results

The three GPT models were evaluated according to the methodology described above. CGPT-4V correctly identified supernumerary teeth in 91% of cases, compared with 77% for GPT-4o and 63% for GPT-4V.

CGPT-4V also produced fewer false positives than GPT-4V (16% versus 42%), and its sensitivity and specificity were 0.91 and 0.84, respectively.

Discussion

In this study, the CGPT-4V model, specifically designed for detecting supernumerary teeth in dental radiographs, demonstrated significant improvements in diagnostic accuracy compared to general-purpose models like GPT-4V and GPT-4o. The models were evaluated on a dataset of 180 periapical radiographs, with expert annotations ensuring consistency in identifying supernumerary teeth (mesiodens). CGPT-4V correctly identified supernumerary teeth in 91% of cases, outperforming GPT-4o (77%) and GPT-4V (63%). The model’s high sensitivity (0.91) and specificity (0.84) further underscore its potential utility in clinical settings, particularly for practitioners with limited experience in radiographic interpretation.
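
The reported sensitivity and specificity can be turned into other common summary measures with a few lines of arithmetic. The sketch below assumes a prevalence of 0.5, based on the balanced design; the `derived_metrics` helper is hypothetical, and the derived values are illustrations rather than figures reported by the authors.

```python
def derived_metrics(sensitivity, specificity, prevalence):
    """Derive accuracy, PPV, NPV and a single-point AUC from sensitivity and specificity."""
    # Express each confusion-matrix cell as a fraction of all cases.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {
        "accuracy": tp + tn,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # A yes/no classifier gives one ROC operating point; joining it to
        # (0, 0) and (1, 1) yields an area of (sensitivity + specificity) / 2.
        "single_point_auc": (sensitivity + specificity) / 2,
    }


# Sensitivity and specificity as reported for CGPT-4V; prevalence 0.5 is an assumption.
metrics = derived_metrics(sensitivity=0.91, specificity=0.84, prevalence=0.5)
```

Under these assumptions the overall accuracy works out to about 0.875, so the 91% figure reads most naturally as the detection rate among positive cases, which is the sensitivity.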

The study highlights the importance of domain-specific fine-tuning in enhancing AI diagnostic capabilities. While CGPT-4V showed superior performance, no significant difference was found between it and GPT-4V, suggesting that the effectiveness of fine-tuning may depend on the underlying model’s architecture and training data quality. The findings advocate for the cautious integration of AI models into clinical practice, emphasizing their role in improving efficiency and supporting early diagnosis, particularly in pediatric dentistry where timely detection of anomalies can significantly influence treatment outcomes. Future research should focus on expanding training datasets and exploring the customization of AI models for broader applications in dental diagnostics.

Limitations

Several limitations affect the generalizability and performance of the CGPT-4V model. Training was limited to only 20 images, which raises concerns about how well the model would handle larger and more complex datasets. Such a small training set may also lead to overfitting. Additionally, the study’s exclusive focus on mesiodens restricts its relevance to the wider range of dental pathologies.

Future research should aim to collaborate with AI developers, such as OpenAI, to create more robust CGPT models that are trained on extensive radiographic and pathological datasets, particularly for significant conditions like tumors. Validation of these models in multicenter clinical datasets is essential. Furthermore, the integration of AI tools into routine dental practices could enhance the early detection of asymptomatic anomalies, which clinicians might overlook. Even basic GPT-based models, if developed in conjunction with clinical expertise, have the potential to significantly improve diagnostic outcomes and patient care.