Evaluation of different artificial intelligence applications in responding to regenerative endodontic procedures

Journal: BMC Oral Health, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s12903-025-05424-5
PMID: https://pubmed.ncbi.nlm.nih.gov/39799304
Publication Date: 2025-01-11
Author(s): Ece Ekmekci et al.
Primary Topic: Dental Radiography and Imaging

Overview

The integration of artificial intelligence (AI) in healthcare is transforming professional workflows, particularly in enhancing the speed and accuracy of patient treatment. This study evaluates the accuracy of responses from three AI chatbots—Google Bard (Gemini), ChatGPT-4o, and ChatGPT-4 with a PDF plugin—regarding regenerative endodontic treatment (RET), based on 23 questions aligned with the American Association of Endodontists (AAE) 2022 guidelines. Over a 10-day period, two researchers posed these questions three times daily, resulting in a total of 1,380 responses, which were categorized as correct, incorrect, or insufficient.

Statistical analysis indicated significant differences in response accuracy among the three chatbots (p < 0.05). ChatGPT-4 with the PDF plugin achieved the highest correct response rate at 98.1%, while Gemini recorded the lowest at 48%. Although ChatGPT-4o demonstrated potential for assisting clinicians in RET, it was deemed insufficient as a standalone resource. The findings suggest that ChatGPT-4 with the PDF plugin could serve as a valuable tool for clinicians, but further research is needed to develop AI applications tailored specifically to endodontic practice and to improve their precision and utility in clinical settings.

Introduction

The introduction highlights the rapid advancements in artificial intelligence (AI), particularly in the realm of natural language processing (NLP) through Large Language Models (LLMs). These models, such as OpenAI’s ChatGPT and Google’s Gemini, are capable of performing complex tasks, including text generation, translation, and multimodal data processing. The introduction also notes the evolution of AI in healthcare, emphasizing its applications in enhancing diagnostic accuracy and treatment planning, particularly in endodontics, where AI aids in detecting periapical lesions and predicting treatment outcomes.

Despite the promising applications of AI, concerns regarding overreliance on these technologies persist, particularly in clinical settings. The introduction underscores the need for research into the reliability of AI applications in endodontics, especially as regenerative endodontic treatment (RET) becomes more prevalent. RET aims to restore vitality in immature teeth with pulp necrosis, offering biological healing advantages over traditional methods. The study aims to evaluate the accuracy of responses from various AI chatbots regarding RET procedures, positing that ChatGPT-4 with a PDF plugin will outperform other models in providing accurate information.

Methods

In this study, the researchers developed a set of 23 questions—comprising 9 dichotomous and 14 open-ended items—based on the “Clinical Considerations for a Regenerative Procedure” guideline from the American Association of Endodontists (AAE). The questions aimed to assist clinicians in performing regenerative endodontic treatment (RET) procedures. The design process involved ensuring scientific accuracy and clinical relevance by focusing on practical aspects such as procedural feasibility, biological foundations, complications, and patient selection. The questions underwent expert review by two endodontic specialists to enhance their applicability.

The questions were presented to three AI platforms (Gemini, ChatGPT-4o, and ChatGPT-4 with a PDF plugin) over a 10-day period from August 24 to September 2, 2024. Two researchers posed the questions multiple times daily using different accounts, and the responses were documented in an Excel spreadsheet. Each response was categorized as correct, incorrect, or insufficient based on the AAE guidelines. The researchers used Pearson Chi-Square tests to analyze the distribution of answers and to test for an association between AI platform and response category, thereby assessing how reliably each platform provides clinically relevant information for RET.
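As a rough illustration of the Pearson Chi-Square test named above, the following Python sketch computes the statistic for a platform-by-category table of counts. It is not the authors' analysis code: the function names `pearson_chi_square` and `chi2_sf_even_dof` are invented for this example, the counts in `toy_table` are hypothetical placeholders rather than the study's data, and the p-value uses a closed form that holds only when the degrees of freedom are even (as they are for a 3 x 3 table).

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of platform and category.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("empty row or column; drop it before testing")
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-square probability; closed form valid for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form needs a positive even dof")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * terms


# Hypothetical counts (NOT the study's data).
# Rows: three platforms. Columns: correct, incorrect, insufficient.
toy_table = [
    [55, 1, 4],
    [48, 3, 9],
    [30, 10, 20],
]

stat, dof = pearson_chi_square(toy_table)
p_value = chi2_sf_even_dof(stat, dof)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value:.4g}")
```

In practice a statistics package would normally be used for this test; the hand-rolled version is shown only to make the calculation explicit.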

Results

The study collected a total of 1,380 responses, with 60 responses per question, and conducted statistical analyses to assess the relationship between platform and answer accuracy. The results indicated significant differences (p < 0.05) in response accuracy across platforms, as summarized in Table 2. ChatGPT-4 with the PDF plugin achieved the highest correct response rate, 98.1%, while Gemini had the lowest, 48%. ChatGPT-4o provided 86.2% correct, 2.9% incorrect, and 10.9% insufficient responses. In contrast, Gemini provided 48% correct, 17.5% incorrect, and 34.5% insufficient responses. The superior performance of ChatGPT-4 with the PDF plugin is underscored by its lack of incorrect responses and its minimal rate of insufficient answers, as illustrated in Figure 2.
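The reported totals can be sanity-checked with simple arithmetic. The Python sketch below assumes that each of the two researchers asked every question once in each of three daily sessions over the 10 days; that reading is an assumption, chosen because it reproduces the stated 60 responses per question and 1,380 responses. It also checks that the reported category percentages sum to 100%. The 1.9% insufficient share for ChatGPT-4 with the PDF plugin is derived here from the statement that it gave no incorrect responses, and is not a figure quoted in the text.

```python
QUESTIONS = 23
RESEARCHERS = 2
SESSIONS_PER_DAY = 3  # assumed: one ask per researcher per session
DAYS = 10

responses_per_question = RESEARCHERS * SESSIONS_PER_DAY * DAYS
total_responses = QUESTIONS * responses_per_question

# (correct %, incorrect %, insufficient %) as reported in the summary.
# The last value in the first row is derived, not quoted.
reported_percentages = {
    "ChatGPT-4 with PDF plugin": (98.1, 0.0, 100.0 - 98.1),
    "ChatGPT (86.2% row)": (86.2, 2.9, 10.9),
    "Gemini": (48.0, 17.5, 34.5),
}

print("responses per question:", responses_per_question)
print("total responses:", total_responses)
for platform, shares in reported_percentages.items():
    print(platform, "sums to", round(sum(shares), 1), "%")
```

Under this reading, the 1,380 figure matches 23 questions times 60 responses, which suggests it may be a per-platform count rather than a pooled total; the summary does not state which.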

Discussion

The discussion highlights the transformative potential of artificial intelligence (AI) technologies in healthcare, particularly in enhancing the workflow of healthcare professionals and improving patient treatment outcomes. Despite the advancements in AI, the paper emphasizes the importance of human clinical reasoning, which is developed through extensive education and experience. The emergence of multimodal large language models (LLMs) is noted as a significant development, as these models can integrate various forms of information—such as text, visuals, and audio—reflecting the multifaceted nature of medical decision-making.

The study specifically evaluates the performance of multimodal LLMs, such as ChatGPT-4o and Gemini, in addressing questions related to regenerative endodontic treatment (RET). The results indicate that ChatGPT-4 with the PDF plugin achieved the highest accuracy rate, 98.1%, surpassing the accepted threshold for clinical decision-making, while Gemini demonstrated significantly lower accuracy (48%). These findings suggest that, although AI chatbots can provide valuable assistance in clinical settings, they are not yet a substitute for expert knowledge, particularly in specialized fields like endodontics. The authors call for further research to develop specialized AI models tailored for clinical applications, which could enhance the precision and reliability of AI-assisted decision-making in healthcare.