Evaluating AI-based breastfeeding chatbots: quality, readability, and reliability analysis

Journal: PLoS ONE, Volume: 20, Issue: 3
DOI: https://doi.org/10.1371/journal.pone.0319782
PMID: https://pubmed.ncbi.nlm.nih.gov/40096051
Publication Date: 2025-03-17
Author(s): Emine Özdemir
Primary Topic: Health Literacy and Information Accessibility

Overview

The paper discusses the growing use of AI-based chatbots, such as ChatGPT, Gemini, and Copilot, by expectant and breastfeeding mothers who seek breastfeeding-related information through social media applications and websites. Although these chatbots are noted for giving reliable answers to breastfeeding queries, the complexity and poor readability of those answers can hinder accessibility, particularly for individuals with lower educational levels. The study finds that AI chatbots are more reliable than many traditional online sources, but that concerns about their accuracy, usability, and the transparency of their information sources persist.

The authors emphasize the need for further refinement of AI technologies to ensure they are accessible to a broader audience with varying health literacy levels. They advocate for continued research to optimize these platforms and establish guidelines for their use in both clinical and non-clinical settings. The study acknowledges several limitations, including the evaluation of only three AI platforms, potential non-representativeness of clinical scenarios, variability in responses due to differing language models, and the lack of consistent sourcing for chatbot responses at the time of the study.

Introduction

The introduction of this research paper highlights the critical role of breastfeeding as the optimal nutritional source for infants, emphasizing its profound short- and long-term health benefits for both mothers and children. Despite these advantages, global breastfeeding rates remain low, with only 44% of newborns initiating breastfeeding within the first hours of life. Factors contributing to this issue include socio-demographic characteristics, health challenges, and insufficient education on breastfeeding. The rise of digital technology has led to increased access to health-related information online, prompting expectant and breastfeeding mothers to seek guidance through social media and internet platforms. However, the quality and comprehensibility of this information can vary significantly, necessitating a focus on ensuring that mothers receive accurate and reliable content.

The paper further explores the potential of artificial intelligence (AI) to improve access to breastfeeding-related health information. While AI applications have been studied extensively in various healthcare domains, their specific utility in addressing the needs of expectant and breastfeeding mothers remains underexplored. The study aims to fill this gap by evaluating AI chatbots, such as ChatGPT, Gemini, and Copilot, in terms of their ability to simplify complex medical language and provide reliable information. It assesses the quality, reliability, readability, and relevance of the information these chatbots deliver, thereby contributing to a better understanding of their role in maternal and pediatric health care.

Methods

In this study, two researchers assessed the responses of three AI-based chatbots (ChatGPT version 3.5, Gemini, and Copilot) by analyzing their answers to 50 frequently asked breastfeeding questions, categorized into Baby-Centered Questions and Mother-Centered Questions. The evaluation employed five scoring criteria: the Ensuring Quality Information for Patients (EQIP) tool, the Simple Measure of Gobbledygook (SMOG) readability formula, the Similarity Index (SI), the modified DISCERN (mDISCERN) instrument, and the Global Quality Scale (GQS).

This methodological framework aimed to systematically quantify the quality and reliability of information provided by the chatbots, ensuring a comprehensive assessment of their performance in delivering breastfeeding-related guidance. The use of diverse evaluation metrics allows for a nuanced understanding of each chatbot’s strengths and weaknesses in addressing the needs of both mothers and infants.
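Of the five criteria, SMOG is the only one defined by a closed-form formula, so it can be illustrated directly. The following is a minimal sketch in Python, assuming McLaughlin's standard SMOG formula (grade = 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291). It is not the tooling used in the study, which the summary does not specify. The function names and the vowel-group syllable counter are illustrative assumptions, and the syllable heuristic is deliberately crude.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of consecutive-vowel groups.

    Assumption: this heuristic stands in for a real syllable dictionary
    and will miscount some words (for example, words with a silent 'e').
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def smog_from_counts(polysyllables: int, sentences: int) -> float:
    """Standard SMOG grade from a polysyllable count and a sentence count."""
    if sentences <= 0:
        raise ValueError("at least one sentence is required")
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def smog_grade(text: str) -> float:
    """Estimate the SMOG grade of a passage of English text."""
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    # SMOG counts words of three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return smog_from_counts(polysyllables, len(sentences))


if __name__ == "__main__":
    sample = "Breastfeeding provides immunological protection. Feed the baby often."
    print(round(smog_grade(sample), 2))
```

The formula was designed for samples of about 30 sentences, so grades computed on short chatbot answers should be read as rough estimates. The resulting grade approximates the years of education needed to understand the text.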

Results

The evaluation revealed statistically significant differences among the chatbots across all assessment criteria (p < 0.05). Copilot had the highest mean scores on the Ensuring Quality Information for Patients (EQIP) tool, the Simple Measure of Gobbledygook (SMOG), and the Similarity Index (SI): 48.9 ± 14.2, 18.5 ± 2.03, and 28.0 ± 20.8, respectively. Because a higher SMOG grade indicates text that is harder to read, Copilot's top SMOG score marks its responses as the most demanding, not the most readable. In contrast, Gemini scored highest on mDISCERN and the Global Quality Scale (GQS), at 4.15 ± 0.936 and 4.12 ± 0.940, respectively, although the differences between Copilot and Gemini on these two measures were not significant. All three chatbots demonstrated high reliability and quality, but their responses required at least a university-level education to read. ChatGPT was noted for its originality, with a score of 8.56 ± 17.6, whereas Copilot's responses showed a higher degree of similarity. In the "Mother-Centered Questions" category, significant differences were observed specifically in the EQIP, SMOG, and Similarity Index metrics (p < 0.001). Descriptive statistics and post hoc comparisons for all categories are detailed in Tables 3 and 4 of the study.

Discussion

The study aimed to evaluate the performance of three AI-based chatbots (ChatGPT, Gemini, and Copilot) in providing breastfeeding information, focusing on quality, reliability, readability, and similarity of responses. The researchers identified 50 frequently asked questions, categorized as “Baby-Centered” or “Mother-Centered”, and assessed the responses using five established evaluation criteria: the Ensuring Quality Information for Patients (EQIP) tool, the modified DISCERN (mDISCERN) instrument, the Simple Measure of Gobbledygook (SMOG), the Global Quality Scale (GQS), and the Similarity Index. Results indicated that all chatbots demonstrated high reliability and good quality, with Copilot achieving the highest scores across most metrics, while ChatGPT exhibited greater originality in its responses. Notably, Gemini and Copilot were more reliable because they referenced their data sources consistently.

Despite the overall high quality of information provided by the chatbots, the readability level was found to be above that of the average user, suggesting that the content may not be easily accessible to individuals with lower educational backgrounds. The findings underscore the need for further refinement of AI technologies to enhance the clarity and accessibility of medical information. Additionally, the study highlights the importance of establishing guidelines for the use of AI chatbots in healthcare, particularly concerning source transparency and the potential impact of misinformation on patient decision-making. Limitations of the study include the evaluation of only three chatbots and the variability in responses due to the lack of repeated questions across platforms.