A pilot study of measuring emotional response and perception of LLM-generated questionnaire and human-generated questionnaires

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-53255-1
PMID: https://pubmed.ncbi.nlm.nih.gov/38308014
Publication Date: 2024-02-02
Author(s): Zhao Zou et al.
Primary Topic: Digital Mental Health Interventions

Overview

This pilot study investigates how the AI-powered chatbot ChatGPT affects user experience, focusing on emotional responses. Using two questionnaires (one generated by humans, the other by ChatGPT) together with an Emotion Detecting Model, the research involved 14 participants aged 18 to 35. Analysis of Variance (ANOVA) indicated that ChatGPT usage significantly enhanced participants' happiness and reduced their sadness, although no notable gender differences were found. The study collected 8672 data points associated with ChatGPT and 8797 associated with human-generated content, highlighting the chatbot's effectiveness in generating extensive questionnaires.

While the findings suggest positive user experiences with ChatGPT, limitations such as the small sample size and narrow age range restrict the generalizability of the results. The study contributes to the growing literature on human perceptions of AI language models and indicates that ChatGPT can adapt its responses to improve user interactions. Future research should explore the emotional impacts of various chatbots across diverse age groups and contexts, further investigating the potential of AI in enhancing user engagement and support.

Methods

The study combined quantitative and qualitative methods, including controlled experiments and surveys, to gather comprehensive data. Statistical analyses were performed with software tools to support the reliability and validity of the results, with the significance level set at p < 0.05. The research design incorporated a representative sample to enhance generalizability. Data were collected in multiple phases, including pre-testing and post-testing, to assess the impact of the interventions. Methodological rigor was further strengthened through triangulation, in which findings were cross-verified against different data sources and analytical techniques to make the conclusions more robust.
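As a rough illustration of what a pre-test/post-test comparison against the p < 0.05 threshold can look like, the sketch below runs an exact sign-flip permutation test on paired scores. This is not the study's analysis (the summary names ANOVA); the permutation test is swapped in here only because it needs nothing beyond the Python standard library. The function name, the scores, and the four participants are all hypothetical.

```python
from itertools import product

ALPHA = 0.05  # significance threshold stated in the text


def sign_flip_p_value(pre, post):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis each difference is equally likely to be
    # positive or negative, so enumerate every sign assignment.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up pre-test and post-test scores for four hypothetical participants.
pre = [1, 2, 3, 4]
post = [2, 4, 3, 7]

p_value = sign_flip_p_value(pre, post)
significant = p_value < ALPHA
print(f"p = {p_value:.2f}, significant at {ALPHA}: {significant}")
```

With only four pairs the smallest attainable two-sided p-value is 2/16 = 0.125, so a sample this small can never reach the 0.05 threshold, which is one concrete reason small samples limit what a pilot can conclude.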

Discussion

The discussion compares participants' emotional responses to questionnaires formulated by humans with their responses to questionnaires generated by ChatGPT. The study, approved by the Western Sydney University Human Research Ethics Committee, used a dual-questionnaire design to assess emotional perceptions and employed an emotion detection model based on the YOLOv5 framework trained on the AffectNet dataset. Participants showed higher levels of happiness and lower levels of sadness when responding to the ChatGPT-generated questionnaire than when responding to the human-generated version, with mean happiness values of 0.068 and 0.047, respectively. Notably, female participants exhibited greater sensitivity to the ChatGPT version, with a mean happiness increase of 0.042 compared to their male counterparts.
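To make the reported means concrete, the sketch below shows one way per-frame emotion scores could be averaged by questionnaire and emotion. The record layout, the `mean_scores` helper, and every score in `frames` are assumptions made for illustration; they are not the study's data or its actual pipeline. Only the two reported means, 0.068 and 0.047, come from the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-frame detector outputs: (questionnaire, emotion, score).
# These values are invented for illustration and are not the study's data.
frames = [
    ("chatgpt", "happy", 0.10),
    ("chatgpt", "happy", 0.04),
    ("chatgpt", "sad", 0.02),
    ("human", "happy", 0.05),
    ("human", "happy", 0.03),
    ("human", "sad", 0.06),
]


def mean_scores(records):
    """Average the detector score for each (questionnaire, emotion) pair."""
    grouped = defaultdict(list)
    for questionnaire, emotion, score in records:
        grouped[(questionnaire, emotion)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


means = mean_scores(frames)
happy_gap = means[("chatgpt", "happy")] - means[("human", "happy")]
print(f"illustrative happiness gap: {happy_gap:.3f}")

# Gap implied by the means reported in the text (0.068 vs 0.047).
reported_gap = 0.068 - 0.047
print(f"reported happiness gap: {reported_gap:.3f}")
```

The reported means imply a difference of 0.021 between the two questionnaires, a small absolute gap.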

Despite these positive findings, a fully AI-generated questionnaire (Version 3) did not elicit the same favorable emotional responses: expressions of anger and surprise increased, possibly because of the nature of the questions posed. The analysis, which included a repeated measures ANOVA, showed no significant emotional differences between the two questionnaire types. This suggests that ChatGPT's involvement can enhance user satisfaction, but that using it independently may lead to less favorable outcomes. Overall, the study underscores the potential of integrating AI into questionnaire design to improve user experience, while acknowledging that limitations such as sample size and demographic constraints may affect the generalizability of the findings.
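For readers unfamiliar with repeated measures ANOVA, the following minimal sketch computes its F statistic when every participant is measured under every condition. It is a generic textbook formulation, not the authors' code; the function name `rm_anova_f` and the scores are hypothetical. Turning F into a p-value requires the F distribution, typically taken from a statistics library, and is left out here.

```python
from statistics import mean


def rm_anova_f(scores):
    """One-way repeated-measures ANOVA F statistic.

    scores[i][j] is the value for participant i under condition j.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)     # number of participants
    k = len(scores[0])  # number of conditions
    grand = mean(v for row in scores for v in row)
    cond_means = [mean(row[j] for row in scores) for j in range(k)]
    subj_means = [mean(row) for row in scores]

    # Partition total variability into condition, participant, and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Made-up happiness scores: one row per hypothetical participant,
# columns are (human-generated, ChatGPT-generated) questionnaire.
scores = [
    (0.01, 0.02),
    (0.02, 0.04),
    (0.03, 0.03),
    (0.04, 0.07),
]

f_stat, df_cond, df_error = rm_anova_f(scores)
print(f"F({df_cond}, {df_error}) = {f_stat:.2f}")
```

Removing the between-participant variability (`ss_subj`) from the error term is what distinguishes this design from an ordinary one-way ANOVA. With only two conditions, the resulting F equals the square of the paired t statistic.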