Third-party evaluators perceive AI as more compassionate than expert humans

Journal: Communications Psychology, Volume: 3, Issue: 1
DOI: https://doi.org/10.1038/s44271-024-00182-6
PMID: https://pubmed.ncbi.nlm.nih.gov/39794410
Publication Date: 2025-01-10
Author(s): Dariya Ovsyannikova et al.
Primary Topic: Psychology of Moral and Emotional Judgment

Overview

This study investigates the comparative effectiveness of AI-generated empathetic responses versus those from human responders in various contexts. Across four preregistered experiments involving 556 participants, findings consistently indicated that AI responses were preferred and rated as more compassionate than those from selected non-expert and expert human responders. Notably, this preference persisted even when the identities of the responders were disclosed, suggesting that the perceived quality of AI’s empathetic communication remained superior regardless of the human comparison group.

The results highlight that third parties viewed AI as more responsive, effectively conveying understanding, validation, and care, which contributed to its higher compassion ratings. These insights underscore the potential of AI to fulfill the growing demand for empathetic communication in supportive contexts, particularly when human resources are limited. Overall, the study positions AI as a valuable tool in scenarios requiring empathetic interaction, demonstrating its capability to deliver consistent and effective compassionate communication.

Methods

In this study, the authors aimed to evaluate the effectiveness of human versus AI-generated empathic statements by comparing responses to empathy prompts across four experiments. A total of ten empathy prompts, comprising five positive and five negative experiences, were developed. Participants in studies 1-3 reviewed all ten prompts, while study 4 involved only six prompts. For each prompt, participants were presented with a pair of responses: one generated by a human and one by the AI model ChatGPT (gpt-4-0125-preview). Human responses were sourced from both university participants and trained crisis hotline responders, ensuring a selection of high-quality empathic statements based on criteria such as emotional salience and relatability.

The AI responses were generated by prompting ChatGPT with the empathy vignettes, producing five distinct responses for each prompt, which were then randomized for participant review. Participants rated the compassion of each response on a 5-point Likert scale and indicated their preference for the response they deemed more effective. Additionally, in study 4, participants assessed the perceived responsiveness of the responses based on understanding, validation, and caring. The study employed both blind and transparent conditions to control for bias in response evaluation. Participants’ trait empathy was also measured to explore potential moderating effects on their ratings. Ethical approval was obtained, and participants were compensated for their involvement in the study.
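The pairing procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual materials or code: the function name `build_trials`, the dictionary layout, and the choice to draw one of the five AI candidates at random per prompt are all assumptions made for the example.

```python
import random


def build_trials(prompts, human_responses, ai_candidates, seed=0):
    """Pair one human and one AI response per prompt, in random order.

    prompts:          list of prompt identifiers (hypothetical)
    human_responses:  dict mapping prompt -> one human-written response
    ai_candidates:    dict mapping prompt -> list of AI-written responses
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    trials = []
    for prompt in prompts:
        # Assumption: one AI candidate is sampled per prompt.
        ai_text = rng.choice(ai_candidates[prompt])
        pair = [("human", human_responses[prompt]), ("ai", ai_text)]
        rng.shuffle(pair)  # randomize left/right presentation order
        trials.append({"prompt": prompt, "responses": pair})
    rng.shuffle(trials)  # randomize the order of prompts as well
    return trials


# Placeholder texts only; these are not the study's stimuli.
example_trials = build_trials(
    prompts=["p1", "p2"],
    human_responses={"p1": "human reply 1", "p2": "human reply 2"},
    ai_candidates={"p1": ["ai 1a", "ai 1b"], "p2": ["ai 2a", "ai 2b"]},
)
```

In a blind condition the source labels would be hidden from raters; in a transparent condition they would be shown alongside each response.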

Results

The results of the study confirmed both preregistered hypotheses: participants rated AI-generated responses as more compassionate than those generated by humans, and they expressed a preference for the AI responses over the human ones. These findings were consistent across all four experiments conducted. The summary of the compassion ratings is illustrated in Figure 2, while Figure 3 presents the preferences for responses across the four studies.

Discussion

Across four experiments, the authors investigated the perceived compassion of AI-generated responses compared with those from human responders. The sampling strategy aimed for sufficient power to detect meaningful effects, resulting in a total of 400 participants across mixed designs and 54 participants in within-subject designs. Statistical analyses included dependent-samples t-tests and mixed models, which showed that AI responses were consistently rated as more compassionate than those from selected and expert human responders. The effects were significant across conditions, including when the author of each response was disclosed.
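As an illustration of the dependent-samples t-test mentioned above, the following sketch compares each rater's mean compassion rating for AI responses with the same rater's mean rating for human responses. The ratings are invented for the example and are not data from the study; the function name `paired_t` is likewise hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(ai_ratings, human_ratings):
    """Return (t statistic, degrees of freedom, Cohen's d_z) for paired data."""
    if len(ai_ratings) != len(human_ratings) or len(ai_ratings) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Work on within-rater differences (AI minus human).
    diffs = [a - h for a, h in zip(ai_ratings, human_ratings)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff  # standardized mean difference for paired designs
    return t_stat, n - 1, d_z


# Invented 5-point Likert means for six hypothetical raters.
ai = [4, 5, 4, 3, 5, 4]
human = [3, 4, 4, 2, 3, 3]
t_stat, df, d_z = paired_t(ai, human)
```

The t statistic would then be compared against a t distribution with `df` degrees of freedom to obtain a p-value. A mixed model, as also used in the study, would additionally account for variation across prompts and raters.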

The findings indicate that AI not only outperformed human responders in compassion ratings but also in specific dimensions of responsiveness, such as understanding, validation, and care. Notably, AI’s advantage was more pronounced when addressing negative prompts, suggesting that its capacity to express empathy may be particularly effective in contexts of distress. While the perception of AI’s empathic capabilities diminished when participants were aware of the AI’s authorship, it remained superior to human responses even in transparent conditions. This challenges traditional views on human expertise in empathetic communication and highlights the potential for AI to complement human efforts in providing support, especially in scenarios where human resources are limited. However, ethical considerations regarding the use of AI in empathetic contexts must be addressed to ensure responsible integration into support systems.

Limitations

Several factors may limit the generalizability and interpretation of the study’s findings. First, AI and human responses were evaluated by third-party assessors, which raises the question of whether the results would hold when the evaluators are themselves the direct recipients of empathy. Future research should explore how receiving an empathic response directly, rather than observing it as a third party, influences preferences for AI versus human empathy.

Additionally, the study did not account for the potential impact of participants’ familiarity with AI technology on their evaluations of responses in both blind and transparent conditions. Individual differences, such as familiarity and proficiency with AI, as well as various personality and social factors, are likely to shape attitudes towards AI. Furthermore, the current research primarily focused on brief interactions; thus, future investigations should consider longer-term engagements with empathic AI to determine how preferences and attitudes evolve over time and to evaluate the role of empathic AI in supporting the mental health of experienced users.