Me vs. the machine? Subjective evaluations of human- and AI-generated advice

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-86623-6
PMID: https://pubmed.ncbi.nlm.nih.gov/39893236
Publication Date: 2025-02-01
Author(s): Merrick Robinson Osborne et al.
Primary Topic: Ethics and Social Impacts of AI

Overview

The research investigates how artificial intelligence, specifically large language models (LLMs) such as ChatGPT, affects human decision-making and self-perception in the context of personal advice. Despite widespread skepticism toward AI, the study finds that ChatGPT produces higher-quality advice than the average human participant, even in a sensitive domain such as dating and relationships. However, a bias against AI-generated advice emerges when users know its source, indicating a preference for human-generated content.

In a series of five preregistered experiments involving 1,722 participants, the researchers also examine how interacting with AI tools influences users' self-evaluations. Generating advice independently before viewing ChatGPT's advice raises the quality ratings of the AI's suggestions. Conversely, viewing AI-generated advice before generating one's own lowers users' ratings of the authenticity of their own advice. Overall, the findings point to a dual effect of AI in personal-advice contexts: it can outperform human advice, but it also triggers social comparisons that may harm users' self-perception.

Introduction

The introduction discusses the rapid integration of artificial intelligence (AI), particularly large language models (LLMs) such as ChatGPT, into everyday life. These models are trained on large datasets of human-generated text and can respond on a wide range of topics, supporting users' decision-making. Despite these capabilities, people still show a marked preference for human-generated advice in personal matters. This preference reflects "algorithm aversion": the tendency to favor human judgment over algorithmic suggestions even when the latter may be more effective. The research aims to assess the quality and authenticity of personal advice generated by ChatGPT and to explore how knowing that advice comes from an AI shapes users' perceptions of it.

The paper builds on the literature on algorithm aversion and introduces the concept of "me-to-machine" social comparison, in which users evaluate their own advice against that of an AI. Because LLMs synthesize diverse human perspectives, much like a form of collective wisdom, this comparison can leave users feeling either validated or inadequate. The empirical investigation consists of five preregistered studies involving 1,195 participants. These studies examine preferences for human- versus AI-generated advice, biases against AI outputs, and the effect of interacting with AI on self-evaluations. The findings aim to deepen understanding of how people weigh AI-generated against human-generated personal advice.

Methods

All research procedures were approved by the Institutional Review Board at the University of California, Berkeley, and followed the relevant guidelines and regulations. Participants were fully informed about the nature of the study, gave informed consent, and were compensated for their participation.

Results

In Study 1, participants expressed a strong preference for human-generated over AI-generated advice across a range of topics. The average preference score was $M = 62.87$ ($SD = 28.01$), a mean difference of 12.87 from the reference value of 50, indicating significant AI aversion ($t(282) = 12.40$, $p < .001$, Cohen's $d = 0.74$). The only topic in which AI advice was preferred was "Technology and software," whereas "Dating and Relationships" showed the strongest preference for human advice. Because aversion to AI was strongest in this sensitive domain, the subsequent studies focused on dating and relationships.

In Study 2, participants rated advice from ChatGPT and from a human source without being told where each piece came from. ChatGPT's advice was rated as more effective, higher in quality, and more authentic than the human-generated advice. Supplemental Studies A and B replicated this pattern with other AI models. Study 3 provided further evidence of bias against AI advice by manipulating the advice's apparent source. Participants rated the same ChatGPT advice as less effective and less authentic when told it was AI-generated than when they believed a human had written it.

Studies 4 and 5 examined social comparison effects. When participants generated their own advice first, they rated the AI's advice similarly to their own, yet still judged their own advice as more authentic. The order in which participants wrote and evaluated advice significantly shaped these perceptions, underscoring the role of self-comparison in how advice is evaluated.
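
The following sketch checks the Study 1 statistics for internal consistency. It assumes that the preference scores were compared with 50 by a one-sample t-test. Under that assumption $n = df + 1 = 283$, and Cohen's d can be recovered from the test statistic as $d = t/\sqrt{n}$. The helper name `one_sample_cohens_d` is illustrative and does not come from the paper.

```python
import math


def one_sample_cohens_d(t_stat: float, df: int) -> float:
    """Recover Cohen's d from a one-sample t-test.

    Assumes n = df + 1 observations, so d = t / sqrt(n).
    """
    n = df + 1
    return t_stat / math.sqrt(n)


# Values reported for Study 1.
mean_score = 62.87
reference = 50.0  # implied comparison value: 62.87 - 12.87
t_stat, df = 12.40, 282

mean_difference = mean_score - reference
d = one_sample_cohens_d(t_stat, df)

print(f"mean difference = {mean_difference:.2f}")
print(f"Cohen's d = {d:.2f}")
```

The recovered value, about 0.737, rounds to the reported $d = 0.74$. The same relation holds for a paired t-test, where d is the standardized mean of the within-person differences.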

Discussion

The discussion highlights the nuanced dynamics of evaluating advice generated by large language models (LLMs) such as ChatGPT. In Study 5, some participants believed that ChatGPT's advice had been written by a human. These participants rated it more favorably than their own advice, in both perceived quality (Cohen's $d = 0.25$) and effectiveness (Cohen's $d = 0.47$). When participants knew the advice came from ChatGPT, however, they rated their own advice as more authentic. This effect (Cohen's $d = 0.32$) was smaller than in the earlier studies (Cohen's $d = 0.72$). This suggests that knowing the advice was written by an AI lowers its perceived value.
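
To put the discussion's effect sizes in perspective, the sketch below labels each reported Cohen's d using Cohen's conventional benchmarks: roughly 0.2 for small, 0.5 for medium and 0.8 for large effects, with values below 0.2 treated as negligible. These thresholds are a general rule of thumb, not a criterion used in the paper. The function name `label_effect_size` is hypothetical.

```python
def label_effect_size(d: float) -> str:
    """Label |d| with Cohen's rule-of-thumb thresholds (0.2 / 0.5 / 0.8).

    These cutoffs are a convention, not the paper's own criterion.
    """
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


# Effect sizes reported in the discussion.
reported = {
    "Study 5 quality (believed human vs. own)": 0.25,
    "Study 5 effectiveness (believed human vs. own)": 0.47,
    "Study 5 authenticity (own vs. known ChatGPT)": 0.32,
    "Earlier studies authenticity (own vs. ChatGPT)": 0.72,
}

labels = {name: label_effect_size(d) for name, d in reported.items()}
for name, d in reported.items():
    print(f"{name}: d = {d:.2f} ({labels[name]})")
```

Under these benchmarks, the Study 5 authenticity effect ($d = 0.32$) counts as small, while the earlier estimate ($d = 0.72$) falls in the medium range.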

Taken together, the findings reveal a complex relationship between AI-generated content and human self-evaluation. People may rate AI advice as higher in quality, yet still regard their own contributions as more authentic. The research highlights implications of integrating AI into personal and professional settings, particularly for how individuals judge their own abilities relative to AI-generated content. Limitations include the focus on ChatGPT and on dating advice specifically, so these dynamics should be examined across other LLMs and topics. Future research should investigate the mechanisms behind the observed effects. Of particular interest is the psychological impact that interacting with AI-generated advice has on self-perception and feelings of authenticity.
