A feasibility study for the application of AI-generated conversations in pragmatic analysis

Journal: Journal of Pragmatics, Volume: 223
DOI: https://doi.org/10.1016/j.pragma.2024.01.003
Publication Date: 2024-02-21
Author(s): Xi Chen et al.
Primary Topic: Language, Discourse, Communication Strategies

Overview

This study investigates the role of AI-generated language in pragmatic analysis, a domain traditionally focused on human language use. With the rise of large language models like ChatGPT, the research examines the pragmatic qualities of AI-generated texts compared to human-written conversations. The analysis covers 148 ChatGPT-generated dialogues, 82 human-written dialogues, and 354 human evaluations, and it combines strategy coding with computational and statistical methods. The results indicate that ChatGPT performs comparably to human participants on most pragmalinguistic and sociopragmatic features, while displaying greater syntactic diversity and formality. Participants were generally unable to distinguish between AI-generated and human-written conversations.

In conclusion, AI-generated conversations can serve as a valuable data source for pragmatic analysis, potentially offering a less biased baseline compared to human data. However, questions remain regarding the ability of AI to convey speaker subjectivity and the mechanisms behind its pragmatic performance. The study highlights the need for further research into the metapragmatic abilities of AI compared to humans and suggests exploring different genres of AI language and multimodal interactions to enhance understanding. Additionally, the performance of ChatGPT in languages other than English warrants further investigation, emphasizing the importance of selecting language models trained on relevant target language data.

Introduction

The introduction of the research paper highlights the rapid advancements in large language models (LLMs) and chatbots, particularly emphasizing their ability to generate coherent and contextually relevant texts. While these developments have sparked interest in their applications across various domains, such as journalism and education, they also raise concerns regarding the quality and reliability of AI-generated content. Previous studies have compared AI-generated texts with human-produced materials, revealing some limitations in AI’s pragmatic inference capabilities. However, AI’s pragmalinguistic and sociopragmatic competence has not yet been examined systematically.

This study aims to fill this gap by analyzing 148 AI-generated and 82 human-written conversations across 74 speech act scenarios, focusing on five pragmalinguistic and six sociopragmatic features. Employing a combination of qualitative analysis, natural language processing techniques, and statistical tests, the research seeks to assess the feasibility of incorporating AI-generated language into pragmatic analysis. The findings are expected to have significant implications for both the field of pragmatics and language education, where AI’s role as a collaborative partner for language learners is increasingly recognized. The introduction sets the stage for a comprehensive investigation into the pragmatic qualities of AI-generated conversations, which is essential before fully integrating AI into educational practices.

Methods

In this study, the researchers collected a total of 82 conversations from human participants and 148 from ChatGPT, alongside 354 human evaluations of these interactions. The methodology involved the use of strategy coding and computational techniques, specifically employing the Natural Language Toolkit (NLTK), to analyze pragmalinguistic features, while statistical tests were utilized to assess sociopragmatic features. The experimental design comprised two main components: crafting effective prompts for conversation elicitation and developing a questionnaire to gather emic sociopragmatic evaluations. The prompts were derived from 212 scenarios identified in 36 academic papers published between 1984 and 2022, with 74 scenarios selected based on their relevance to participants’ life experiences and the diversity of speech acts represented.
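
As a rough illustration of the computational side of this analysis, the sketch below computes a type-token ratio per conversation and averages it per group. This is an assumption-laden example: the summary does not say which lexical measures the authors computed, the regex tokenizer is a standard-library stand-in for an NLTK tokenizer, and the function names and toy conversations are hypothetical.

```python
import re
from statistics import mean

# Simple word pattern; a stand-in for an NLTK tokenizer (assumption).
TOKEN_RE = re.compile(r"[A-Za-z']+")


def tokenize(text):
    """Return lowercase word tokens found in the text."""
    return TOKEN_RE.findall(text.lower())


def type_token_ratio(text):
    """Unique tokens divided by total tokens; 0.0 for empty text."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_ttr(conversations):
    """Average type-token ratio over a list of conversation strings."""
    return mean([type_token_ratio(c) for c in conversations])


# Hypothetical toy conversations, not data from the study.
human_written = ["Could you lend me your notes? Sure, no problem."]
ai_generated = ["Would you mind lending me your notes? Of course, happy to help."]

if __name__ == "__main__":
    print("human:", round(mean_ttr(human_written), 3))
    print("AI:   ", round(mean_ttr(ai_generated), 3))
```

Type-token ratio is sensitive to text length, so a comparison like this is only meaningful when the two groups of conversations are of similar length or the measure is length-normalized.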

The prompts were carefully rewritten by research assistants to minimize gender indicators and to frame the conversations in a way that facilitated engagement from both ChatGPT and human participants. The final prompt template included clear instructions for generating dialogues, specifying character relationships and emotional contexts to elicit naturalistic conversations. Additionally, a sociopragmatic questionnaire was designed to evaluate six features: context understanding, strategy appropriateness, politeness levels, (in)directness, formality, and adherence to social norms. This questionnaire employed a 5-point Likert scale for participants to assess the conversations, emphasizing the reciprocal nature of appropriateness and politeness in communication. The study aimed to ensure that the data collected would be generalizable across various contexts, while also asking participants to judge whether each conversation was generated by AI or written by a human.
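
To make the comparison of Likert ratings concrete, here is a minimal sketch of a Mann-Whitney U statistic, a rank-based test often used for ordinal ratings. The summary only says that statistical tests were used, so the choice of this particular test is an assumption, and the function names and ratings below are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs (x in a, y in b) with x > y, plus half of the ties.
    """
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical 5-point politeness ratings, not data from the study.
ratings_human = [4, 5, 3, 4]
ratings_ai = [5, 4, 4, 5]

if __name__ == "__main__":
    print(mann_whitney_u(ratings_human, ratings_ai))
```

A rank-based statistic is used here because Likert responses are ordinal. Turning U into a p-value needs either a normal approximation with tie correction or an exact permutation procedure, which a statistics library would normally provide.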

Results

Section 4.1 of the paper presents a comparative analysis of the five pragmalinguistic features, highlighting the differences and similarities observed in the data. Section 4.2 then turns to the six sociopragmatic features. Finally, Section 4.3 addresses whether human participants can distinguish AI-generated conversations from those written by humans, which speaks to the perceptual boundary between artificial and human communication.
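
One way to check whether raters can tell the two sources apart is to compare their identification accuracy against chance. The sketch below runs an exact two-sided binomial test against a 50% guessing rate. It is an illustration only: the summary does not state which test the authors applied, the function name is hypothetical, and the counts are invented.

```python
from math import comb


def binomial_two_sided_p(correct, total, chance=0.5):
    """Exact two-sided p-value for `correct` successes in `total` trials.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the chance rate.
    """
    def pmf(k):
        return comb(total, k) * chance ** k * (1 - chance) ** (total - k)

    observed = pmf(correct)
    tolerance = 1 + 1e-9  # absorbs floating-point noise when comparing
    p = sum(pmf(k) for k in range(total + 1) if pmf(k) <= observed * tolerance)
    return min(1.0, p)


# Hypothetical counts, not results from the study: 52 correct source
# identifications out of 100 judgments.
if __name__ == "__main__":
    print(binomial_two_sided_p(52, 100))
```

A large p-value in such a test is consistent with guessing, although it does not prove that raters cannot distinguish the sources. Judgments from the same rater are also not independent, which this simple sketch ignores.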

Discussion

In the discussion of pragmatic competence, the paper delineates two primary components: pragmalinguistic competence, which involves the linguistic resources for conveying communicative acts, and sociopragmatic competence, which pertains to the social perceptions influencing the interpretation of these acts. The authors highlight the interdependence of these competences. They note that the two are often assessed separately (pragmalinguistic competence through linguistic accuracy and discourse control, and sociopragmatic competence via human judgments of appropriateness and politeness), yet their relationship is complex and intertwined. This section emphasizes the need for a comprehensive approach that considers both competences in evaluating language use, particularly in the context of second language acquisition.

The study further explores the assessment of pragmatic competence through speech acts, referencing the Cross-Cultural Speech Act Realization Project (CCSARP) and its categorization of speech strategies. It discusses the methodological advancements in analyzing speech acts, including the use of computational methods to evaluate syntactic diversity and discourse relations. The findings indicate that while ChatGPT and human participants exhibit similar lexical choices, they differ significantly in syntactic diversity, with ChatGPT demonstrating more consistent performance. The analysis of conventional expressions reveals that both groups employ similar strategies in requests and refusals, indicating a shared understanding of pragmatic norms despite variations in individual expression. This research aims to bridge the gap between pragmalinguistic and sociopragmatic assessments, particularly in the context of AI-generated conversations, thereby contributing to the broader understanding of pragmatic competence in both human and artificial communicators.