AI-Augmented Brainwriting: Investigating the use of LLMs in group ideation

Journal: Proceedings of the CHI Conference on Human Factors in Computing Systems
DOI: https://doi.org/10.1145/3613904.3642414
Publication Date: 2024-05-11
Author(s): Orit Shaer et al.
Primary Topic: Team Dynamics and Performance

Overview

This research paper investigates the integration of large language models (LLMs) into the creative process, focusing on two critical stages: the divergence stage of idea generation and the convergence stage of idea evaluation. The authors developed a collaborative group-AI Brainwriting ideation framework that utilizes an LLM to enhance group ideation. They assessed the effectiveness of this framework by evaluating the idea generation process and the resulting solution space.

To further explore the role of LLMs in idea evaluation, the researchers designed an evaluation engine and compared its ratings with those given by three expert and six novice evaluators. The findings indicate that the incorporation of LLMs in Brainwriting not only improves the ideation process but also positively impacts the quality of outcomes. Additionally, the study provides evidence that LLMs can effectively support the evaluation of ideas. The paper concludes with a discussion on the implications of these findings for human-computer interaction (HCI) education and practice.

Introduction

The introduction discusses the implications of generative AI technologies, particularly large language models (LLMs), for creative processes such as group ideation. It highlights the limitations of traditional brainstorming, which can be hindered by peer judgment and production blocking. To address these issues, the paper adopts Brainwriting, a method in which participants generate ideas in parallel before sharing them, which often yields more ideas than face-to-face brainstorming. The authors propose a collaborative group-AI Brainwriting ideation framework that integrates LLMs to enhance both the divergence stage of idea generation and the convergence stage of idea evaluation.

The research focuses on two primary questions: (RQ1) whether LLMs improve the idea generation process during the divergence stage, and (RQ2) how LLMs can assist in evaluating ideas during the convergence stage. The framework was tested in an advanced undergraduate course on tangible interaction design, involving 16 students and employing both qualitative and quantitative evaluation methods. An LLM evaluation engine was developed to assess ideas based on relevance, innovation, and insightfulness, with its ratings compared to those of expert and novice evaluators. The findings suggest that integrating LLMs into Brainwriting can enhance the ideation process and its outcomes, providing valuable insights for educators and novice designers in the Human-Computer Interaction (HCI) field. The paper concludes by discussing the merits and limitations of this integration in collaborative ideation processes.
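To make the idea of a rubric-based evaluation engine concrete, the sketch below builds a rating prompt for the three criteria named above and parses a reply into scores. It is only an illustration: the prompt wording, the 1-5 scale, the function names `build_prompt` and `parse_scores`, and the example reply are all assumptions, not the authors' implementation, and no model is actually called.

```python
import re

# The three criteria named in the summary; everything else here is hypothetical.
CRITERIA = ("Relevance", "Innovation", "Insightfulness")


def build_prompt(problem, idea):
    """Compose a rubric prompt asking for one 1-5 score per criterion."""
    rubric = "\n".join(f"{name}: <score 1-5>" for name in CRITERIA)
    return (
        f"Problem statement: {problem}\n"
        f"Idea: {idea}\n"
        "Rate the idea on each criterion, answering exactly in this form:\n"
        f"{rubric}"
    )


def parse_scores(reply):
    """Extract one integer score per criterion; raise if any is missing or out of range."""
    scores = {}
    for name in CRITERIA:
        match = re.search(rf"{name}\s*:\s*([1-5])\b", reply, flags=re.IGNORECASE)
        if match is None:
            raise ValueError(f"no valid score found for {name}")
        scores[name] = int(match.group(1))
    return scores


# A made-up reply standing in for model output (assumption, not study data).
example_reply = "Relevance: 4\nInnovation: 3\nInsightfulness: 5"
scores = parse_scores(example_reply)
mean_score = sum(scores.values()) / len(scores)
```

Requiring a fixed answer format and rejecting anything that does not parse keeps malformed replies from silently entering a comparison with human raters.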

Methods

In this section, the authors detail their methodology for evaluating the impact of GPT-3 on the idea generation process during collaborative Brainwriting sessions. They collected data from novice designers and expert reviewers, focusing on the ideas generated, prompts used with GPT-3, and various ratings of the ideas. The evaluation involved thematic analysis of student reflections and the application of Natural Language Processing (NLP) techniques, including spaCy and Gensim for topic modeling, as well as Domain-based Latent Personal Analysis (LPA) to assess the divergence of ideas. The LPA method utilized Kullback-Leibler Divergence to create document signatures that highlight the unique terms contributing to the divergence between human-generated and GPT-3-generated ideas.
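The role of Kullback-Leibler divergence in building a term signature can be illustrated with a small sketch. It compares smoothed term distributions of two groups of ideas and lists the terms that contribute most to the divergence. This is a simplified stand-in, not the authors' LPA pipeline: the whitespace tokenization, the additive smoothing constant `eps`, the function names, and the toy ideas are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, eps=1e-6):
    """Smoothed relative term frequencies over a fixed vocabulary."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    total = sum(counts[w] for w in vocab) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}


def kl_signature(docs_p, docs_q, top_k=3):
    """Return KL(P || Q) and the top_k terms contributing most to it."""
    vocab = sorted({w for d in docs_p + docs_q for w in d.lower().split()})
    p = term_distribution(docs_p, vocab)
    q = term_distribution(docs_q, vocab)
    # Each term's share of the divergence: p(w) * log(p(w) / q(w)).
    contrib = {w: p[w] * math.log(p[w] / q[w]) for w in vocab}
    total = sum(contrib.values())
    top = sorted(contrib, key=contrib.get, reverse=True)[:top_k]
    return total, top


# Hypothetical toy ideas, not data from the study.
human_ideas = ["calm feeling presence", "shared sense belonging"]
gpt_ideas = ["wearable bracelet sensor", "tabletop device sensor"]

total, top_terms = kl_signature(human_ideas, gpt_ideas)
```

Terms that are frequent in the first group and rare in the second dominate the sum, which is why the top contributors can act as a signature of what sets one source apart from the other.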

The results indicate that the use of GPT-3 positively influenced the ideation process, with 50% of students acknowledging that it provided unique perspectives and expanded their creative output. However, some students noted limitations, such as redundancy in GPT-3’s suggestions and challenges in crafting effective prompts. Overall, while GPT-3 was seen as a valuable tool for enhancing creativity and refining project direction, it was also perceived as having constraints that required careful interaction to maximize its utility.

Results

For Research Question 1 (RQ1), 50% of students perceived GPT-3 as beneficial for providing unique perspectives and aiding idea generation. By the semester's end, half of the students reported that GPT-3 had significantly enhanced their projects by elaborating on concepts and addressing specific challenges, although 31% noted issues with redundancy and creativity in its suggestions. The final project ideas predominantly emerged from a combination of student-generated concepts and enhancements from GPT-3, with one team directly inspired by GPT-3's suggestions. Semantic clustering revealed that human-generated ideas tended to reference abstract concepts, while GPT-3's outputs were more concrete, highlighting differences in vocabulary usage between the two sources.

For Research Question 2 (RQ2), the evaluation of ideas during the convergence phase indicated that the GPT-4 evaluation engine gave consistently high ratings to the ideas ultimately selected by student teams, and its ratings showed robust internal consistency across the criteria of Relevance, Innovation, and Insightfulness. In contrast, expert and novice evaluators exhibited diverging opinions and lower consistency. Correlation analysis showed a moderate positive relationship among the rankings of ideas by experts, novices, and GPT-4, suggesting that GPT-4 could serve as a reliable tool for preliminary idea filtering. Notably, the ideas that teams ultimately chose were all rated highly by GPT-4, so filtering on its ratings would not have discarded the concepts the teams valued.
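A rank correlation of this kind can be computed as in the following sketch, which implements Spearman's rho as the Pearson correlation of average ranks. The summary does not say which coefficient the authors used, so Spearman is an assumption here, and the rating values are invented for illustration rather than taken from the study.

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest) upward, giving tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical mean ratings for five ideas; not data from the study.
expert = [4.0, 2.5, 3.0, 4.5, 1.5]
novice = [3.5, 3.0, 2.0, 4.0, 2.5]
llm = [4.0, 4.5, 3.5, 5.0, 3.0]

rho_expert_llm = spearman(expert, llm)
rho_novice_llm = spearman(novice, llm)
```

Working on ranks rather than raw scores means the comparison is unaffected if one rater uses systematically higher numbers than another, which matters when an LLM tends to rate generously.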

Discussion

The discussion highlights the significance of structured ideation approaches, particularly in collaborative settings, and the integration of large language models (LLMs) to enhance these processes. It reviews ideation techniques such as brainstorming and brainwriting, noting their strengths and limitations. While brainstorming can be hindered by factors like peer judgment and production blocking, brainwriting offers a parallel approach that often results in a higher quantity and quality of ideas. The paper emphasizes the potential of LLMs to augment small-group ideation by generating creative ideas and facilitating discussions, thereby addressing the barriers present in traditional brainstorming methods.

Furthermore, the paper explores the concept of human-AI co-creation, tracing its roots to early computer-aided design systems and discussing contemporary applications of generative AI in design and ideation. It underscores the need for a systematic evaluation framework for ideas, incorporating dimensions such as novelty, workability, relevance, and specificity. The authors propose a collaborative group-AI Brainwriting framework that consists of a divergence phase, where initial ideas are generated, followed by a convergence phase, where ideas are evaluated and refined with the assistance of an LLM. The study aims to assess the feasibility and effectiveness of this framework, particularly in enhancing ideation outcomes and evaluating the quality of ideas generated through human-LLM collaboration.

Limitations

The limitations section raises several considerations about integrating non-human agents, specifically large language models (LLMs), into collaborative ideation. While the study demonstrates the potential of LLMs to enhance the ideation process among novice designers, these AI agents are trained on traditional humanist logic and language, which may inadvertently perpetuate existing social biases. Although no specific biases were identified in the generated ideas, future research should investigate whether biases related to particular groups or concepts are present and develop methods to filter them from the ideation outputs.

Moreover, the study’s scope is limited to novice users of GPT-3 within a specific context of Human-Computer Interaction (HCI) education, focusing solely on a single ideation process and problem statement. This restricts the generalizability of the findings to expert users, different innovation domains, or other educational settings. Additionally, the research does not explore the long-term implications of AI integration in HCI education, concentrating instead on immediate outcomes. Despite these limitations, the study opens avenues for future exploration at the intersection of AI, HCI, and education, suggesting that LLMs can effectively support collaborative ideation and provide valuable feedback, provided that their design mitigates the risk of bias propagation.