Bridging Humans and LLMs: Investigating Human-AI Collaboration in Multi-agent Requirements Analysis for Organizational AI Adoption

Journal: e-Informatica Software Engineering Journal, Volume: 20, Issue: 1
DOI: https://doi.org/10.37190/e-inf260103
Publication Date: 2026-01-01
Author(s): Malik Abdul Sami et al.
Primary Topic: Ethics and Social Impacts of AI

Overview

This study investigates how effectively a multi-agent system based on Large Language Models (LLMs) supports requirements analysis for organizations planning to adopt Artificial Intelligence (AI). The research uses a mixed-methods approach: the design and development of a multi-agent system that generates and prioritizes requirements, followed by case studies with four companies. Data were collected through post-session questionnaires and follow-up interviews. Participants found the generated requirements relevant and aligned with organizational goals. Iterative feedback from human users significantly improved the completeness and feasibility of the requirements, underscoring the need for human oversight to ensure contextual accuracy and validation.

The findings suggest that the multi-agent system can effectively support early requirement structuring and planning discussions, although it does not replace human judgment and domain expertise. Participants highlighted two challenges: the limited transparency of the system’s prioritization logic and its integration with existing enterprise tools. The study concludes that the system can reduce manual effort and provide structured starting points for discussions, but that further refinements in usability, transparency, and scalability are needed for broader organizational adoption. Future research will focus on enhancing transparency mechanisms, exploring collaborative functionalities, and assessing the system’s integration with enterprise platforms to better understand its practical implications across diverse organizations.

Introduction

The introduction highlights the increasing integration of Artificial Intelligence (AI) across sectors, driven by the need for improved operational efficiency, innovative business models, and enhanced customer experiences. The emergence of Generative AI (GenAI) has accelerated this trend by offering advanced reasoning capabilities that can be integrated into organizational workflows. However, companies face significant challenges in adopting GenAI, including concerns about data privacy and the unpredictability of AI outputs. To address these issues, the study proposes using GenAI’s natural language processing capabilities to simulate stakeholder interactions and facilitate multi-agent dialogues for requirements analysis.

The paper emphasizes the importance of Requirements Engineering (RE) in the early planning phases of AI adoption, as it provides structured processes for identifying and prioritizing organizational needs. The authors propose a novel multi-agent system that supports strategic AI planning by simulating various stakeholder roles, such as Product Owner and Compliance Officer, to collaboratively generate and prioritize high-level requirements for AI adoption. The research is guided by three key questions focusing on the design of the multi-agent system, the influence of human involvement on requirement quality, and the challenges faced by companies during the adoption process. Through a mixed-methods approach involving a multiple-case study across four companies, the study aims to provide insights into human-AI collaboration and practical challenges in strategic AI planning.
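The role-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the canned proposals, the keyword-based scoring, and the averaging rule for priority are all assumptions made for illustration, and the stub functions stand in for what would be LLM calls in a real system. Only the two role names come from the paper summary.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Requirement:
    text: str
    proposed_by: str
    scores: dict[str, int] = field(default_factory=dict)  # role -> score in 1..5

    @property
    def priority(self) -> float:
        # Hypothetical rule: average the scores given by all roles.
        return mean(self.scores.values()) if self.scores else 0.0


# Stand-ins for LLM-backed agents (assumed data, not from the study).
PROPOSALS = {
    "Product Owner": ["Pilot an AI assistant for customer support tickets"],
    "Compliance Officer": ["Document data-processing risks before any AI pilot"],
}
KEYWORDS = {
    "Product Owner": ("customer", "pilot"),
    "Compliance Officer": ("risks", "data"),
}


def propose(role: str) -> list[Requirement]:
    """Return the requirements a simulated role puts forward."""
    return [Requirement(text, role) for text in PROPOSALS[role]]


def score(role: str, req: Requirement) -> int:
    """Score a requirement from one role's viewpoint by counting keyword hits."""
    hits = sum(word in req.text.lower() for word in KEYWORDS[role])
    return min(5, 1 + 2 * hits)


def run_session(roles: list[str]) -> list[Requirement]:
    """Collect proposals from every role, let every role score them, then rank."""
    backlog = [req for role in roles for req in propose(role)]
    for req in backlog:
        for role in roles:
            req.scores[role] = score(role, req)
    return sorted(backlog, key=lambda r: r.priority, reverse=True)


ranked = run_session(["Product Owner", "Compliance Officer"])
for req in ranked:
    print(f"{req.priority:.1f}  [{req.proposed_by}] {req.text}")
```

Averaging scores across roles is only one possible aggregation; a weighted or veto-based rule (for example, letting a compliance role block an item) would be equally plausible under the description given here.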

Methods

The research methodology has two parts: (A) the design and implementation of the proposed system, and (B) its evaluation through a multiple-case study conducted across four organizations. Part A details the system architecture, the implemented features, and the technical components used in development. Part B covers the case study design: the selection of cases, the data collection process, and the analytical procedures used to assess the system’s effectiveness.

Figure 1 in the paper depicts the methodology, tracing the system’s lifecycle from design and development through to evaluation in real-world organizational settings. This two-part structure allows both the system’s underlying framework and its practical application to be examined.

Results

The results section presents findings from nine post-session questionnaire responses and four follow-up semi-structured interviews. Participants used the multi-agent system to generate requirements within their organizational planning workflows. Notably, participant P8 from Company D generated 21 initial requirements, which were refined through feedback rounds into 16 approved user stories. The feedback process highlighted the importance of clarifying scope, removing duplicates, and ensuring compliance with regulatory frameworks such as the EU AI Act.
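To make the P8 figures concrete: going from 21 initial requirements to 16 approved user stories is a net reduction of five items, or a retention rate of 16/21, roughly 76%. The sketch below pairs that arithmetic with one simple way duplicate removal could work. The example stories are invented, and the normalization rule (ignore case and punctuation) is an assumption; the summary does not say how duplicates were detected or how many of the five dropped items were duplicates.

```python
import re


def normalize(story: str) -> str:
    """Lowercase the text and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", story.lower()).strip()


def remove_duplicates(stories: list[str]) -> list[str]:
    """Keep the first occurrence of each story, ignoring case and punctuation."""
    seen: set[str] = set()
    unique = []
    for story in stories:
        key = normalize(story)
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return unique


# Invented example stories, not taken from the study.
draft = [
    "As a support agent, I want AI-suggested replies.",
    "As a support agent I want AI suggested replies",
    "As a compliance officer, I want an audit trail of AI decisions.",
]
cleaned = remove_duplicates(draft)

# Counts reported for participant P8 in the summary above.
initial, approved = 21, 16
retention = approved / initial
print(f"{len(cleaned)} unique stories; P8 retention {retention:.0%}")
```

Exact-match normalization only catches near-verbatim repeats; semantically overlapping stories would still need the human review the study emphasizes.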

Across the four companies, the system supported requirement generation, refinement, and prioritization, while participants considered human oversight essential for contextual accuracy. Participants aligned AI-generated requirements with domain-specific terminology and internal objectives, and adjusted them for compliance and feasibility constraints. Human input focused primarily on contextual interpretation rather than mere error correction, and the feedback mechanisms enabled targeted revisions, typically within two rounds. Differences in participant experiences were attributed to organizational context, sector characteristics, and individual roles, which indicates that the effectiveness of human-AI collaboration in requirements engineering depends on factors such as an organization’s maturity in AI adoption and the specific needs of its sector.

Discussion

The discussion section of the research paper highlights the multifaceted challenges organizations face in adopting AI technologies, emphasizing that successful integration requires more than just technical implementation. Key factors such as strategic alignment, data governance, and interdepartmental readiness are critical, as noted by Smit et al. [20]. Russo [6] formalizes these dynamics through the Human-AI Collaboration and Adaptation Framework, identifying workflow compatibility as the primary driver for Generative AI (GenAI) adoption, surpassing traditional metrics like perceived usefulness. Empirical studies corroborate these findings, revealing barriers such as data privacy concerns, unclear value metrics, and insufficient employee guidance, which hinder effective AI integration across various sectors [5][24][25].

Furthermore, the paper discusses the potential of Large Language Models (LLMs) and multi-agent systems in enhancing requirements engineering (RE) processes. While LLMs can assist in generating and refining requirements, concerns about the accuracy and reliability of their outputs necessitate human oversight. The proposed multi-agent framework incorporates a human-in-the-loop (HITL) approach, allowing for iterative feedback and refinement of outputs to ensure alignment with organizational objectives. The study empirically evaluates this framework across four companies, demonstrating its practical applicability in facilitating strategic AI adoption planning through collaborative requirements analysis, thereby addressing the critical need for tools that support both automation and human judgment in organizational contexts.
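A human-in-the-loop refinement cycle of this kind can be sketched as a bounded loop that alternates review and revision. This is an illustrative sketch under assumptions, not the paper's design: the `refine_with_human` function, the reviewer and reviser callbacks, and the default cap of two rounds (chosen to echo the "typically within two rounds" observation) are all hypothetical, and the reviser is a plain-text stand-in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A reviewer returns None to approve, or a feedback string to request changes.
Reviewer = Callable[[str], Optional[str]]
# A reviser takes the current text and the feedback, and returns a new draft.
Reviser = Callable[[str, str], str]


@dataclass
class ReviewResult:
    text: str
    approved: bool
    rounds: int  # number of revision rounds actually used


def refine_with_human(draft: str, reviewer: Reviewer, revise: Reviser,
                      max_rounds: int = 2) -> ReviewResult:
    """Alternate human review and revision until approval or the round cap."""
    text = draft
    for rounds_used in range(max_rounds + 1):
        feedback = reviewer(text)
        if feedback is None:
            return ReviewResult(text, True, rounds_used)
        if rounds_used < max_rounds:
            text = revise(text, feedback)
    return ReviewResult(text, False, max_rounds)


def example_reviewer(text: str) -> Optional[str]:
    # Hypothetical human check: approve only if the regulation is addressed.
    if "EU AI Act" not in text:
        return "state how the EU AI Act applies"
    return None


def example_reviser(text: str, feedback: str) -> str:
    # Stand-in for an LLM revision step.
    return f"{text} (revised to {feedback})"


result = refine_with_human("Deploy a chatbot for HR queries",
                           example_reviewer, example_reviser)
print(result.approved, result.rounds)
```

Capping the rounds keeps an unapproved item from looping indefinitely; in that case the result is returned with `approved=False` so a person can decide what to do with it.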

Limitations

Several limitations affect the interpretation of the findings. Current large language models still suffer from hallucination, weak domain adaptation, and a lack of transparency in reasoning. Although human-in-the-loop validation was implemented to mitigate inconsistencies, the system still exhibited generation errors and redundancy, indicating that it cannot function autonomously. Agent coordination was also hampered by the absence of shared memory and adaptive communication, which limited collaborative reasoning and continuity during refinement cycles. Because only the GPT-4o model was used for evaluation, comparisons across different architectures were not possible, and model bias and fairness were not addressed. Finally, the small participant sample (nine questionnaires and four interviews across four organizations) limits the generalizability of the results.

Future research should aim to expand the participant base to enhance representativeness and assess generalizability across various industries. Incorporating retrieval-augmented memory or extended-context models could improve reasoning continuity and traceability during iterative interactions. Additionally, embedding explainability features such as reasoning logs, decision-tracing, or rationale visualization could bolster interpretability and trust in the system. A comparative evaluation involving multiple open and proprietary models is essential for assessing robustness and reproducibility. Longitudinal and domain-specific studies could further investigate system adaptation over time and under regulatory constraints. Finally, developing adaptive coordination mechanisms may facilitate better information sharing and collaborative reasoning among agents, thereby supporting scalability in organizational deployment.
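As one illustration of what a reasoning log or decision trace might look like, the sketch below records each agent's priority decision together with its rationale and lets a reviewer pull the full trace for a single requirement. The class names, fields, and identifiers such as `REQ-1` are hypothetical; the summary names these features only as directions for future work and does not specify a format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    requirement_id: str
    agent: str
    priority: int
    rationale: str


class DecisionLog:
    """Append-only log of prioritization decisions (hypothetical design)."""

    def __init__(self) -> None:
        self._records: list[DecisionRecord] = []

    def record(self, requirement_id: str, agent: str,
               priority: int, rationale: str) -> None:
        self._records.append(
            DecisionRecord(requirement_id, agent, priority, rationale))

    def trace(self, requirement_id: str) -> list[DecisionRecord]:
        """Return every decision made about one requirement, in order."""
        return [r for r in self._records if r.requirement_id == requirement_id]

    def to_jsonl(self) -> str:
        """Serialize the log as one JSON object per line."""
        return "\n".join(json.dumps(asdict(r)) for r in self._records)


log = DecisionLog()
log.record("REQ-1", "Compliance Officer", 1, "Processes personal data")
log.record("REQ-1", "Product Owner", 3, "Low short-term customer impact")
log.record("REQ-2", "Product Owner", 1, "Directly reduces ticket backlog")

for entry in log.trace("REQ-1"):
    print(f"{entry.agent}: priority {entry.priority} because {entry.rationale}")
```

Keeping the rationale next to each score is what would let a participant see why an item was ranked as it was, which is the transparency gap participants reported.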