How well can a large language model explain business processes as perceived by users?

Journal: Data & Knowledge Engineering, Volume: 157
DOI: https://doi.org/10.1016/j.datak.2025.102416
Publication Date: 2025-02-15
Author(s): Dirk Fahland et al.
Primary Topic: Explainable Artificial Intelligence (XAI)

Overview

This overview summarizes the findings and implications of a study examining how well large language models (LLMs) can explain business processes as perceived by users. Building on previous work, the authors formalized their core idea into a set of hypotheses and developed a library of services that automates the integration of additional knowledge into LLM prompts. They then developed a rigorous scale to evaluate the effectiveness of these services when applied to GPT-4.0 and analyzed user responses quantitatively.

The results indicate that enhancing LLMs with knowledge about causal execution dependencies and temporal sequencing can improve perceived fidelity in explanations. However, this enhancement is contingent on the user’s trust in the LLM and their curiosity about the problem at hand. The authors caution against overemphasizing fidelity, as it may reduce perceived interpretability. Future research directions include exploring the trade-off between fidelity and interpretability, examining the integration of causal knowledge with process knowledge, and investigating the effects of combining process knowledge with explainable AI (XAI) approaches. The study suggests that additional process-related perspectives could further enrich the explanations provided by LLMs.

Introduction

The introduction emphasizes the critical role of explainability in the adoption of, and trust in, AI-augmented business process management systems (ABPMSs). It posits that effective explanations enable users to comprehend and respond to the situations that arise during process execution, thereby driving continuous process improvement. The paper highlights the growing significance of automation and AI in business process management (BPM), citing a projected market expansion from $15.4 billion in 2022 to $65.8 billion by 2032. The authors propose leveraging large language models (LLMs) to automate the generation of explanations for business processes, while addressing the challenges posed by LLMs’ limited causal reasoning and their tendency to hallucinate.

The research aims to enhance the perceived quality of LLM-generated explanations by integrating diverse knowledge inputs, including causal dependencies among the activities of a business process. The approach is novel in applying causal components to prompt LLMs for explanation generation. The paper follows a structured methodology (a review of fundamental concepts, an experimental design, and an evaluation of the generated explanations) and contributes a tool that integrates LLMs to automate the articulation of explanations in BPM. The findings suggest that manipulating the type of knowledge supplied significantly influences users’ perceptions of fidelity and interpretability, moderated by factors such as trust and curiosity.
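The moderating role of trust and curiosity can be written as a regression with an interaction term. The equations below are a generic sketch, not necessarily the authors’ exact model: they assume a perceived-fidelity score F_i for participant i, a knowledge-condition indicator K_i (0 for the baseline prompt, 1 for the knowledge-enriched prompt), and a trust score T_i.

```latex
F_i = \beta_0 + \beta_1 K_i + \beta_2 T_i + \beta_3 \,(K_i \times T_i) + \varepsilon_i

\mathbb{E}[F_i \mid K_i = 1, T_i] - \mathbb{E}[F_i \mid K_i = 0, T_i] = \beta_1 + \beta_3 T_i
```

Because the effect of the knowledge manipulation equals beta_1 + beta_3 T_i, a non-zero beta_3 means the benefit of added knowledge depends on the participant’s trust; curiosity can enter as a second moderator in the same way.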

Methods

The study employed a 3×2 mixed experimental design to investigate the effect of knowledge type on perceived explanation quality across three distinct problem domains, with the problem domain varied within subjects and the knowledge input varied between two groups. Participants were randomly assigned to one of the two groups and completed an online questionnaire comprising a consent form, demographic questions, and three sets of Likert-type rating scales, one per problem domain. Each set contained 24 items measuring fidelity, interpretability, curiosity, and trust, accompanied by the description and explanations for that problem domain.
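To make the measurement concrete, here is a minimal Python sketch of how the Likert ratings from one questionnaire set might be aggregated into construct scores. It assumes a 7-point scale and an even split of the 24 items into six per construct, with hypothetical item IDs q1 to q24; the paper’s actual item wording, scale length, and scoring procedure may differ.

```python
from statistics import mean

# Hypothetical item-to-construct mapping: the source only says each 24-item set
# covers fidelity, interpretability, curiosity and trust; the even 6-item split
# and the q1..q24 IDs are assumptions for illustration.
ITEM_MAP = {
    "fidelity": [f"q{i}" for i in range(1, 7)],
    "interpretability": [f"q{i}" for i in range(7, 13)],
    "curiosity": [f"q{i}" for i in range(13, 19)],
    "trust": [f"q{i}" for i in range(19, 25)],
}


def construct_scores(responses, item_map=ITEM_MAP, scale=(1, 7), reverse=()):
    """Average one participant's ratings for one problem domain into construct scores.

    responses: dict mapping item ID -> integer Likert rating
    scale:     (lowest, highest) point of the rating scale (7-point is assumed)
    reverse:   IDs of reverse-worded items, flipped before averaging
    """
    low, high = scale
    scores = {}
    for construct, items in item_map.items():
        values = []
        for item in items:
            rating = responses[item]
            if not low <= rating <= high:
                raise ValueError(f"{item}: rating {rating} is outside {scale}")
            # Reverse-worded items are mirrored around the scale midpoint.
            values.append(low + high - rating if item in reverse else rating)
        scores[construct] = mean(values)
    return scores


if __name__ == "__main__":
    # In a 3x2 mixed design this is called once per participant and problem domain;
    # the resulting scores are then compared between the two knowledge groups.
    ratings = {f"q{i}": 4 for i in range(1, 25)}
    ratings["q1"] = 7
    print(construct_scores(ratings))
```

Averaging per construct keeps each score on the original scale, which makes group means easy to read; a factor-analytic scoring would be a reasonable alternative once the scale is validated.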

The experimental setup presented participants with a researcher-authored “ground truth” explanation alongside an explanation generated by a large language model (LLM), specifically GPT-4.0. Participants rated the LLM-generated explanation against the ground truth, which served as the evaluation baseline. The problem domains were pizza delivery lateness, parking fines, and loan approvals, and the knowledge inputs provided to the LLM differed between the groups. Showing the ground truth let participants judge the perceived correctness of the LLM output without having to rely on potentially incorrect information. The findings are intended to show how different knowledge components affect the perceived quality of LLM-generated explanations.

Results

The “Results” section reports the quantitative analysis of participants’ ratings, relating the type of knowledge supplied to the LLM, together with participants’ trust and curiosity, to the perceived fidelity and interpretability of the explanations, and interprets these outcomes against the research questions and hypotheses posed in the study.

Discussion

In this section, the authors discuss integrating Large Language Models (LLMs) with Business Process Management (BPM) to enhance explainability through a novel framework called Situation-Aware eXplainability (SAX). The study emphasizes prompt engineering, that is, crafting instructions that optimize LLM outputs for specific BPM tasks such as causal discovery (CD) and process discovery (PD). The authors argue that traditional process discovery methods, which focus primarily on time precedence, fail to capture the causal relationships needed for effective intervention recommendations. By applying the Linear Non-Gaussian Acyclic Model (LiNGAM) for causal discovery, they aim to uncover the execution dependencies that inform LLM-generated explanations, thereby improving the interpretability and fidelity of explanations of process outcomes.
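The contrast between time precedence and causal dependency can be illustrated with a deliberately small Python sketch. It does not reproduce the authors’ LiNGAM pipeline: it swaps in a skewness-based pairwise direction measure from the LiNGAM literature, which assumes a linear relation between two variables with independent, skewed (non-Gaussian) noise. The variables are synthetic toy data standing in for, say, an upstream activity’s duration and a downstream delay; causal discovery over many activities would instead use a dedicated LiNGAM implementation such as DirectLiNGAM.

```python
import random
import statistics


def _standardize(v):
    """Scale to zero mean and unit variance, then flip the sign if skewness is negative."""
    mu = statistics.fmean(v)
    sd = statistics.pstdev(v)
    z = [(a - mu) / sd for a in v]
    skewness = statistics.fmean(a ** 3 for a in z)
    return z if skewness >= 0 else [-a for a in z]


def pairwise_direction(x, y):
    """Skewness-based pairwise causal-direction measure (linear, non-Gaussian model).

    R > 0 suggests x -> y; R < 0 suggests y -> x; R near 0 means no usable evidence.
    """
    x, y = _standardize(x), _standardize(y)
    rho = statistics.fmean(a * b for a, b in zip(x, y))
    return rho * statistics.fmean(a * a * b - a * b * b for a, b in zip(x, y))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic toy data: 'upstream' drives 'downstream' linearly, with skewed
    # (exponential) noise; these are not data from the study.
    upstream = [rng.expovariate(1.0) for _ in range(20_000)]
    downstream = [0.8 * u + rng.expovariate(1.0) for u in upstream]
    print("R(upstream, downstream) =", pairwise_direction(upstream, downstream))  # positive
    print("R(downstream, upstream) =", pairwise_direction(downstream, upstream))  # negative
```

The measure relies on non-Gaussianity: with Gaussian variables the third moments vanish, R hovers around zero, and the direction is unidentifiable, which is precisely why LiNGAM assumes non-Gaussian noise. A time-precedence relation would record only that one activity came first, not which variable drives the other.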

The SAX framework addresses existing limitations in explainability by incorporating causal, process, and explainable AI (XAI) knowledge into LLM prompts. The aim is explanations that are both causally sound and contextually relevant, improving users’ understanding of business process conditions. The authors position their work at the intersection of LLMs and BPM, noting that while previous studies have explored LLM applications in BPM, none has focused specifically on the implications for explainability. The SAX4BPM library, developed as part of this research, generates these enriched explanations, using a knowledge graph to store and process the relevant data. Overall, the study formulates hypotheses about the perceived quality of LLM-generated explanations, positing that supplying the LLM with knowledge about the business process will increase the fidelity and interpretability of the explanations it provides.
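The idea of injecting several knowledge perspectives into one prompt can be sketched in Python under loud assumptions: `KnowledgeGraph`, `build_prompt`, the predicate names, and the prompt wording are all hypothetical and are not the SAX4BPM API, and the in-memory set of triples merely stands in for the knowledge graph. Toggling the flags mirrors how the experimental groups received different knowledge inputs.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical in-memory triple store standing in for the SAX knowledge graph."""
    triples: set = field(default_factory=set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def pairs(self, predicate):
        """Return (subject, object) pairs for one predicate, sorted for a stable prompt."""
        return sorted((s, o) for s, p, o in self.triples if p == predicate)


def build_prompt(kg, question, use_process=True, use_causal=True, use_xai=True):
    """Assemble an explanation prompt from the selected knowledge perspectives."""
    sections = ["You explain business process outcomes to end users. "
                "Use only the knowledge listed below."]
    # (enabled?, section title, predicate to query, sentence template) per perspective.
    perspectives = [
        (use_process, "Process knowledge (temporal ordering)", "followed_by", "{} is followed by {}"),
        (use_causal, "Causal knowledge (execution dependencies)", "influences", "{} causally influences {}"),
        (use_xai, "XAI knowledge (feature attributions)", "attribution", "{} has attribution {}"),
    ]
    for enabled, title, predicate, template in perspectives:
        facts = ["- " + template.format(s, o) for s, o in kg.pairs(predicate)]
        if enabled and facts:
            sections.append(title + ":\n" + "\n".join(facts))
    sections.append("Question: " + question)
    return "\n\n".join(sections)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add("Bake pizza", "followed_by", "Dispatch courier")
    kg.add("Courier availability", "influences", "Delivery time")
    kg.add("Order size", "attribution", 0.42)  # toy value, not from the study
    print(build_prompt(kg, "Why was order 17 delivered late?"))
```

Keeping each perspective as a separate, optional section makes the knowledge manipulation explicit and easy to ablate; a real system would also need to handle prompt length and conflicting facts.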

Limitations

In this section, the authors acknowledge several limitations inherent in their study. First, to preserve internal validity while developing a scale for the perceived quality of explanations, they controlled for potential covariates such as trust and curiosity, and they used a leading large language model (LLM) so that the scale would be sensitive to differences in explanation quality. Because the study measures perceived rather than “objective” quality, the latter still required careful treatment: the researchers reviewed the generated narratives for fidelity to the ground truth and presented the ground-truth explanations to participants.

The authors also recognize that, although their rigorous instrumentation and standardized procedures support reproducibility, the generalizability of their findings may be limited by the narrow range of LLMs evaluated. They suggest that employing a broader range of LLMs, including open-source models, and conducting comparative benchmarking could mitigate this limitation. They propose this as a promising avenue for future research, made easier by the availability of their instrumentation.