Automatically Generating BPMN 2.0 Process Models from Natural Language Process Descriptions: Challenges, Framework, Quality Assessment

Journal: Business & Information Systems Engineering, Volume: 68, Issue: 1
DOI: https://doi.org/10.1007/s12599-025-00983-x
Publication Date: 2026-01-12
Author(s): Luca Franziska Hörner et al.
Primary Topic: Business Process Modeling and Analysis

Overview

The research paper introduces BPMNGen, an LLM-based conversational framework designed to automate the generation of BPMN 2.0 process models from natural language descriptions. Despite advancements in natural language processing, the challenge of producing syntactically correct and semantically accurate models remains, particularly for users lacking expertise in process modeling. BPMNGen aims to address this by allowing users to create and refine models through an interactive interface. Two empirical studies were conducted to evaluate the generated models, focusing on semantic accuracy, comprehensibility, cognitive load, and user acceptance. Results indicated that while BPMNGen performs well for simpler processes, it struggles with more complex scenarios, where expert-generated models are preferred.

The findings reveal that BPMNGen-generated models impose a comparable cognitive load to expert models, suggesting that the complexity of the modeled process is a more significant factor than the model’s source. Although participants rated expert models higher in perceived usefulness, BPMNGen models showed potential for quick understanding in simpler contexts. Future research directions include enhancing BPMNGen’s capabilities to support advanced BPMN constructs, exploring multimodal inputs for user interaction, and investigating model-to-text translations to improve communication between model creators and stakeholders. Overall, BPMNGen represents a significant step toward making business process modeling more accessible, with ample opportunities for further development and application in various modeling paradigms.

Introduction

The introduction of the research paper emphasizes the significance of process modeling in organizational management, highlighting its role in documenting, visualizing, analyzing, and optimizing business processes to enhance efficiency and effectiveness (Alves et al. 2023). The authors focus on BPMN 2.0, a widely recognized standard for business process modeling that offers greater expressiveness compared to other languages, such as UML activity diagrams and Event-Driven Process Chains. Despite its advantages, the authors note that converting natural language descriptions into accurate BPMN 2.0 models poses challenges, particularly for users lacking expertise in process modeling (Zimoch et al. 2018).

To address these challenges, the paper introduces BPMNGen, an LLM-based conversational framework designed to facilitate the generation of BPMN 2.0 models from natural language descriptions. The framework allows users to iteratively refine their process descriptions and the corresponding models. The authors conducted two empirical studies to evaluate the quality of the generated models, focusing on semantic accuracy and comprehensibility. The findings indicate that BPMNGen can produce models with semantic accuracy comparable to those created by experts, and in some cases, the generated models are easier to understand. However, the authors acknowledge limitations, including the focus on simple scenarios and reliance on self-reported measures. Overall, the paper underscores the potential of LLM-based frameworks like BPMNGen in supporting complex process modeling tasks while emphasizing the importance of quality assessment for the reliability and acceptance of automatically generated models.

Methods

In this section, the authors describe the methodology employed in two studies, each utilizing five distinct process scenarios to evaluate process modeling techniques. For each scenario, two BPMN 2.0 process models were created: one generated automatically using BPMNGen, powered by GPT-3.5 Turbo, and another manually crafted by a human expert. This resulted in a total of ten process models per study. The automatic generation involved a zero-shot prompting approach, where complete natural-language descriptions were provided to BPMNGen. Visual adjustments were made solely to enhance the layout of the generated models, ensuring that semantic elements and model structures remained unchanged. The expert models were created in SAP Signavio, and to maintain visual consistency, all BPMNGen outputs were redrawn in the same tool.
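The zero-shot setup described above can be illustrated with a minimal sketch. It assumes that the generated model is exchanged as BPMN 2.0 XML, which the summary does not state. The prompt wording and the function names `build_zero_shot_prompt` and `is_bpmn_definitions` are hypothetical and are not taken from BPMNGen. The sketch makes no LLM call; it only shows how a complete description could be wrapped into a single prompt without examples, and how a returned document could be checked for basic well-formedness.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model schema published by the OMG.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def build_zero_shot_prompt(description: str) -> str:
    """Wrap a full process description in one instruction, with no examples.

    The wording is a hypothetical illustration, not BPMNGen's actual prompt.
    """
    return (
        "Generate a BPMN 2.0 process model in XML for the following "
        "process description. Return only the XML.\n\n"
        f"Process description:\n{description.strip()}"
    )


def is_bpmn_definitions(xml_text: str) -> bool:
    """Return True if the text parses as XML and its root is bpmn:definitions.

    This is only a well-formedness check, not a full schema validation.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag == f"{{{BPMN_NS}}}definitions"


# Hand-written stand-in for a generated model (not real BPMNGen output).
sample_model = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="making_coffee">
    <startEvent id="start"/>
    <task id="brew" name="Brew coffee"/>
    <endEvent id="end"/>
  </process>
</definitions>"""

prompt = build_zero_shot_prompt("The customer orders a coffee. The barista brews it.")
print(prompt)
print(is_bpmn_definitions(sample_model))
```

A check like this only catches malformed output; semantic accuracy, which the two studies evaluate, still requires comparison against the process description.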

The five scenarios were online shopping, baking pizza, borrowing a book, making coffee, and requesting vacation. On average, the BPMNGen-generated models contained 12 modeling elements, while the expert models had approximately 14. Model complexity was increased systematically through progressively more detailed process descriptions, with the simplest description corresponding to the first model and the most complex to the fifth. Metadata on the number of elements per type for both model sets is summarized in Table 7 of the paper.
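Element counts per type, of the kind summarized in Table 7, can be derived mechanically from a model file. The following is a minimal sketch, assuming the models are available as BPMN 2.0 XML. The function name `count_elements_by_type` and the sample model are hypothetical, and the sketch counts sequence flows as elements, which may differ from the counting rule used in the paper.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model schema published by the OMG.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def count_elements_by_type(xml_text: str) -> Counter:
    """Count the direct children of every process element, grouped by tag name."""
    root = ET.fromstring(xml_text)
    prefix = f"{{{BPMN_NS}}}"
    counts = Counter()
    for process in root.iter(f"{prefix}process"):
        for child in process:
            # Ignore elements from other namespaces (for example, extensions).
            if child.tag.startswith(prefix):
                counts[child.tag[len(prefix):]] += 1
    return counts


# Hand-written example model (not taken from the study material).
coffee_model = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="making_coffee">
    <startEvent id="start"/>
    <task id="grind" name="Grind beans"/>
    <task id="brew" name="Brew coffee"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="grind"/>
    <sequenceFlow id="f2" sourceRef="grind" targetRef="brew"/>
    <sequenceFlow id="f3" sourceRef="brew" targetRef="end"/>
  </process>
</definitions>"""

counts = count_elements_by_type(coffee_model)
for element_type, n in sorted(counts.items()):
    print(f"{element_type}: {n}")
print("total:", sum(counts.values()))
```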

Results

In the Results section, the findings from Study I and Study II are presented, covering both descriptive and inferential statistics on cognitive load and model performance across the five process models (PM1 to PM5). The analysis reveals that intrinsic cognitive load (ICL) and extraneous cognitive load (ECL) increase with the complexity of the process models, with the ‘Expert’ models exhibiting higher mean values than the ‘BPMNGen’ models, particularly for PM4 and PM5. Germane cognitive load (GCL) follows the same trend, with the ‘Expert’ models consistently scoring higher than the ‘BPMNGen’ models in the more complex scenarios.

Regarding perceived usefulness (PUU) and perceived ease of use (PEU), the ‘Expert’ models score higher in PUU for most process models, especially PM1 and PM2, while both model sets receive similar PEU ratings, with the ‘Expert’ models slightly ahead except for PM3. Performance scores indicate that the ‘BPMNGen’ models achieve higher accuracy for PM3 and PM4, whereas the ‘Expert’ model does better for PM5. Overall, the results highlight the reliability of the BPMNGen models in less complex scenarios, while the ‘Expert’ models demonstrate strengths in more intricate contexts. Additionally, Table 14 provides a comparative analysis of cognitive load measures, including means, standard deviations, and inferential statistics such as t-test values and significance levels for both groups.
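To make the group comparison concrete, the sketch below computes means, standard deviations, and a two-sample t statistic for two sets of ratings. The summary does not say which t-test variant the authors used, so Welch's unequal-variance form is an assumption here. The ratings are invented for illustration only and are not data from the paper.

```python
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(a), len(b)
    # Sample variance divided by group size, per group.
    qa, qb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(qa + qb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return t, df


# Invented ratings for two hypothetical participant groups.
ratings_bpmngen = [1, 2, 3, 4, 5]
ratings_expert = [2, 3, 4, 5, 6]

t_value, dof = welch_t(ratings_bpmngen, ratings_expert)
print(f"BPMNGen: M={mean(ratings_bpmngen):.2f}, SD={stdev(ratings_bpmngen):.2f}")
print(f"Expert:  M={mean(ratings_expert):.2f}, SD={stdev(ratings_expert):.2f}")
print(f"t={t_value:.2f}, df={dof:.1f}")
```

Obtaining the significance level from the t value and degrees of freedom needs the t distribution, which is not in the Python standard library; a statistics package would typically supply it.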

Discussion

The discussion section of the research paper highlights the capabilities and applications of Large Language Models (LLMs) in natural language processing, particularly in the context of business process modeling. LLMs, such as ChatGPT and Gemini, leverage the Transformer architecture to efficiently process and generate human-like text, making them suitable for specialized tasks like conceptual and process modeling. The effectiveness of LLMs is significantly influenced by prompting strategies, which guide the model’s responses and enhance the relevance and coherence of the generated outputs. This section also emphasizes the importance of user interaction in refining process models, showcasing how conversational agents can facilitate iterative improvements through dialogue.

Furthermore, the paper addresses the integration of LLMs into Business Process Management (BPM), noting the emergence of frameworks that translate natural language descriptions into formal process models, particularly using BPMN 2.0. While existing approaches demonstrate the feasibility of automated process modeling, they often fall short in practical implementations that encompass user interaction and complex model behaviors. The authors propose a systematic evaluation of LLM-generated process models, focusing on semantic alignment and pragmatic quality, to better understand their usability and comprehensibility. This human-centered approach aims to bridge the gap between automated generation and user acceptance, paving the way for more effective applications of LLMs in business process modeling.

Limitations

The section on limitations in the evaluation studies of BPMNGen highlights several critical factors that may affect the validity and generalizability of the findings. One primary limitation is the simplicity of the process scenarios employed, which were intentionally chosen to align with BPMNGen’s capabilities and to be accessible to participants with varying levels of modeling expertise. While this enhances ecological validity, it restricts the applicability of BPMNGen to more complex, domain-specific processes. Additionally, the participant sample predominantly consisted of university students, raising concerns about the transferability of results to professional environments with trained modelers. Despite efforts to mitigate selection bias, residual differences among groups could still influence outcomes.

Moreover, the study’s reliance on self-reported measures for constructs like perceived usefulness and cognitive load introduces potential biases, while the binary nature of the comprehension questions may not capture the full depth of understanding. The structural compactness of BPMNGen-generated models, which typically contain fewer elements than expert-created models, may also contribute to the observed differences in cognitive load and perceived ease of understanding. Furthermore, the rapid evolution of large language models (LLMs), together with the study’s dependence on one underlying model (GPT-3.5 Turbo) and on the textual input scenarios used, poses additional threats to reproducibility and generalizability. Lastly, the limited coverage of BPMN 2.0 elements restricts the expressiveness of BPMNGen. Future research needs to address these limitations to broaden the applicability of LLM-supported process modeling in diverse contexts.