Framework for grouping local process models

Journal: Journal of Intelligent Information Systems, Volume: 64, Issue: 3
DOI: https://doi.org/10.1007/s10844-025-01010-x
Publication Date: 2026-01-13
Author(s): Viki Peeva et al.
Primary Topic: Business Process Modeling and Analysis

Overview

This overview introduces Local Process Models (LPMs) in process mining. LPMs reveal insights from event data by capturing sequences, choices, concurrency, and loops in fragments of behavior. LPM discovery, however, suffers from model explosion and repetition: it returns an overwhelming number of similar models, which complicates analysis. The authors argue that relying solely on the top-scoring LPMs yields a redundant sample and a suboptimal understanding of the process.

To address these issues, the authors propose a framework for grouping LPMs, which involves computing distances between models using established similarity measures or by analyzing the context of the events they cover. This framework clusters similar LPMs and selects a representative model from each cluster, thereby reducing redundancy while enhancing the diversity of the sample. The evaluation demonstrates that this grouping approach improves process understandability and provides a more comprehensive representation of LPM diversity across multiple event logs. The authors conclude by acknowledging the limitations of their framework and evaluation, suggesting areas for future research.
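The representative-selection step can be illustrated with a small sketch. It assumes cluster labels, per-model scores, and a pairwise distance matrix are already available. The summary does not specify how the representative is chosen, so the two rules shown here (highest score in the cluster, and the medoid, i.e. the member with the smallest total distance to the other members) are assumptions for illustration, as are the function name and the toy data.

```python
from collections import defaultdict


def pick_representatives(labels, scores, dist):
    """Return two dicts mapping cluster label -> index of the chosen model.

    by_score:    the member with the highest score (assumed rule).
    by_distance: the medoid, i.e. the member whose summed distance to the
                 other members of its cluster is smallest (assumed rule).
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    by_score, by_distance = {}, {}
    for label, members in clusters.items():
        by_score[label] = max(members, key=lambda i: scores[i])
        by_distance[label] = min(
            members, key=lambda i: sum(dist[i][j] for j in members)
        )
    return by_score, by_distance


# Toy input: five hypothetical LPMs in two clusters.
labels = [0, 0, 0, 1, 1]
scores = [0.9, 0.5, 0.7, 0.4, 0.8]
dist = [
    [0.0, 0.2, 0.6, 0.9, 0.8],
    [0.2, 0.0, 0.3, 0.9, 0.9],
    [0.6, 0.3, 0.0, 0.7, 0.8],
    [0.9, 0.9, 0.7, 0.0, 0.1],
    [0.8, 0.9, 0.8, 0.1, 0.0],
]

by_score, by_distance = pick_representatives(labels, scores, dist)
print(by_score)     # one model index per cluster, chosen by score
print(by_distance)  # one model index per cluster, chosen by distance
```

With either rule the sample contains exactly one model per cluster, which is what removes near-duplicates from the result. Ties are resolved in favor of the lowest index in this sketch.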

Introduction

The introduction of this research paper outlines the field of process mining, which focuses on discovering, monitoring, and improving processes using data from various management systems. The discipline is grounded in three main pillars: process discovery, conformance checking, and process enhancement (van der Aalst, 2016). As the scope of process mining expands, particularly in complex domains such as healthcare and IoT, traditional discovery techniques face challenges in generating coherent models. These techniques often lead to overly complex models, such as flower or spaghetti models, which fail to provide clear insights (Tax et al., 2016b).

To address these issues, the paper turns to Local Process Model (LPM) discovery, which aims to create smaller, more manageable models that represent fragments of behavior. However, existing LPM approaches often return an overwhelming number of similar models, leading to model explosion and repetition. The authors propose a novel framework for selecting an optimal sample of LPMs by grouping them based on control flow and data attributes, which allows the identification of representative models that better capture the underlying process. The evaluation compares traditional top-scoring samples with these representative samples in terms of diversity and event coverage. The paper aims to improve the understanding of LPMs and their applicability to real-world event logs.

Methods

In this section, the authors outline the experimental setup, which is built on the event logs and LPM (Local Process Model) sets detailed in Table 2. They use five process model similarity measures alongside one data attribute measure to compute distances. For hierarchical clustering, four linkage methods are tested (single, complete, UPGMA, and WPGMA), and the number of clusters is varied over a fixed set of values: {2, 5, 10, 15, 20, 30, 40, 50, 100, 200}.
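To make the four linkage methods concrete, here is a minimal pure-Python sketch of agglomerative clustering on a precomputed distance matrix. It is not the authors' implementation; the function name and toy data are illustrative. After each merge, the distance from the merged cluster to every other cluster is updated with the standard rule for the chosen linkage: minimum (single), maximum (complete), size-weighted average (UPGMA), or plain average of the two merged clusters (WPGMA).

```python
LINKAGES = ("single", "complete", "upgma", "wpgma")
CLUSTER_COUNTS = (2, 5, 10, 15, 20, 30, 40, 50, 100, 200)


def agglomerate(dist, n_clusters, linkage="upgma"):
    """Merge the closest clusters until n_clusters remain; return labels."""
    if linkage not in LINKAGES:
        raise ValueError(f"unknown linkage: {linkage}")

    # Every item starts in its own cluster, keyed by its index.
    members = {i: [i] for i in range(len(dist))}
    # d[a][b] holds the current distance between active clusters a and b.
    d = {i: {j: dist[i][j] for j in members if j != i} for i in members}

    while len(members) > max(n_clusters, 1):
        # Closest pair of active clusters.
        a, b = min(
            ((i, j) for i in d for j in d[i] if i < j),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        size_a, size_b = len(members[a]), len(members[b])
        for k in members:
            if k == a or k == b:
                continue
            d_ak, d_bk = d[a][k], d[b][k]
            if linkage == "single":
                merged = min(d_ak, d_bk)
            elif linkage == "complete":
                merged = max(d_ak, d_bk)
            elif linkage == "upgma":
                merged = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
            else:  # wpgma
                merged = (d_ak + d_bk) / 2
            d[a][k] = d[k][a] = merged
            del d[k][b]
        # Cluster b is absorbed into cluster a.
        members[a].extend(members.pop(b))
        del d[b]
        del d[a][b]

    labels = [0] * len(dist)
    for label, cluster_id in enumerate(sorted(members)):
        for item in members[cluster_id]:
            labels[item] = label
    return labels


# Toy distances: five points on a line, forming two obvious groups.
points = [0.0, 1.0, 2.5, 7.0, 8.0]
dist = [[abs(p - q) for q in points] for p in points]
results = {name: agglomerate(dist, 2, name) for name in LINKAGES}
print(results)
```

In practice a library routine would normally be used instead, for example SciPy's `scipy.cluster.hierarchy.linkage`, where UPGMA and WPGMA correspond to the `average` and `weighted` methods. The sketch only shows how the four options differ.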

The comprehensive experimental design encompasses a total of 1440 experiments, accounting for all combinations of LPM sets, distance extraction methods, and clustering algorithm parameters. Due to space constraints, the authors present only a selection of results or a summary in the paper, but they provide access to the complete dataset and analysis notebooks via Zenodo and GitLab for further exploration by interested readers.
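The experiment count can be checked with a short calculation. The sketch below assumes the design is a full cross product of distance measures, linkage methods, and cluster counts per LPM set. The measure names are placeholders, and the number of LPM sets is inferred from the total rather than stated in this summary; Table 2 of the paper is the authority.

```python
from itertools import product

# Five similarity measures plus one data attribute measure (placeholder names).
DISTANCE_MEASURES = [f"similarity_{i}" for i in range(1, 6)] + ["data_attributes"]
LINKAGES = ["single", "complete", "upgma", "wpgma"]
CLUSTER_COUNTS = [2, 5, 10, 15, 20, 30, 40, 50, 100, 200]
TOTAL_EXPERIMENTS = 1440  # figure reported in the text

# Configurations run for a single LPM set, assuming a full grid.
configs_per_lpm_set = len(list(product(DISTANCE_MEASURES, LINKAGES, CLUSTER_COUNTS)))

# Number of LPM sets implied by the reported total (an inference, not a quote).
implied_lpm_sets, remainder = divmod(TOTAL_EXPERIMENTS, configs_per_lpm_set)
print(configs_per_lpm_set, implied_lpm_sets, remainder)
```

Under that assumption, 6 measures × 4 linkages × 10 cluster counts gives 240 configurations per LPM set, and 1440 / 240 leaves no remainder, which is consistent with six LPM sets.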

Results

This section compares coverage and diversity across three sample types: the Top sample (T), the Score Representative sample (RS), and the Distance Representative sample (RD). Coverage for the RD sample is consistently lower than for both the T and RS samples, as indicated by the negative values in the boxplots of cov RD – cov T and the positive values for cov RS – cov RD. The RS sample generally exhibits higher coverage than the RD sample across most event logs, while the coverage difference between the T and RS samples varies depending on the event log. In terms of diversity, the RS and RD samples outperform the T sample in most cases, although the RD sample shows unexpected results for the HB and BPIC2020 event logs, where diversity scores are evenly distributed.
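The two evaluation quantities can be sketched as follows. The summary does not give their exact definitions, so this example assumes coverage is the fraction of log events covered by at least one model in the sample, and diversity is the mean pairwise distance between the sampled models. All names and data are hypothetical.

```python
from itertools import combinations


def coverage(sample, covered_events, n_events):
    """Fraction of events covered by at least one model in the sample."""
    covered = set().union(*(covered_events[m] for m in sample))
    return len(covered) / n_events


def diversity(sample, dist):
    """Mean pairwise distance between the models in the sample."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        return 0.0
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


# Toy log with 10 events and four hypothetical LPMs.
covered_events = {0: {0, 1, 2, 3}, 1: {0, 1, 2}, 2: {4, 5}, 3: {6}}
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.9, 0.8],
    [0.9, 0.9, 0.0, 0.5],
    [0.8, 0.8, 0.5, 0.0],
]

top_sample = [0, 1]             # two near-identical high scorers
representative_sample = [0, 2]  # one model from each of two clusters

cov_diff = coverage(representative_sample, covered_events, 10) - coverage(
    top_sample, covered_events, 10
)
div_diff = diversity(representative_sample, dist) - diversity(top_sample, dist)
print(cov_diff, div_diff)
```

In this toy case both differences are positive, because the top sample spends its two slots on near-duplicates. The reported results show that this does not always hold on real logs, particularly for coverage of the distance-based sample.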

The analysis also indicates only a weak correlation between cluster cohesiveness and representativeness: for the coverage differences cov RS – cov T and cov RD – cov T, the correlation hovers around 0.3. A stronger correlation is observed for the diversity differences div RS – div T and div RD – div T when the analysis is restricted to clusterings with at most 20 clusters. The findings suggest that while the clusters formed are generally non-cohesive, the Score Representative sample yields better diversity and coverage than the Top sample. The authors conclude that further investigation is needed to understand the lack of cohesiveness and to explore measures beyond diversity and coverage in future research.
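For readers who want to reproduce this kind of check, the sketch below computes a Pearson correlation between a per-clustering cohesiveness value and a per-clustering gain such as cov RS – cov T. The summary names neither the correlation coefficient nor the cohesiveness measure, so Pearson and the toy numbers are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, one entry per clustering configuration.
cohesiveness = [0.1, 0.2, 0.3, 0.4]
coverage_gain = [0.0, 0.1, 0.0, 0.1]

r = pearson(cohesiveness, coverage_gain)
print(round(r, 3))
```

A value near 0.3, as reported for the coverage differences, means that cohesiveness explains only a small part of the variation in the gain.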

Discussion

In the discussion section of the paper, the authors explore the framework for grouping Local Process Models (LPMs) based on existing methodologies and their limitations. They highlight that LPM discovery sits at the intersection of traditional process discovery and sequence mining, aiming to extract multiple models that represent specific behavioral patterns from event logs. While various approaches have been developed to discover LPMs, such as those by Tax et al. (2016b) and Acheli et al. (2019), challenges like model explosion and redundancy persist, particularly in approaches that do not adequately filter similar models. The authors note that while some methods, like those proposed by Brunings et al. (2022), mitigate the explosion problem, they still face issues with repetitive models and limited applicability.

The authors also discuss the broader context of grouping process models, referencing existing literature that addresses model similarity measures and clustering techniques. They emphasize that while many methods rely on structural or behavioral comparisons, their approach uniquely incorporates data attributes from event logs to enhance the grouping process. This integration aims to reduce redundancy among similar LPMs by leveraging both process model similarity and contextual data attributes, thereby providing a more nuanced and effective grouping strategy. The authors conclude that their framework not only addresses the challenges of model repetition but also opens avenues for more sophisticated analysis of process models through the use of data attributes.

Limitations

The LPM grouping framework, while theoretically flexible in its representation of allowed models, distance measurement, and clustering options, is constrained by the specific implementation choices made in this study. These design decisions inherently limit the framework’s potential applications and may introduce various biases that could affect the outcomes of analyses conducted using this framework. Consequently, while the framework offers a robust starting point for model grouping, its practical utility may be influenced by these inherent limitations.