Conformance Checking of Dwelling Time Using a Token-based Method

Background: A standard operating procedure (SOP) is a series of business activities carried out to achieve organisational goals. Each activity carried out is recorded and stored in an information system (e.g., SCM, ERP, LMS, CRM) together with its location. The recorded activities are known as event data and are stored in a database known as an event log. Objective: Based on the event log, we calculate the fitness to determine whether the SOP matches the actual business process. Methods: This study obtains the event log from a terminal operating system (TOS), which records the dwelling time at a container port. Conformance checking with the token-based replay method calculates fitness by comparing the event log with the process model. Results: Discovery with the Alpha algorithm showed that the most traversed trace is (a, b, n, o, p). The fitness calculation returns 1.0, where the produced, missing, and remaining tokens are replayed for each of the other traces. Conclusion: If process mining produces a fitness of more than 0.80, the process model follows the actual business process.


I. INTRODUCTION
In the era of big data, almost all data is stored in information systems. Process mining research on control flow and monitoring is currently growing [1]-[3]. Monitoring is done based on event logs stored in the information system [4]. A container port's information system records all transaction processes. Dwelling time is the period from when a container is unloaded from a docked vessel until it leaves the port. Dwelling time logs are also stored in the information system. Exploring these data using process mining can help measure a port's performance [5]. From these data, we can build a process model [6]-[8]. We can then perform conformance checks for auditing and compliance by replaying the event logs on the process model. In this way, we can measure performance and visualise deviations. We can also detect bottlenecks and build predictive models [9], [10]. This is necessary because high logistics costs and long dwelling times [11] reduce a port's competitiveness.
This research aims to examine container dwelling time by using process mining as a measure of port performance. The application of process mining has been tested, and its challenges discussed, in various case studies [10], [12]. However, it has not been tested for a container port, where a cross-organisational process is in place. Therefore, we perform conformance checking [13] on the dwelling time at a container port. We continue the previous study (i.e., data extraction, discovery) [5] and use one of the process mining techniques, namely conformance checking, which compares the standard operating procedure (SOP) with the process model generated from the event logs.

II. LITERATURE REVIEW
Process mining is a research discipline positioned between machine learning and data mining, and it is common in business intelligence environments [14], [15]. Process mining aims to discover, monitor, and improve actual processes by extracting knowledge from the event logs available in information systems. Process mining compares event data (i.e., observed behaviour) with process models (handmade or automatically discovered). Process models can also be realigned to describe what is actually happening. With the increasing abundance of event data, and due to the limitations of handmade models, process models need to be created from the data [4], [7], [16]. In this way, the actual processes can be discovered, and the existing process models can be evaluated and enhanced.
A process model is used for [7] verification, to find errors in the system or procedure (e.g., potential deadlocks, bottlenecks), and for performance analysis, to understand the factors that affect response times, service levels, etc. Fig. 1, the Business Process Management (BPM) life-cycle, shows the different uses of a process model across the life-cycle of a business process [7]. Fig. 2 (F. Daniel 2011) shows the three main types of process mining: a) discovery, which derives a process model from the event logs; b) conformance, which compares the actual process with the process rules; and c) enhancement, which extends the capabilities of the process model. Conformance checking is used to check whether reality, as recorded in the event logs, conforms to the model, and vice versa [4], [7], [9], [17], [18].

Fig. 1 The BPM life-cycle shows the difference in the use of the process model
Fig. 2 The placement of the three main types of process mining

III. METHODS
The research methodology shown in Fig. 3 is adopted, with a few changes, from the L* life-cycle model proposed by van der Aalst, which describes a process mining project. The method consists of five steps:

Step 0: Planning and justifying. Process mining projects aim for process improvement, specifically for KPIs such as cost reductions or shorter response times.

Step 1: Extraction. The event log must satisfy two main conditions: (a) events must be ordered by time, and (b) events must be linked (that is, each event needs to refer to a specific case).

Step 2: Creating a control-flow model and relating it to the event log. After completing Step 1, the control-flow model is closely linked to the event log; that is, the events in the log refer to the activities in the model. If the fitness between the model and the log is low (e.g., below 0.8), it is not easy to move to Step 3. However, by definition, this should not be a problem for Lasagna processes.

Step 3: Creating an integrated process model. The model is enhanced by adding further perspectives (e.g., organisation, case, and time) to the control-flow model.

Step 4: Operational support. Step 4 of the L* life-cycle relates to three operational support activities: detect, predict, and recommend.

In this study, we use conformance checking [19], [20] to compare the SOP with the process model generated from the event logs. For data processing, we use PM4PY [21], which supports a range of process mining techniques (discovery, conformance checking, and enhancement).
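The two event-log conditions of Step 1 can be illustrated with a minimal sketch. The record layout (case id, activity label, ISO timestamp) and the function name `build_traces` are assumptions made for illustration; they are not taken from the TOS data or from PM4PY.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event records: (case id, activity label, ISO timestamp).
events = [
    ("c1", "B", "2020-01-01T09:00:00"),
    ("c1", "A", "2020-01-01T08:00:00"),
    ("c2", "A", "2020-01-01T08:30:00"),
    (None, "A", "2020-01-01T08:45:00"),  # no case reference: cannot be linked
]


def build_traces(records):
    """Group events by case and order the events of each case by time."""
    by_case = defaultdict(list)
    for case, label, stamp in records:
        if case is None:
            continue  # condition (b): each event must refer to a specific case
        by_case[case].append((datetime.fromisoformat(stamp), label))
    # condition (a): events are ordered by time within each case
    return {case: [label for _, label in sorted(evts)]
            for case, evts in by_case.items()}


traces = build_traces(events)
```

Each value in `traces` is one trace, that is, the time-ordered sequence of activity labels for a single case.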

Fig. 3 Research method
A. Process mining α-algorithm
The Petri net in [7] is used to create a business process model within BPM. Petri nets are also used to build decision-making models. A Petri net is a bipartite graph consisting of places and transitions [7]. The network structure is static, but tokens, governed by the firing rule, can flow through the network. The state of a Petri net is determined by the distribution of tokens over its places and is referred to as its marking.
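The firing rule can be sketched as follows. The two-transition net and the names `pre`, `post`, `enabled`, and `fire` are hypothetical and serve only to show how a marking changes when a transition fires.

```python
from collections import Counter

# Hypothetical net: start -a-> p1 -b-> end
pre = {"a": ["start"], "b": ["p1"]}   # input places of each transition
post = {"a": ["p1"], "b": ["end"]}    # output places of each transition


def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    needed = Counter(pre[transition])
    return all(marking[place] >= n for place, n in needed.items())


def fire(marking, transition):
    """Consume one token per input arc and produce one per output arc."""
    if not enabled(marking, transition):
        raise ValueError(f"transition {transition!r} is not enabled")
    new_marking = Counter(marking)
    new_marking.subtract(Counter(pre[transition]))
    new_marking.update(Counter(post[transition]))
    return +new_marking  # unary plus drops places with zero tokens


m0 = Counter({"start": 1})   # initial marking
m1 = fire(m0, "a")
m2 = fire(m1, "b")
```

The net itself (`pre`, `post`) never changes; only the marking moves from `m0` to `m2`.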
A Petri net is a tuple (P, T, F) where P is a finite set of places, T is a finite set of transitions with P ∩ T = ∅, and F ⊆ (P × T) ∪ (T × P) is a set of directed arcs called the flow relation. A marked Petri net is a pair (N, M), where N = (P, T, F) is the Petri net and M ∈ B(P) is a multi-set over P that represents the marking of the net. The requirements on a WF-net include:
Option to complete: from every marking reachable from the initial marking [i], the final marking can still be reached.
Absence of dead parts: (N, [i]) contains no dead transitions (i.e., for every t ∈ T, there is a firing sequence that enables t).

The Alpha algorithm [7] is an early approach to discovering branching, namely explicit causal dependencies and parallelism. The basic idea of the alpha algorithm (α-algorithm) is as follows. The input of the α-algorithm is an event log L over A, where A denotes the set of activities. The output of the α-algorithm is a marked Petri net α(L) = (N, M). Because the target is a WF-net, the initial marking can be omitted and the output written as α(L) = N, with the initial marking implied as M = [i]. The α-algorithm scans the event log for specific patterns. For example, if activity a is followed by activity b, but b is never followed by a, then it is assumed that there is a causal dependency between a and b. To reflect this dependency in the Petri net, a place must connect a to b. Four ordering relations are defined on the log to capture the relevant patterns and to distinguish between the kinds of dependency [7]. Thus, the business process model is constructed from the ordering relations between the events.

B. Token-based replay
Equation (1) calculates the fitness of a single trace σ on the WF-net N by counting tokens during replay, where p is the number of produced tokens, c the consumed tokens, m the missing tokens, and r the remaining tokens:

$$\mathit{fitness}(\sigma, N) = \frac{1}{2}\left(1 - \frac{m}{c}\right) + \frac{1}{2}\left(1 - \frac{r}{p}\right) \tag{1}$$
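To define the fitness of the whole event log L on the WF-net N, Equation (2) is used, where L(σ) is the number of times trace σ occurs in L, and p, c, m, and r with subscript N,σ are the produced, consumed, missing, and remaining tokens when σ is replayed on N:

$$\mathit{fitness}(L, N) = \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} L(\sigma)\, m_{N,\sigma}}{\sum_{\sigma \in L} L(\sigma)\, c_{N,\sigma}}\right) + \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} L(\sigma)\, r_{N,\sigma}}{\sum_{\sigma \in L} L(\sigma)\, p_{N,\sigma}}\right) \tag{2}$$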
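The two fitness levels can be sketched in a few lines. The sketch assumes that Equations (1) and (2) take the usual token-replay form, in which half of the score penalises missing tokens relative to consumed tokens and half penalises remaining tokens relative to produced tokens. The function names and all token counts are hypothetical and are not taken from the study's data.

```python
def trace_fitness(p, c, m, r):
    """Fitness of one trace from produced, consumed, missing, remaining tokens."""
    if p <= 0 or c <= 0:
        raise ValueError("produced and consumed token counts must be positive")
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)


def log_fitness(variants):
    """Fitness of a log given (frequency, p, c, m, r) per trace variant."""
    total_p = sum(f * p for f, p, c, m, r in variants)
    total_c = sum(f * c for f, p, c, m, r in variants)
    total_m = sum(f * m for f, p, c, m, r in variants)
    total_r = sum(f * r for f, p, c, m, r in variants)
    # Token counts are summed over the whole log before the ratios are taken.
    return trace_fitness(total_p, total_c, total_m, total_r)


# Hypothetical counts for illustration only.
variants = [
    (3, 6, 6, 0, 0),  # seen 3 times, replays without problems
    (1, 6, 6, 1, 1),  # seen once, one missing and one remaining token
]
perfect = trace_fitness(6, 6, 0, 0)
deviating = trace_fitness(6, 6, 1, 1)
overall = log_fitness(variants)
```

With these illustrative counts, the perfectly replayed variant scores 1.0, the deviating variant about 0.83, and the log as a whole about 0.96, because the frequent fitting variant dominates the token totals.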

C. Handmade Model as an Initial Process Model
The process domain model, expressed as a handmade model, describes the SOP. A specific case to be analysed must be selected so that the initial and final activities can be determined. Fig. 4 below shows the dwelling time process at the container port.

IV. RESULTS
This study [5] carried out the stages of the L* life-cycle model (W. M. P. van der Aalst 2011) for mining Lasagna processes at a container terminal in order to explore the dwelling time. Table 1 lists the 16 dwelling time activities discovered, from discharge, when the container is unloaded from the ship, until the container leaves the port. We extracted the data from previous research [5], [22] and then selected the data according to the needs of the analysis. The attributes used are case, label, and time stamp; the selection is given in Algorithm 1. The data are then filtered based on the start and end activities of the import dwelling time process, which begins with the activity labelled "A" and ends with the activity labelled "P". The filtering stage uses Algorithm 2. We then convert the dataset into log format so that discovery can be carried out with the alpha miner algorithm, using the following Algorithm 4.

    from pm4py.objects.conversion.log import converter as log_converter

In this paper, the fitness calculation is illustrated with the single trace ABCNOP from Table 2, firing from A.

Fig. 5a: One token is produced at the start so that activity A can run. The results obtained are p=0, c=0, m=0, r=0 and then p=1, c=0, m=0, r=0.

Fig. 5b: The token passes through A to reach the next activity in the sequence, B, so production rises to p = 2 and A has consumed one token, c = 1. The results obtained are p=2, c=1, m=0, r=0.

Fig. 5c: After B is passed, the token is positioned before C, the next activity in the sequence. Three tokens have been produced, p = 3, and two have been consumed, c = 2. The results obtained are p=3, c=2, m=0, r=0.

Fig. 5d: The next activity is N. Because C must be passed first, the token passes C, which increases token production to p = 4 and token consumption to c = 3. The results obtained are p=4, c=3, m=0, r=0.

Fig. 5e: The next activity is O, so p = 5 is produced and c = 4 is consumed, while the token positioned after C does not move and becomes a remaining token, r = 1. The results obtained are p=5, c=4, m=0, r=1.

Fig. 5f: The last activity to run is P, with the token positioned before P. Thus p = 6, c = 5, and the remaining token count is still r = 1. The results obtained are p=6, c=5, m=0, r=1.

Fig. 5g: The final replay step for ABCNOP on the process model gives p = 7, c = 7, and r = 1. The last results obtained are p=7, c=6, m=0, r=1 and then p=7, c=7, m=0, r=1.

Thus, conformance checking in this study compares the process model with the event log of the same process. Its purpose is to check whether the event log fits the model and vice versa.
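The token counting in the walkthrough above can be sketched with a simplified replay. The net used here is a hypothetical three-activity sequence, not the dwelling time model of Fig. 5, and the sketch assumes one input and one output place per transition; it forces non-enabled transitions to fire by adding a missing token, which is the usual token-replay convention.

```python
from collections import Counter

# Hypothetical sequential WF-net: i -A-> p1 -B-> p2 -C-> o
PRE = {"A": ["i"], "B": ["p1"], "C": ["p2"]}
POST = {"A": ["p1"], "B": ["p2"], "C": ["o"]}


def replay(trace, source="i", sink="o"):
    """Replay a trace and return (produced, consumed, missing, remaining)."""
    marking = Counter({source: 1})
    p, c, m = 1, 0, 0  # the environment produces the initial token
    for activity in trace:
        for place in PRE[activity]:
            if marking[place] == 0:
                marking[place] += 1  # add an artificial token so firing can proceed
                m += 1
            marking[place] -= 1
            c += 1
        for place in POST[activity]:
            marking[place] += 1
            p += 1
    # the environment consumes the token from the final place
    if marking[sink] == 0:
        marking[sink] += 1
        m += 1
    marking[sink] -= 1
    c += 1
    r = sum(marking.values())  # tokens left behind in the net
    return p, c, m, r


fitting = replay("ABC")    # follows the model exactly
skipping = replay("AC")    # skips B, so C finds no token in p2
```

In the second trace, the token produced by A is never consumed and becomes a remaining token, while C needs a token that was never produced, which is counted as missing.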
In this research, conformance checking is done using token-based replay. Token-based replay matches each trace with the process model, starting from the initial activity, to find which transitions were executed and where tokens were missing. The fitness value it calculates has an upper bound of 1: the closer the value is to 1, the closer the variant is to a perfect fit, and the further the value is from 1, the worse the variant fits the model. Conformance checking with token-based replay uses the following Algorithm 5. The results of conformance checking with token-based replay in this study can be seen in Table 3.

    from pm4py.algo.conformance.tokenreplay import algorithm as token_replay
    replayed_traces = token_replay.apply(log_1, net, initial_marking, final_marking)
    replayed_traces

Algorithm 5. Conformance using token-based replay

The mining results of the alpha miner (Algorithm 6) are visualised as a Petri net to display the structure contained in the dataset. The resulting Petri net can be seen in Fig. 6.

    from pm4py.algo.discovery.alpha import algorithm as alpha_miner
    net, initial_marking, final_marking = alpha_miner.apply(log_1)

The final evaluation stage in this research compares the behaviour contained in the event log with the behaviour allowed by the process model to see how well they match, using replay fitness. The goal is to calculate how much of the behaviour in the event log is accepted by the process model. The evaluation of the event log and process model is carried out with Algorithm 7. The results of this evaluation are shown in Table 4.
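The discovery step performed by the alpha miner can be made more concrete with a sketch of its first stage, the derivation of ordering relations from the log. This is a minimal illustration on a hypothetical two-trace log, not the PM4PY implementation, and it stops before places are constructed.

```python
def footprint(traces):
    """Classify every pair of activities by how they follow each other."""
    activities = sorted({a for trace in traces for a in trace})
    # (x, y) is in `follows` if y directly follows x in at least one trace
    follows = {(x, y) for trace in traces for x, y in zip(trace, trace[1:])}
    relation = {}
    for x in activities:
        for y in activities:
            forward, backward = (x, y) in follows, (y, x) in follows
            if forward and not backward:
                relation[(x, y)] = "->"   # causality: x before y, never y before x
            elif backward and not forward:
                relation[(x, y)] = "<-"   # reverse causality
            elif forward and backward:
                relation[(x, y)] = "||"   # parallel: both orders observed
            else:
                relation[(x, y)] = "#"    # choice: never directly adjacent
    return relation


# Hypothetical log in which b and c occur in either order between a and d.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
fp = footprint(log)
```

From such a footprint, the α-algorithm places a connecting place between activities related by causality and treats activities related by `||` as concurrent branches.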

VI. CONCLUSIONS
Following the L* life-cycle methodology, this study applied process mining with the Alpha miner to produce a process model. The process model is the control flow of activities running at the container port. The fitness calculation yields 1.0, which is above the 0.8 threshold and indicates that the resulting control flow is good.
The fitness calculation shows that the process model and the event log match, which means that the process model follows the reality of the transactions recorded in the event log. This study focuses on process discovery using the Alpha miner algorithm. In future research, process discovery can be carried out using Alpha plus, which can handle short loops.