Process Discovery of Business Processes Using Temporal Causal Relation

Background: Enterprise computing, which manages business processes, has grown rapidly. This growth produces massive event logs. One type of event log is the double timestamp event log, which records the start time and the complete time of each activity executed in the business process. It is closely related to the temporal causal relation, a pattern in the event log that arises from each activity performed in the process. Objective: This paper presents seven types of temporal causal relation between activities as an extended version of the relations used in the double timestamp event log. Because the event log is not always executed sequentially, the temporal causal relation is used to divide the event log into several small groups in order to determine the relations of activities and to mine the business process. Methods: In the experiments, the temporal causal relations, based on time intervals presented in a Gantt chart, determine whether each case is classified as a sequential or a parallel relation. To obtain the business process, the temporal causal relations are then combined into one business process based on the timestamps of the activities in the event log. Results: The experiments on two real-life event logs showed that business process models can be discovered using the temporal causal relation and the double timestamp event log. Conclusion: Considering the findings, this study concludes that business process models and their sequential and parallel (AND, OR, XOR) relations can be discovered by using the temporal causal relation and the double timestamp event log.


I. INTRODUCTION
Enterprise computing, which manages business processes, has grown rapidly [1][2] [3]. This growth produces massive event logs that give knowledge about the activities of business processes run recently or a few years ago [4] [5]. Observing the event log, known as process mining, is needed to analyze the performance of the process [6][7] [8]. In process mining, the technique that mines a model from the event log is called process discovery [7]. Process discovery focuses on collecting information from the event log and then obtaining process models that represent the behavior of activities in the system [8]. Therefore, the main goal of process discovery is to obtain business process models that describe the real business processes [9] [10].
Event logs and business processes are the two primary inputs for process discovery [5]. In addition, process models are needed as guidance for analyzing and verifying the performance of the existing business processes [11]. In a process model, the activities, the timestamps, and the relations between activities are the main elements [12] [13], because together these three describe the exact business process. Activities define the names of the activities or events which are executed in the business process [14]. The timestamp gives the real time of the business process. In addition, a business process has two types of relations: sequential and parallel [15]. They differ in how they link activities. A sequential relation connects one activity to another activity executed afterwards, whereas a parallel relation links one or more activities with one or more activities performed thereafter [15] [16].
Generally, a business process is assumed to be executed sequentially, from start to end [17]. In practice, a business process can perform its activities both sequentially and in parallel [16] [17]. An easy way to find out whether a business process is executed sequentially or in parallel is to examine the timestamp of each activity [17] [18]. The relation is parallel if the timestamps overlap and sequential if they do not [19]. In addition, the timestamp is closely related to the temporal causal relation. A temporal causal relation is a pattern in the event log that arises from each activity performed in the process [16] [17]. Because the event log is not always executed sequentially, the temporal causal relation is used to divide the event log into several small groups [19]. In this research, seven types of temporal causal relation between activities are presented as an extended version of the relations used in the double timestamp event log. With the help of time intervals presented in a Gantt chart, they can be used to mine the relations of the business process and to determine whether each case is classified as a sequential or a parallel relation. Then, to model the business process, the temporal causal relations are combined into one business process based on the timestamps of the activities in the event log.
This study focuses on discovering the business process model and its sequential and parallel (AND, OR, XOR) relations using the temporal causal relation and the double timestamp event log, with the help of an extended version of the Modified Time-based Alpha Miner algorithm [16], a recent development of the Alpha Miner algorithm [6] in process discovery. This research gives an alternative way to discover a business process model from the event log, because earlier approaches such as Alpha, Alpha+, Alpha++ and Heuristics Miner [5][6][9] [13] did not involve the temporal causal relation or the time interval in their process discovery steps. For example, Alpha Miner mines the process model using the direct relations between activities [6] [13], and Heuristics Miner obtains the model by calculating the dependency between activities [6] [9]. However, although this research succeeded in applying the temporal causal relation to the process discovery of business processes, it does not yet address other issues in process mining such as loops, non-free choice, and invisible prime tasks.
This paper consists of four sections. Section 2 explains our discovery approach. Section 3 presents the experimental results and discussion. Finally, Section 4 concludes the paper.

II. METHODS
This section presents the main literature review and the integrated discovery approach for discovering business processes used in this research.

A. Business Process Model and Event Log
A business process model is a collection of interrelated, structured events or tasks that solve a specific problem or produce a product or service in order to achieve a particular goal [2] [5]. A business process contains useful information about the executed activities, such as where and when they are executed, their input and output, and their initial and final conditions [15].
In this research, we evaluate two event logs: first, an event log taken from an online book store management process as event log 1 (EL1) and, second, an event log from YM Company as event log 2 (EL2). Table 1 shows part of EL1, which contains the Case ID, the list of activities, the start time, and the complete time. EL1 has 100 cases and 7 activities: Choose Books (CB), Check Price (CP), Order to Seller (OtS), Create Bill (CrB), Send the Ordered Books (SOB), Cancel Order (CO) and Send the Notification (SN). For the second experiment, EL2 has 100 cases and 6 activities (A, B, C, D, E and F), each with a start time and a complete time. Table 2 presents a fragment of EL2. Both EL1 and EL2 are double timestamp event logs. Finally, Petri Net, a graphical and mathematical language for modeling systems that exhibit concurrency and resource sharing [10][15] [20], is used to display the final business process in our experiments.
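As a minimal sketch of how a double timestamp event log with the columns described above could be held in memory, the following Python groups events by Case ID and orders them by start time. The `Event` class, the `group_by_case` helper and the sample timestamps are assumptions made for illustration; they are not taken from Table 1 or Table 2.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One row of a double timestamp event log (hypothetical layout)."""
    case_id: str
    activity: str
    start: datetime
    complete: datetime


def group_by_case(events):
    """Return {case_id: [events sorted by start time, then complete time]}."""
    cases = defaultdict(list)
    for event in events:
        cases[event.case_id].append(event)
    return {
        case_id: sorted(rows, key=lambda e: (e.start, e.complete))
        for case_id, rows in cases.items()
    }


# Hypothetical rows; the timestamps are illustrative only.
log = [
    Event("ID001", "CP", datetime(2019, 1, 1, 9, 10), datetime(2019, 1, 1, 9, 20)),
    Event("ID001", "CB", datetime(2019, 1, 1, 9, 0), datetime(2019, 1, 1, 9, 10)),
    Event("ID002", "CB", datetime(2019, 1, 2, 8, 0), datetime(2019, 1, 2, 8, 5)),
]
cases = group_by_case(log)
print([e.activity for e in cases["ID001"]])
```

Sorting each case by start time gives the order in which the activities would appear along the time axis of a Gantt chart.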

B. Temporal Causal Relation
A temporal causal relation is a pattern in the event log that arises from each activity performed [16] [17]. It is closely related to the timestamps of the event log, because the model of the temporal causal relation is based on the category of the timestamp, especially the double timestamp event log, to discover the process model. In this research, we present seven types of temporal causal relation between activities, which extend the standard relations in a business process. Where previously there were only sequential and parallel relations, we divide the sequential relation into before and meet, and the parallel relation into the_same_start_time, the_same_complete_time, overlap, contain and equal. As explained in Table 3, any two activities X and Y can be modelled according to their time intervals, and each time interval describes one temporal causal relation. After the types of temporal causal relation are defined, Table 4 presents the model of the business process for each temporal causal relation.
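The seven relations can be illustrated with a small interval comparison. The sketch below is an assumption rather than a reproduction of Table 3: it borrows the usual interval-algebra reading of each name (for example, meet means that X completes exactly when Y starts), and the function name `classify_relation` is hypothetical.

```python
def classify_relation(x_start, x_complete, y_start, y_complete):
    """Classify the temporal causal relation of activity X to activity Y.

    Assumes start <= complete for both activities and that X does not
    start after Y (swap the arguments otherwise). The boundary
    definitions are assumed, not copied from the paper's Table 3.
    """
    if x_start > y_start:
        raise ValueError("X must not start after Y; swap the arguments")
    # Sequential relations: X has finished by the time Y begins.
    if x_complete < y_start:
        return "before"
    if x_complete == y_start:
        return "meet"
    # Parallel relations: the two time intervals share some time.
    if x_start == y_start and x_complete == y_complete:
        return "equal"
    if x_start == y_start:
        return "the_same_start_time"
    if x_complete == y_complete:
        return "the_same_complete_time"
    if y_complete < x_complete:
        return "contain"  # Y lies strictly inside X
    return "overlap"      # Y starts inside X and completes after X


print(classify_relation(0, 2, 2, 4))
print(classify_relation(0, 5, 3, 8))
print(classify_relation(0, 10, 2, 5))
```

Any comparable values work as timestamps here, including `datetime` objects, since only `<`, `>` and `==` are used.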

C. Extended Version of Modified Time-based Alpha Miner Algorithm
In process discovery, many algorithms are available to mine business processes, and Alpha Miner is one of the best known [6][10] [21]. Researchers have modified this algorithm into Alpha+ and Alpha++ Miner [6] [13]. One of the latest modifications is called Modified Time-based Alpha Miner (MTBAM) [16]. The core of this algorithm is the use of time interval information and relations of activities from the event log to obtain the business processes. In [16], the MTBAM algorithm mines the business process from the event log in 13 steps. In this research, however, we slightly modify the existing algorithm and use only six steps, because we focus on the temporal causal relation in discovering the business process. The steps are:
1. Determine the temporal causal relation and create the Gantt chart for all cases in the event log (EL).
2. Generate the business process based on the Gantt chart for all cases, including the relations between all activities. The steps to model the business process:
   - Generate a set of transitions (TL)
3. Overlay all business processes formed from the Gantt charts into one complete business process.
4. Define all relations of the business process, i.e. sequential, parallel.
5. Define the type of parallel relations, i.e. XOR, OR, AND.
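To make the per-case part of these steps concrete, the sketch below derives sequential and parallel activity pairs for a single case from its time intervals. It is a simplified illustration under stated assumptions, not the MTBAM procedure of [16]: two activities are treated as parallel when their intervals share time, and as directly sequential when no other activity of the case fits entirely between them. The function name `case_relations` and the sample case are hypothetical.

```python
from itertools import combinations


def case_relations(case):
    """Derive relations for one case.

    case: list of (activity, start, complete) tuples for one Case ID.
    Returns (sequential, parallel) as sets of (X, Y) activity pairs.
    """
    ordered = sorted(case, key=lambda row: (row[1], row[2]))
    sequential, parallel = set(), set()
    for x, y in combinations(ordered, 2):  # x starts no later than y
        x_name, _, x_complete = x
        y_name, y_start, _ = y
        if y_start < x_complete:
            parallel.add((x_name, y_name))  # time intervals share time
            continue
        # Keep X -> Y only if no other activity fits entirely between them.
        has_between = any(
            z is not x and z is not y
            and x_complete <= z[1] and z[2] <= y_start
            for z in ordered
        )
        if not has_between:
            sequential.add((x_name, y_name))
    return sequential, parallel


# Hypothetical case: B and C run in parallel between A and D.
sample_case = [("A", 0, 2), ("B", 2, 5), ("C", 3, 6), ("D", 6, 8)]
seq, par = case_relations(sample_case)
print(sorted(seq))
print(sorted(par))
```

In this example A is followed by both B and C, and both of them are followed by D, which is the shape of a parallel split and join in a Petri Net.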

D. Integrated Discovery Approach
The discovery approach for the overall business process is shown in Fig. 1. This method uses the event log (EL) as its input. For all cases, we determine the temporal causal relation and present it in Gantt chart form [1][5] [15]. The Gantt chart shows the relations of activities. Next, a process model in Petri Net is created based on the Gantt chart for all cases [3][20] [22]. We need to create the transitions, inputs, outputs and places for all the process models [23]. After all the process models are formed, we overlay them into one complete process model. In the end, we therefore have a single business process including its relations: sequential and parallel (AND, OR or XOR).
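The overlay of per-case models can be pictured as a union of their relations together with a count of how many cases contain each one. The following sketch assumes each case model is already reduced to a set of sequential pairs and a set of parallel pairs; the `overlay` function and the three sample cases are hypothetical and do not come from EL1 or EL2.

```python
from collections import Counter


def overlay(case_models):
    """Merge per-case relations into one model with frequencies.

    case_models: iterable of (sequential_pairs, parallel_pairs) per case.
    Returns two Counters mapping each relation to the number of cases
    in which it occurs.
    """
    seq_freq, par_freq = Counter(), Counter()
    for sequential, parallel in case_models:
        seq_freq.update(set(sequential))  # count each relation once per case
        par_freq.update(set(parallel))
    return seq_freq, par_freq


# Hypothetical per-case models.
models = [
    ({("A", "B"), ("B", "D")}, set()),
    ({("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}, {("B", "C")}),
    ({("A", "C"), ("C", "D")}, set()),
]
seq_freq, par_freq = overlay(models)
print(seq_freq[("A", "B")], seq_freq[("A", "C")], par_freq[("B", "C")])
```

The keys of the two counters form the overlaid model, and the counts are the kind of relation frequencies that a later step can use to decide between XOR, OR and AND.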

III. RESULTS
In this section, we present experiments showing that our proposed method can mine business processes using the temporal causal relation and the double timestamp event log, with the help of the MTBAM algorithm using only half of its steps. As mentioned in Section 2.1, two real-life event logs are tested and analyzed in this experiment. The event logs generated from the organizations, i.e. EL1 and EL2, are double timestamp event logs. EL1, presented in Table 1, has 100 cases and 7 activities. Based on Step 1, we determine the types of temporal causal relation used in all cases. Table 5 presents the resulting temporal causal relations for all activities in EL1. This event log consists only of before and meet for all cases.
Meanwhile, EL2 in Table 2 has 100 cases and 6 activities. As in Step 1 for EL1, we determine the types of temporal causal relation. EL2 consists of the temporal causal relations before, overlap and contain, as explained in Table 6.
After all temporal causal relations of EL1 and EL2 are obtained, we create the Gantt charts for all cases. Gantt charts present the exact timestamps of each case and should represent the temporal causal relations determined in the previous step. Fig. 2 and Fig. 3 present the Gantt charts of EL1 and EL2 for case IDs ID001, ID100 and P001, P100 respectively. In the next step, we model the business process, including the relations between activities, based on the temporal causal relation and the Gantt chart. To model the business process, we create the transitions and places and determine the input and output activities. For EL1, activity Choose Books is the input and activity Send the Notification is the output. For EL2, activity A is the input and activity F is the output. Fig. 4 shows the discovered business process models for Case IDs ID001 and ID100 of EL1, and Fig. 5 shows those for Case IDs P001 and P100 of EL2. Fig. 4 shows that all relations in all cases of EL1 are sequential, whereas EL2 has sequential and parallel relations, as shown in Fig. 5.
Fig. 4 Discovered business process models for Case ID ID001 and ID100 of EL1 based on Table 5
Fig. 5 Business processes for Case ID P001 and P100 of EL2 based on Table 6
Effendi, & Nuzulita Journal of Information Systems Engineering and Business Intelligence, 2019, 5 (2), 183-194
After we obtain the discovered business processes for each case of EL1 and EL2, we overlay all the identified business processes into one business process model which represents the overall activities and relations of EL1 and EL2. Fig. 6 shows the final business process model after Step 6 in Section 2.3 is executed. We get the complete activities and relations between activities of EL1. For EL2, activities and their relations are presented in Fig. 7.
Fig. 6 The business process of EL1 discovered by using the proposed method
After the business processes for each case of EL1 and EL2 are obtained using the temporal causal relation and the extended version of MTBAM, we get the complete activities and relations of EL1. The sequential relations pose no problem, because almost all the relations in the business processes are executed after one activity is done. However, the parallel relations still need to be categorized as AND, OR or XOR. To do so, we can use Table 5 and Table 6 to find the exact type of parallel relation. We can also see which activities are in parallel from the Gantt charts in Fig. 2 and Fig. 3.
The last step of discovering the business process is to determine the type of the parallel relations in the discovered business process: XOR, OR or AND. To determine the parallel relations, we use Eq. (1), Eq. (2), and Eq. (3). The frequencies of the sequential and parallel relations calculated from all cases of EL1 are as follows: Choose Books (CB) and Check Price (CP), 100; Check Price (CP) and Order to Seller (OtS), 100; Order to Seller (OtS) and Create Bill (CrB), 100; Create Bill (CrB) and Send the Ordered Books (SOB), 64; Create Bill (CrB) and Cancel Order (CO), 36; Send the Ordered Books (SOB) and Send the Notification (SN), 64; and Cancel Order (CO) and Send the Notification (SN), 36.
For EL2, the frequencies of the sequential relations are as follows: activities A and B, 51; activities C and D, 100; activities B and D, 67; activities C and E, 67; activities D and F, 100; activities E and F, 100; activities A and C, 49; and activities B and E, 50. The parallel relations are activities B and C, activities D and E, activities C and B, and activities E and D, with frequencies 51, 67, 33 and 33 respectively. Next, we calculate the minimum of all sequential relations, the average parallel relation and the average of all sequential relations. Table 7 presents the results for EL1 and Table 8 the results for EL2.
Based on the data presented in Table 7, EL1 has a parallel XOR relation, because the value of Avg PPM is less than the value of Min ASR, in accordance with Eq. (1). Meanwhile, EL2 has parallel OR and AND relations. Table 8 shows that for activities B and C the value of Avg PPM is higher than the value of Min ASR but less than the value of Avg ASR, so based on Eq. (2) the relation is OR. For activities D and E, the value of Avg PPM is also higher than the value of Min ASR but equal to the value of Avg ASR, so the relation of activities D and E is AND, following Eq. (3). In the last step, we model the parallel relations in Petri Net form in the discovered business processes of EL1 and EL2. Fig. 8 presents the final business process model of EL1 including XOR, and Fig. 9 shows the business process model of EL2 with AND and OR.
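The comparisons described above can be summarised as a small decision rule. The sketch below is an interpretation of the prose, not a transcription of Eq. (1) to Eq. (3): the function name `parallel_type` is hypothetical, the handling of the boundary case where Avg PPM equals Min ASR and of the case where Avg PPM exceeds Avg ASR is assumed, and the sample values are illustrative rather than taken from Table 7 or Table 8.

```python
def parallel_type(avg_ppm, min_asr, avg_asr):
    """Return 'XOR', 'OR' or 'AND' from the three aggregate values.

    Assumed reading of the text:
      XOR if Avg PPM <  Min ASR
      OR  if Min ASR <= Avg PPM < Avg ASR
      AND otherwise (the text states the case Avg PPM == Avg ASR)
    """
    if avg_ppm < min_asr:
        return "XOR"
    if avg_ppm < avg_asr:
        return "OR"
    return "AND"


# Illustrative values only.
print(parallel_type(avg_ppm=20, min_asr=36, avg_asr=71))
print(parallel_type(avg_ppm=50, min_asr=36, avg_asr=71))
print(parallel_type(avg_ppm=71, min_asr=36, avg_asr=71))
```

Ordering the checks from XOR to AND means each later branch can rely on the earlier condition having failed, so every comparison is written only once.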

IV. CONCLUSIONS
This research introduced a new approach to mining business processes using the temporal causal relation and the double timestamp event log, with the help of the MTBAM algorithm. The proposed method defined seven temporal causal relations based on the existing relations, which are only sequential and parallel. Each temporal causal relation was presented in a Gantt chart to show the relations of activities.
In addition, four business process models in Petri Net were generated to model the seven temporal causal relations. After all the business process models were formed, they were combined into one business process based on the timestamps of the event log. In the end, therefore, there was one business process with all its sequential and parallel (XOR, OR, AND) relations. Based on our experiments using two real-life event logs, the proposed method succeeded in mining the business processes as well as their relations.
Although this research succeeded in applying the temporal causal relation to the process discovery of business processes, it does not yet address other issues in process mining such as loops, non-free choice, and invisible prime tasks. Further research will apply the temporal causal relation in other process discovery algorithms such as heuristics miner, fuzzy miner, and genetic algorithm.