This tutorial shows how to filter an XES process log from YAWL in the process mining tool ProM. After filtering we look at the Inductive Miner, the Data Aware Heuristic Miner, and the Fuzzy Miner and compare the results with the original YAWL specification. Here are the links referred to in the video:
The zip file below contains the XES logs before and after filtering.
Welcome to our third tutorial on process mining with YAWL! In the last video we produced the XES file. In this video, I will show you how to analyze this XES file with a process mining tool. We are going to use ProM, an open-source tool from Eindhoven University of Technology, for this purpose. I will put the download link in the supplementary material below.
There are also other tools than ProM, and we have used Disco and Celonis in the past, but ProM is the easiest to get, though maybe not the easiest to handle. Teaching you how to install and use ProM would be beyond this tutorial series, but there is excellent material on the download page. There is also a ProM user group, and there is an excellent MOOC by Joos Buijs; I will put the link in the supplementary material as well. So we will use the XES file from the last tutorial, and we will use ProM to work on it. So let's go!
We have our XES file from the YAWL system open here. As you can see, on the top level we have the element log. In XES everything is a log, and a log consists of several traces. So we have the trace element here, and the value 1 is the case id in YAWL. Then, for this trace, we have events: event 1, event 2, and so on. We will now load this file into ProM, filter it, and do some analysis on it.
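To make the log, trace, and event nesting concrete, here is a minimal Python sketch that reads such a structure and counts the events per case. The embedded XES snippet is a hypothetical, heavily shortened example and not the file from the video. It also assumes the elements carry no XML namespace, which may not hold for a real export.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened XES snippet (assumption: no XML namespace on the elements).
SAMPLE_XES = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="Formulate requirement"/>
      <string key="lifecycle:transition" value="complete"/>
    </event>
    <event>
      <string key="concept:name" value="Prepare software procurement"/>
      <string key="lifecycle:transition" value="complete"/>
    </event>
  </trace>
</log>"""


def summarize(xes_text):
    """Map each case id (the trace's concept:name) to its number of events."""
    root = ET.fromstring(xes_text)
    summary = {}
    for trace in root.iter("trace"):
        # Only direct <string> children of the trace describe the trace itself.
        case_id = next(
            s.get("value")
            for s in trace.findall("string")
            if s.get("key") == "concept:name"
        )
        summary[case_id] = len(trace.findall("event"))
    return summary


print(summarize(SAMPLE_XES))
```

The standard attribute keys `concept:name` and `lifecycle:transition` come from the XES extensions; everything else in the snippet is made up for illustration.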
So I will start ProM here. This is ProM when it has opened up. We are going to import our tutorial XES file, and here we will take the naive version of the import because some of the other versions just don't work. Now we have our log file here, and we are going to filter it. In order to apply functions, we have to press the play button here, and then we will filter the log with something that is called "Filter Log using Simple Heuristics". That's the one we need: select and start.
Now we get a number of different events, or event classes, here: schedule, start, complete, and so on. We are only interested in complete events because we don't want to look at how long someone looked at a form or whatever. We just want to see what happened in the process. So what we will do is put all of these to remove, except complete; that's the one we keep. Then we have the start events, where we press next, the end events, where we press next, and the event filter. We put it to 100 percent because we want all the events, and we press finish. The result is a file where we have one process, eight cases, and 50 events.
We still have some duplicate complete events. That is always like this in YAWL. We want to filter those out, so we need another filter. We go back to our list of elements, go to the first one, and apply another filter here. This one is called "Filter Log on Event Attribute Values"; that's the one we choose.
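The first filtering step, keeping only complete events, can be sketched in a few lines of Python. This is only an illustration of the idea, not what ProM does internally, and the toy log below is hypothetical.

```python
def keep_complete(log):
    """Return a copy of the log that keeps only events with lifecycle 'complete'."""
    return {
        case_id: [e for e in events if e.get("lifecycle:transition") == "complete"]
        for case_id, events in log.items()
    }


# Hypothetical toy log: one case, one activity passing through three lifecycle states.
toy_log = {
    "1": [
        {"concept:name": "Formulate requirement", "lifecycle:transition": "schedule"},
        {"concept:name": "Formulate requirement", "lifecycle:transition": "start"},
        {"concept:name": "Formulate requirement", "lifecycle:transition": "complete"},
    ]
}

filtered = keep_complete(toy_log)
print(sum(len(events) for events in filtered.values()))
```

Dropping the schedule and start events loses timing information, which is fine here because we only want the order of activities.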
Here we have some tabs, and we just go to org:resource. The only org:resource we have is always the same user. So we can choose this user and remove every event where this value is not provided, and that will eliminate our duplicate complete events. We press finish. Now we have one process, eight cases, and 25 events.
Now we will look at our result. The first thing is to look at the trace variants, and you can see that we have several variants that we can explore here. The next thing is, we go to the Inductive Visual Miner.
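The second filter and the trace variants view can be sketched the same way. The sketch assumes that each duplicate complete event lacks the org:resource attribute, as described above; the toy log, activity names, and user name are hypothetical.

```python
from collections import Counter


def drop_events_without(log, key):
    """Keep only events that carry the given attribute key."""
    return {
        case_id: [e for e in events if key in e]
        for case_id, events in log.items()
    }


def trace_variants(log):
    """Count how many cases share the same sequence of activity names."""
    return Counter(
        tuple(e["concept:name"] for e in events) for events in log.values()
    )


# Hypothetical toy log: every activity has one duplicate event without a resource.
toy_log = {
    "1": [
        {"concept:name": "A", "org:resource": "user"},
        {"concept:name": "A"},
        {"concept:name": "B", "org:resource": "user"},
        {"concept:name": "B"},
    ],
    "2": [
        {"concept:name": "A", "org:resource": "user"},
        {"concept:name": "A"},
        {"concept:name": "B", "org:resource": "user"},
        {"concept:name": "B"},
    ],
    "3": [
        {"concept:name": "A", "org:resource": "user"},
        {"concept:name": "A"},
    ],
}

deduplicated = drop_events_without(toy_log, "org:resource")
for variant, count in trace_variants(deduplicated).most_common():
    print(count, " -> ".join(variant))
```

A variant is simply a distinct sequence of activities, so two cases belong to the same variant when their activity sequences are identical.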
What we can see now is what has been inferred from the log. We can see that we have eight cases here, and these yellow bubbles are actually the cases that go through this diagram. The plus sign is a parallel split, and then we have four cases where "Cancel requirement" was executed. This corresponds to the list we have seen. We have four cases with "Define software development project" and four cases with "Prepare software procurement". Then we have two cases where the project has effectively been started. The other two are the ones that have been cancelled here. And then we have two cases with "Approve proposition", two cases without approval because they have been below the threshold, and one case with "Prepare purchase order" from this filtered log.
Next, we start the Data Aware Heuristic Miner. We have to put a different selection here, so we enter "data" and take the data causal net. Down here, in the data aware configuration, we select the estimated cost.
Now we can see that we have "Formulate requirement complete" and "Prepare software procurement complete", and we can select these bubbles here below. Then we can see, for example, that we do "Cancel requirement complete" if the estimated cost is less than or equal to 95,000.
And we go to "Approve proposition complete" when the estimated cost is greater than 95,000. This is not the exact value of our selection rule, which was at 100,000, but this is what has been inferred from the log, because the original specification is of course unknown to the miner. In the same way, after "Define software development project" we have one path with estimated cost greater than 80,000 and the other path with estimated cost less than or equal to 80,000. So this is another way of analyzing our log.
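Why can a miner report 95,000 when the real rule sits at 100,000? One plausible explanation is that any threshold in the gap between the observed values separates the cases equally well, so a rule learner has to pick one. The sketch below picks the midpoint of that gap. This is an assumption for illustration only, not ProM's actual algorithm, and the cost values are invented rather than taken from the tutorial's log.

```python
def midpoint_threshold(low_branch, high_branch):
    """Pick a split halfway between the two groups of observed values.

    low_branch: values seen on the path taken for small costs
    high_branch: values seen on the path taken for large costs
    """
    highest_low = max(low_branch)
    lowest_high = min(high_branch)
    if highest_low >= lowest_high:
        raise ValueError("branches overlap; no single threshold separates them")
    return (highest_low + lowest_high) / 2


# Invented estimated costs, chosen only to illustrate the effect.
low_path_costs = [40_000, 90_000]
high_path_costs = [100_000, 250_000]

print(midpoint_threshold(low_path_costs, high_path_costs))
```

With only eight cases in the log, the gap between observed values is wide, so the inferred threshold can sit well away from the one in the specification. More cases with costs close to the real threshold would narrow the gap.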
And last but not least, I want to show you the Fuzzy Miner. Again, we put in "fuzzy" here and we accept all the default parameters. Now we can see "Formulate requirement complete", "Define software development project complete", and so on. So this very much corresponds to our original YAWL model, and of course anything that hasn't been used from the original net can also not be in our log. The fuzzy model says, for example, that after "Prepare software procurement" comes "Approve proposition", and then we have "Prepare purchase order". 25 is a good number to stop. So this concludes our regular series of YAWL tutorials. We haven't covered everything, so if you have any questions or you want tutorials on specific topics, please do not hesitate to ask us. Bye bye!