Project Report on Process Mining Techniques in Business Environments

Applicability of Process Mining
Techniques in Business Environments
Annual Meeting IEEE Task Force on Process Mining

Andrea Burattin
andreaburattin

September 8, 2014

Brief Curriculum Vitæ

2009, M.Sc.
  Computer Science (A.I. program)
  University of Padova
2009-2012, Ph.D.
  Supervisor: Prof. Alessandro Sperduti
  Joint school University of Bologna-Padova
  Thesis defended in April 2013
2013-2014, Postdoc
  Prompt project (prompt.processmining.it)
  University of Padova

Photo: Specola, Padova. http://flic.kr/p/cEW5bo

Ph.D. Inception

Ph.D. background
  Inception during M.Sc. thesis
  Companies: study on process mining

A company (Siav S.p.A., www.siav.it) funded my Ph.D.
  Aim: investigate applicability of process mining techniques in business scenarios
  Interaction with companies: interesting! (but sometimes...)

Outcome
  Applicability of Process Mining Techniques in Business Environments

Quick Recap of Process Mining

[Figure: process mining overview across three levels (Imagination, Incarnation / Environment, Observation). Elements: Operational Model, Operational Incarnation, Information System, Event Logs, Analytical Model. Process mining tasks: Discovery, Conformance, Extension.]
Source: C. Günther, Process Mining in Flexible Environments. PhD thesis, TU/e, Eindhoven, 2009.

Theoretical vs. Industrial-related Open Problems

Some literature open problems
  Duplicate tasks
  Exploiting all data available
  Holistic mining
  Different perspectives from different sources
  Noise and incompleteness

Case studies open problems
  Using process mining tools and configuring algorithms
  Results interpretation
  Readable results
  Computational power and storage capacity required

Note: the two sets do not overlap.

Possible Industry Scenarios

Four possible industry scenarios
  Process aware vs. process unaware
  Process aware software vs. process unaware software

[Figure: 2x2 matrix. One axis: Process Aware Companies vs. Process Unaware Companies. Other axis: Process Unaware Information Systems vs. Process Aware Information Systems. Each quadrant holds one of Company 1 to Company 4.]

Thesis Structure and Organization

[Figure: thesis structure diagram. Blocks: Data Preparation; Process Mining Capable Event Stream; Process Mining Capable Event Logs; Control-flow Mining; Stream Control-flow Mining; Process Extension; Process Representation; Results Evaluation; Model Evaluation.]

Overview: Data Preparation

[Figure: thesis structure diagram.]

Problems with Data Preparation

Problems at different complexity and abstraction levels. Examples:
  Adaptation of existing data (syntax problem, easy)
  Introduction of new information (difficult)

Typical set of required fields
  (case-id; activity; timestamp; [process-name]; [originator])

Our context: company process aware; IS process unaware
Structure of available log
  (activity; timestamp; originator; info_1; ...; info_n)

Problems with Data Preparation (cont.)

Case-id from info_i fields
  Candidate case-id fields
    A-priori knowledge
    Event chains
    String similarity functions
  Selection of maximal chain
    Most activities, or simplest chain

Process name is not a problem
  All events belong to the same process

Example log:

  Act.   info_1   info_2
  a_1    AB-01    BB-01
  a_2    AA-02    AB-01
  a_3    AB-01    BB-02
  a_4    AB-01    BB-03
  a_1    AA-03    BB-04
  a_5    AA-03    BB-05
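
The chain idea can be illustrated with a minimal sketch. It is not the method from the thesis: it assumes that a candidate case-id is any value shared by several events in any info field, and that the "maximal" chain is the one covering the most distinct activities. The function names and the tie-breaking rule are hypothetical; the sample rows follow the example table, with activity subscripts as read from the extracted slide.

```python
from collections import defaultdict


def candidate_chains(rows):
    """Map every info value to the indices of the events that mention it."""
    chains = defaultdict(list)
    for index, (_activity, *infos) in enumerate(rows):
        for value in set(infos):  # a value repeated within one event counts once
            chains[value].append(index)
    return chains


def maximal_chain(rows):
    """Return (value, event indices) for the chain covering most distinct activities."""
    chains = candidate_chains(rows)

    def score(value):
        distinct_activities = {rows[i][0] for i in chains[value]}
        # Assumption: ties are broken by the value itself, only for determinism.
        return (len(distinct_activities), value)

    best = max(chains, key=score)
    return best, chains[best]


# Sample events as (activity, info_1, info_2).
ROWS = [
    ("a1", "AB-01", "BB-01"),
    ("a2", "AA-02", "AB-01"),
    ("a3", "AB-01", "BB-02"),
    ("a4", "AB-01", "BB-03"),
    ("a1", "AA-03", "BB-04"),
    ("a5", "AA-03", "BB-05"),
]

case_id, events = maximal_chain(ROWS)
```

Here the value AB-01 links four events across both info fields, so it is the strongest case-id candidate in this toy log. A-priori knowledge and string similarity, also listed on the slide, are not modelled in the sketch.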

Overview: Control-flow Mining

[Figure: thesis structure diagram.]

Exploiting Data Available

Events with duration instead of instantaneous events
Generalization of Heuristics Miner to exploit this new information

[Figure: a main activity spanning from Start to End on a time axis, made of Sub-activity 1, Sub-activity 2, ..., Sub-activity n-1, Sub-activity n.]
[Figure: a process over activities A, B, C, D on a time axis, shown once with events as time intervals and once with instantaneous events.]
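
To make the benefit of durations concrete, here is a minimal sketch of how interval events let one tell overlap (a hint of parallelism) apart from direct succession. It is an illustration under stated assumptions, not the generalized Heuristics Miner itself: the function name, the tuple layout `(activity, start, end)`, and the rule "a directly precedes b if no other event fits entirely between them" are all hypothetical.

```python
from collections import Counter


def interval_relations(trace):
    """Count direct successions and overlaps in one case.

    trace: list of (activity, start, end) tuples with start <= end.
    """
    follows, overlaps = Counter(), Counter()
    for i, (a, a_start, a_end) in enumerate(trace):
        for j, (b, b_start, b_end) in enumerate(trace):
            if i == j:
                continue
            if a_start < b_end and b_start < a_end:
                # The two intervals share some time: count each pair once.
                if i < j:
                    overlaps[frozenset((a, b))] += 1
            elif a_end <= b_start:
                # a finished before b started; it is a direct succession only
                # if no third event lies entirely between the two.
                blocked = any(
                    a_end <= s and e <= b_start
                    for k, (_, s, e) in enumerate(trace)
                    if k not in (i, j)
                )
                if not blocked:
                    follows[(a, b)] += 1
    return follows, overlaps


# Hypothetical case: B and C run concurrently between A and D.
TRACE = [("A", 0, 2), ("B", 2, 5), ("C", 3, 6), ("D", 6, 8)]
follows, overlaps = interval_relations(TRACE)
```

With instantaneous events, B and C would simply appear in some order; with intervals their overlap is observed directly in a single trace.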

Not-expert Users

Our users: not expert in process mining, with notions of BPM

Observations
  Process mining algorithms require configurations
  Typically, algorithm configurations are thresholds on measures
  The mining log is finite
  Only a finite number of configurations is possible

We are able to discretize the parameter values

[Figure: four parameters with unknown values, next to several candidate process models over activities A to F.]
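
The discretization argument can be sketched as follows. Because the log is finite, a measure computed from it takes only finitely many distinct values, and only those values matter as thresholds. The sketch uses the standard Heuristics Miner dependency measure, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1); the function names and the toy log are hypothetical, and length-one loops are skipped because they use a different formula.

```python
from collections import Counter
from fractions import Fraction


def direct_follows(log):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def candidate_thresholds(log):
    """Return the sorted distinct dependency values observed in the log."""
    counts = direct_follows(log)
    values = set()
    for (a, b), ab in counts.items():
        if a == b:
            continue  # length-one loops are outside this sketch
        ba = counts[(b, a)]  # Counter returns 0 for unseen pairs
        values.add(Fraction(ab - ba, ab + ba + 1))
    return sorted(values)


# Hypothetical toy log: each string is one trace, each character one activity.
LOG = ["ABC", "ABC", "ACB"]
thresholds = candidate_thresholds(LOG)
```

Any two threshold settings that fall between the same pair of consecutive values keep exactly the same dependencies, so the search space collapses to one representative per interval.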

Model Selection Approaches

User-guided Approach
  Hierarchical clustering of models
    Average linkage
    Any model-to-model metric
  Navigation of the dendrogram

[Figure: dendrogram over Process 1 to Process 10, with merge heights on an axis from 0 to 1.]

Automatic Approach
  Hill climbing with
    Maximum plateau steps
    Random restarts
    (Local optimum)
  $h_{MDL} = \arg\min_{h \in H} L(h) + L(D \mid h)$
  MDL encodings
    MDL by Calders et al.
    Simplified heuristics
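
A minimal sketch of the automatic search, assuming a single discretized parameter whose candidate values are indexed 0..n-1 and a caller-supplied `cost` function standing in for L(h) + L(D|h). The function name, the neighbourhood (index plus or minus one), and the default settings are hypothetical; the MDL encodings themselves are not reproduced here.

```python
import random


def hill_climb(cost, n_values, restarts=5, max_plateau=3, seed=0):
    """Minimise cost(i) over indices 0..n_values-1 of a discretised parameter."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):  # random restarts reduce the risk of poor local optima
        current = rng.randrange(n_values)
        plateau = 0
        while True:
            neighbours = [i for i in (current - 1, current + 1) if 0 <= i < n_values]
            if not neighbours:
                break
            candidate = min(neighbours, key=cost)
            if cost(candidate) < cost(current):
                current, plateau = candidate, 0
            elif cost(candidate) == cost(current) and plateau < max_plateau:
                # Walk along a plateau, but only for a bounded number of steps.
                current, plateau = candidate, plateau + 1
            else:
                break  # local optimum reached
        if best is None or cost(current) < cost(best):
            best = current
    return best


# Hypothetical cost with a single minimum at index 7.
best_index = hill_climb(lambda i: (i - 7) ** 2, n_values=20)
```

The result is still only a local optimum in general, as the slide notes; restarts and plateau steps merely make a poor one less likely.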

Overview: Results Evaluation

[Figure: thesis structure diagram.]

Evaluation Metrics

Model-to-model Metric
  Generation rules (based on Alpha alg.)
    A → B  ⇒  A > B, B ≯ A
    A ∥ B  ⇒  A > B, B > A
    A # B  ⇒  A ≯ B, B ≯ A
  Complex process into permitted relations (>) and forbidden relations (≯)
  Comparison as Jaccard similarity on the two sets (> and ≯)

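
The generation rules and the comparison can be sketched as below. Each model is assumed to be given as a mapping from activity pairs to one of the relations "->" (sequence), "||" (parallel), or "#" (no relation); this input format, the function names, and the plain average of the two Jaccard scores are assumptions of the sketch, not details taken from the slides.

```python
def relation_sets(model):
    """Expand Alpha-style relations into permitted (>) and forbidden pairs."""
    permitted, forbidden = set(), set()
    for (a, b), relation in model.items():
        if relation == "->":      # A -> B: A > B permitted, B > A forbidden
            permitted.add((a, b))
            forbidden.add((b, a))
        elif relation == "||":    # A || B: both directions permitted
            permitted.update({(a, b), (b, a)})
        elif relation == "#":     # A # B: both directions forbidden
            forbidden.update({(a, b), (b, a)})
        else:
            raise ValueError(f"unknown relation: {relation!r}")
    return permitted, forbidden


def jaccard(x, y):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def model_similarity(model_1, model_2):
    """Average the Jaccard scores of the permitted and forbidden sets (assumed weighting)."""
    permitted_1, forbidden_1 = relation_sets(model_1)
    permitted_2, forbidden_2 = relation_sets(model_2)
    return (jaccard(permitted_1, permitted_2) + jaccard(forbidden_1, forbidden_2)) / 2


# Two hypothetical models that differ only in how B and C relate.
M1 = {("A", "B"): "->", ("B", "C"): "->"}
M2 = {("A", "B"): "->", ("B", "C"): "||"}
similarity = model_similarity(M1, M2)
```

Comparing sets of relations rather than graphs means two models drawn differently but allowing the same behaviour at this level score as identical.
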
Model-to-log Metric
  Declare constraint π and a trace σ ⇒ healthiness measures
    Activation sparsity: $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$
    Fulfillment ratio: $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$
    Violation ratio: $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$
    Conflict ratio: $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$
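
As a small worked example of the four measures, the sketch below computes them from pre-computed counts. How activations, fulfillments, violations, and conflicts are counted depends on the semantics of each Declare constraint and is not modelled here; the function name and the choice to return 0.0 for the three ratios when a constraint is never activated are assumptions.

```python
def healthiness(n_events, n_activations, n_fulfillments, n_violations, n_conflicts):
    """Compute the healthiness measures of one constraint on one trace from counts."""
    if n_events <= 0:
        raise ValueError("the trace must contain at least one event")

    def ratio(count):
        # Assumption: with no activations, the ratios are reported as 0.0.
        return count / n_activations if n_activations else 0.0

    return {
        "activation_sparsity": 1 - n_activations / n_events,
        "fulfillment_ratio": ratio(n_fulfillments),
        "violation_ratio": ratio(n_violations),
        "conflict_ratio": ratio(n_conflicts),
    }


# Hypothetical trace of 10 events in which the constraint is activated 4 times:
# 3 activations are fulfilled, 1 is violated, none is in conflict.
measures = healthiness(10, 4, 3, 1, 0)
```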

Overview: Process Extension

[Figure: thesis structure diagram.]

Multiperspective Mining

Given
  Log with information on originators
  Process model

Assumption
  Roles are characterized by a consistent set of originators

We add roles to the model
  1. Dependencies as handover of roles
  2. Remove dependencies below threshold
     Connected components are candidate roles
  3. Merge candidate roles if the similarity of their user sets is above threshold
  Entropy-based metric to tune thresholds
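
The steps above can be sketched as follows. This is an illustration, not the measure used in the thesis: each dependency is weighted here by the Jaccard similarity of the originator sets of its two activities, as a stand-in for the handover-of-roles measure. Function names and both thresholds are hypothetical, and the entropy-based tuning is not included.

```python
def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def candidate_roles(dependencies, originators, tau_edge):
    """Drop weak dependencies, then return connected components of activities."""
    adjacency = {activity: set() for activity in originators}
    for a, b in dependencies:
        # Assumed weight: overlap of the users who perform a and b.
        if jaccard(originators[a], originators[b]) >= tau_edge:
            adjacency[a].add(b)
            adjacency[b].add(a)
    roles, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        roles.append(component)
    return roles


def merge_roles(roles, originators, tau_merge):
    """Merge candidate roles whose user sets are similar enough."""
    def users(role):
        return set().union(*(originators[a] for a in role))

    merged = [set(role) for role in roles]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if jaccard(users(merged[i]), users(merged[j])) >= tau_merge:
                    merged[i] |= merged.pop(j)
                    changed = True
                    break
            if changed:
                break
    return merged


# Hypothetical model A -> B -> C -> D with the users observed per activity.
ORIGINATORS = {"A": {"u1", "u2"}, "B": {"u1", "u2"}, "C": {"u3"}, "D": {"u1", "u2"}}
DEPENDENCIES = [("A", "B"), ("B", "C"), ("C", "D")]
candidates = candidate_roles(DEPENDENCIES, ORIGINATORS, tau_edge=0.5)
roles = merge_roles(candidates, ORIGINATORS, tau_merge=0.5)
```

In this toy case D is disconnected from A and B once the dependencies through C are removed, and the merge step reunites them because the same users perform all three.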

Overview: Stream Control-flow Mining

[Figure: thesis structure diagram.]

Stream Context

Stream Mining Peculiarities
  Cannot store the entire stream
    Approximation
  Backtracking not feasible
    One pass over data
  Variable system condition
    E.g. fluctuating stream rates
  Adapt the model to new data
    Concept drifts
  Completely new problems!

Principle
  Recent observations are more important than older ones

3 versions of Heuristics Miner
  Based on Sliding Window
  Based on Lossy Counting
  Based on Budget Lossy Counting
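
To show the flavour of the Lossy Counting variant, here is a minimal sketch that applies the classic Lossy Counting algorithm to directly-follows pairs observed on an event stream. It is not the stream Heuristics Miner from the thesis: class and function names are hypothetical, and the last activity of each case is kept in an ordinary unbounded dictionary for brevity, which a real stream miner would also have to bound.

```python
import math


class LossyCounter:
    """Approximate frequency counts with error at most epsilon * N."""

    def __init__(self, epsilon):
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [frequency, max_error]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # End of a bucket: forget items that cannot be frequent.
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def count(self, item):
        return self.entries.get(item, [0, 0])[0]


def stream_direct_follows(events, epsilon):
    """Single pass over (case_id, activity) events, counting directly-follows pairs."""
    counter = LossyCounter(epsilon)
    last_activity = {}  # assumption: unbounded here, for brevity only
    for case_id, activity in events:
        if case_id in last_activity:
            counter.add((last_activity[case_id], activity))
        last_activity[case_id] = activity
    return counter


# Hypothetical interleaved stream of three cases.
EVENTS = [
    ("c1", "A"), ("c1", "B"), ("c2", "A"), ("c2", "B"),
    ("c1", "C"), ("c2", "C"), ("c3", "A"), ("c3", "X"),
]
counts = stream_direct_follows(EVENTS, epsilon=0.25)
```

The pruning step is what bounds memory: infrequent pairs are dropped at bucket boundaries, so older rare behaviour fades while frequent relations survive, at the price of undercounting by at most epsilon times the number of observed pairs.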

Extra: Processes and Logs Generator

Companies are reluctant to share their data
Researchers need to do tests
(No BPI challenges at that time)

Processes and Logs Generator
  Stochastic context-free grammar generates random processes
  Rules to simulate a process and produce an event log
  Reference model used to evaluate control-flow mining algorithms

[Figure: grammar productions over P, G, and A, next to an example generated process running from a_start through activities a to g to a_end.]
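
The generator idea can be illustrated with a minimal sketch: a stochastic grammar expands a non-terminal into a random process tree, and a simulator plays the tree out into traces. The production probabilities, the depth limit, the tree encoding, and all function names are hypothetical, and loops are left out; this is not the actual generator's grammar.

```python
import itertools
import random


def generate(rng, names, depth=0, max_depth=3):
    """Expand G into a single activity, a sequence, or a split/join block."""
    roll = rng.random()
    if depth >= max_depth or roll < 0.4:
        return next(names)  # G -> A
    left_right = lambda: generate(rng, names, depth + 1, max_depth)
    if roll < 0.7:
        return ("seq", left_right(), left_right())  # G -> (G ; G)
    kind = "and" if roll < 0.85 else "xor"
    # G -> A ; (G op G) ; A, with op either a parallel or an exclusive split
    return (kind, next(names), left_right(), left_right(), next(names))


def interleave(x, y, rng):
    """Randomly merge two sequences while preserving the order within each."""
    x, y, merged = list(x), list(y), []
    while x or y:
        source = x if x and (not y or rng.random() < 0.5) else y
        merged.append(source.pop(0))
    return merged


def simulate(tree, rng):
    """Play out one execution of the process tree as a list of activities."""
    if isinstance(tree, str):
        return [tree]
    if tree[0] == "seq":
        return simulate(tree[1], rng) + simulate(tree[2], rng)
    kind, split, left, right, join = tree
    if kind == "xor":
        body = simulate(rng.choice((left, right)), rng)
    else:
        body = interleave(simulate(left, rng), simulate(right, rng), rng)
    return [split] + body + [join]


def generate_log(n_traces, seed=0):
    """Generate one random process and simulate n_traces executions of it."""
    rng = random.Random(seed)
    names = (f"a{i}" for i in itertools.count(1))
    tree = generate(rng, names)
    log = [["a_start"] + simulate(tree, rng) + ["a_end"] for _ in range(n_traces)]
    return tree, log


tree, log = generate_log(20, seed=1)
```

Because the tree is known, it can serve as the reference model against which a mined model is later compared.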

Detailed Map of Performed Activities

[Figure: map of performed activities. Blocks:]
  Legacy, Process-unaware Information Systems
  Data Preparation
  Random Process Generator
  Event Logs Generator
  Process Mining Capable Event Stream
  Process Mining Capable Event Logs
  Stream Control-flow Mining Framework
  Control-flow Mining Algorithm Exploiting More Data
  User-guided Discovery Algorithm Configuration
  Automatic Algorithm Configuration
  Process Representation (e.g. Dependency Graph, Petri Net)
  Model-to-model Metric
  Model-to-log Metric
  Extension of Process Models with Organizational Roles
  Model Evaluation (wrt Log / Original Model)

Thanks!

Doing the Ph.D. has been amazing!
A huge thank you to:
  My supervisor, Alessandro Sperduti
  Siav S.p.A. and Roberto Pinelli
  My internal examiners: Tullio Vardanega, Paolo Baldan
  My external examiners: Barbara Weber, Diogo Ferreira
  All the process mining community!


