Applicability of Process Mining
Techniques in Business Environments
Annual Meeting IEEE Task Force on Process Mining
Andrea Burattin
andreaburattin
September 8, 2014
Brief Curriculum Vitæ
2009, M.Sc.
Computer Science (A.I. program)
University of Padova
2009-2012, Ph.D.
Supervisor: Prof. Alessandro Sperduti
Joint school, University of Bologna-Padova
Thesis defended in April 2013
2013-2014, Postdoc
Prompt project (prompt.processmining.it)
University of Padova
Specola, Padova.
http://flic.kr/p/cEW5bo
Ph.D. Inception
Ph.D. background
Inception during M.Sc. thesis
Companies: study on process mining
A company (Siav S.p.A., www.siav.it) funded my PhD
Aim: investigate applicability of process mining techniques in business scenarios
Interaction with companies: interesting! (but sometimes...)
Outcome
Applicability of Process Mining Techniques in Business Environments
Quick Recap of Process Mining
[Figure: overview of process mining, relating the operational model, the information system, event logs, and the analytical model through discovery, conformance, and extension.]
Source: C. Günther, Process Mining in Flexible Environments. PhD thesis, TU/e, Eindhoven, 2009.
Theoretical vs. Industrial-related Open Problems
Some open problems from the literature
Duplicate tasks
Exploiting all data available
Holistic mining
Different perspectives from different sources
Noise and incompleteness
Open problems from case studies
Using process mining tools and configuring algorithms
Results interpretation
Readable results
Computational power and storage capacity required
Note: the two sets do not overlap
Possible Industry Scenarios
Four possible industry scenarios
Process aware vs. process unaware companies
Process aware software vs. process unaware software
[Figure: 2x2 matrix of companies (process aware, process unaware) against information systems (process aware, process unaware), with Company 1 to Company 4 as the four scenarios.]
Thesis Structure and Organization
Data Preparation
Process Mining Capable Event Stream
Process Mining Capable Event Logs
Control-flow Mining
Process Extension
Stream Control-flow Mining
Process Representation
Results Evaluation
Model Evaluation
Overview: Data Preparation
Problems with Data Preparation
Problems at different complexity and abstraction levels. Examples:
Adaptation of existing data (syntax problem, easy)
Introduction of new information (difficult)
Typical set of required fields
(case-id; activity; timestamp; [process-name]; [originator])
Our context: company process aware; IS process unaware
Structure of available log
(activity; timestamp; originator; info_1; ...; info_n)
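The gap between the two field sets can be made concrete with two record types. This is only a sketch: the class names, the field names, and the helper `with_case_id` are illustrative assumptions, not part of any tool mentioned in the talk.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class RawEvent:
    """Event as stored by a process-unaware system: no case-id (hypothetical type)."""
    activity: str
    timestamp: datetime
    originator: str
    info: Tuple[str, ...]  # info_1 ... info_n, free-form attributes


@dataclass(frozen=True)
class Event:
    """Event as needed by process mining; bracketed fields are optional (hypothetical type)."""
    case_id: str
    activity: str
    timestamp: datetime
    process_name: Optional[str] = None
    originator: Optional[str] = None


def with_case_id(raw: RawEvent, case_id: str) -> Event:
    """Attach a case-id to a raw event; choosing the case-id is the hard part."""
    return Event(case_id, raw.activity, raw.timestamp, originator=raw.originator)


raw = RawEvent("a1", datetime(2014, 9, 8, 10, 0), "alice", ("AB-01", "BB-01"))
event = with_case_id(raw, "AB-01")
```

The conversion itself is trivial; the missing information is which value should serve as the case-id.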
Problems with Data Preparation (cont.)
Case-id from info_i fields
Candidate case-id fields
A-priori knowledge
Events chains
String similarity functions
Selection of maximal chain
Most activities or simplest chain
Process name is not a problem
All events belong to the same process

Act.   info_1   info_2
a_1    AB-01    BB-01
a_2    AA-02    AB-01
a_3    AB-01    BB-02
a_4    AB-01    BB-03
a_1    AA-03    BB-04
a_5    AA-03    BB-05
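A minimal sketch of the chaining idea, under strong simplifications: two events are linked when they share an identical info value (exact equality stands in for the string similarity functions), and chains are the resulting connected groups. The rows are one reading of the slide's example table (the activity subscripts are an assumption), and `event_chains` is a hypothetical name.

```python
from collections import defaultdict


def event_chains(rows):
    """Group row indices into chains; rows are (activity, info_1, ..., info_n)."""
    parent = list(range(len(rows)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # info value -> index of the first row containing it
    for idx, (_activity, *infos) in enumerate(rows):
        for value in infos:
            if value in first_seen:
                parent[find(idx)] = find(first_seen[value])
            else:
                first_seen[value] = idx

    chains = defaultdict(list)
    for idx in range(len(rows)):
        chains[find(idx)].append(idx)
    return sorted(chains.values())


rows = [
    ("a1", "AB-01", "BB-01"),
    ("a2", "AA-02", "AB-01"),
    ("a3", "AB-01", "BB-02"),
    ("a4", "AB-01", "BB-03"),
    ("a1", "AA-03", "BB-04"),
    ("a5", "AA-03", "BB-05"),
]
chains = event_chains(rows)
```

On these rows the value AB-01 ties the first four events together and AA-03 ties the last two, so each shared value is a candidate case-id for its chain.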
Overview: Control-flow Mining
Exploiting Data Available
Events with duration instead of instantaneous events
Generalization of Heuristics Miner to exploit this new information
[Figure: a main activity spanning Sub-activity 1 to Sub-activity n between Start and End on a time axis.]
[Figure: activities A, B, C, D over time, shown as a process with events as time intervals and as a process with instantaneous events.]
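To illustrate why durations help, here is a simplified sketch (not the generalized Heuristics Miner itself) that reads relations off interval events: overlapping intervals count as evidence of parallelism, and B directly follows A when B starts after A ends with no other event lying entirely in between. The function name and these two rules are assumptions made for the example.

```python
from collections import Counter


def interval_relations(trace):
    """trace: list of (activity, start, end). Returns (follows, parallel) counters."""
    follows, parallel = Counter(), Counter()
    for i, (a, a_start, a_end) in enumerate(trace):
        for j, (b, b_start, b_end) in enumerate(trace):
            if i == j:
                continue
            if a_start < b_end and b_start < a_end:
                # Overlapping intervals: count each unordered pair once.
                if i < j:
                    parallel[frozenset((a, b))] += 1
            elif b_start >= a_end:
                # B comes after A; it is a direct successor only if no other
                # event fits entirely between the end of A and the start of B.
                blocked = any(
                    c_start >= a_end and c_end <= b_start
                    for k, (_c, c_start, c_end) in enumerate(trace)
                    if k not in (i, j)
                )
                if not blocked:
                    follows[(a, b)] += 1
    return follows, parallel


trace = [("A", 0, 2), ("B", 3, 6), ("C", 4, 7), ("D", 8, 9)]
follows, parallel = interval_relations(trace)
```

With instantaneous events the same trace would read A, B, C, D and suggest that C follows B; the intervals show that B and C overlap instead.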
Not-expert Users
Our users: not expert in process mining, with notions of BPM
Observations
Process mining algorithms require configuration
Typically, algorithm configurations are thresholds on measures
The mining log is finite
Only a finite number of configurations is possible
We are able to discretize the parameter values
[Figure: parameter settings and example process models over activities A to F.]
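The discretization argument can be sketched as follows, assuming a single threshold that keeps every relation whose measure is at least the threshold. The measure values and function names are made up for illustration.

```python
def candidate_thresholds(measure_values):
    """One representative threshold per distinct non-empty model a cutoff can produce."""
    # Only the values actually observed in the log can change which relations
    # survive the cutoff, so they are the only thresholds worth trying.
    return sorted(set(measure_values))


def relations_kept(measures, threshold):
    """Relations whose measure is at least the threshold."""
    return {rel for rel, value in measures.items() if value >= threshold}


# Hypothetical dependency measures computed from a finite log.
measures = {("A", "B"): 0.9, ("B", "C"): 0.75, ("A", "C"): 0.75, ("C", "A"): 0.2}
thresholds = candidate_thresholds(measures.values())
models = [relations_kept(measures, t) for t in thresholds]
```

Any threshold strictly between 0.2 and 0.75 yields the same relations as 0.75, which is why a finite log leaves only finitely many configurations to explore.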
Model Selection Approaches
User-guided Approach
Hierarchical clustering of models
Average linkage
Any model-to-model metric
Navigation of the dendrogram
[Figure: dendrogram over Process 1 to Process 10, on a scale from 0 to 1.]
Automatic Approach
Hill climbing with maximum plateau steps and random restarts (local optimum)
$h_{MDL} = \arg\min_{h \in H} L(h) + L(D \mid h)$
MDL encodings: MDL by Calders et al.; simplified heuristics
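The automatic approach can be sketched as a generic hill climber over the discretized parameter grid. This is an illustration under assumptions: `cost` stands in for the description length L(h) + L(D|h) of the model mined with a given configuration, and the function name, defaults, and toy cost are all hypothetical.

```python
import random


def hill_climb(cost, sizes, restarts=5, max_plateau=3, seed=0):
    """Minimise cost over grid points; sizes[i] is the number of values of parameter i."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(restarts):
        # Random restart: begin from a random configuration.
        current = tuple(rng.randrange(n) for n in sizes)
        current_cost, plateau = cost(current), 0
        while True:
            # Neighbours differ by one step in exactly one parameter.
            neighbours = [
                current[:i] + (current[i] + step,) + current[i + 1:]
                for i in range(len(sizes))
                for step in (-1, 1)
                if 0 <= current[i] + step < sizes[i]
            ]
            if not neighbours:
                break
            candidate = min(neighbours, key=cost)
            candidate_cost = cost(candidate)
            if candidate_cost < current_cost:
                plateau = 0
            elif candidate_cost == current_cost and plateau < max_plateau:
                plateau += 1  # sideways move, limited by the plateau budget
            else:
                break  # local optimum, or plateau budget exhausted
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost


def toy_cost(point):
    """Stand-in for L(h) + L(D|h); a real cost would mine and encode a model."""
    x, y = point
    return (x - 3) ** 2 + (y - 1) ** 2


best, best_cost = hill_climb(toy_cost, sizes=(6, 4))
```

The result is only a local optimum in general; restarts and plateau steps reduce, but do not remove, the dependence on the starting point.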
Overview: Results Evaluation
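Evaluation Metrics
Model-to-model Metric
Complex process into permitted and forbidden relations
Generation rules (based on Alpha alg.):
$A \rightarrow B \Rightarrow A > B,\ B \ngtr A$
$A \parallel B \Rightarrow A > B,\ B > A$
$A \mathbin{\#} B \Rightarrow A \ngtr B,\ B \ngtr A$
Permitted relations: $A > B$; forbidden relations: $A \ngtr B$
Comparison as Jaccard similarity on two sets ($>$ and $\ngtr$)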
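A small sketch of this comparison: each model is given as a list of Alpha-style relations, expanded into permitted and forbidden pairs by the generation rules, and the two Jaccard similarities are then combined. The weighted average with `alpha = 0.5` is an assumption (the slide does not say how the two values are combined), and the string encodings "->", "||", "#" are illustrative.

```python
def to_constraints(relations):
    """Expand (kind, a, b) relations into permitted (>) and forbidden (not >) pairs."""
    permitted, forbidden = set(), set()
    for kind, a, b in relations:
        if kind == "->":    # A -> B  gives  A > B  and  B not> A
            permitted.add((a, b))
            forbidden.add((b, a))
        elif kind == "||":  # A || B  gives  A > B  and  B > A
            permitted.update({(a, b), (b, a)})
        elif kind == "#":   # A # B   gives  A not> B  and  B not> A
            forbidden.update({(a, b), (b, a)})
        else:
            raise ValueError(f"unknown relation kind: {kind!r}")
    return permitted, forbidden


def jaccard(x, y):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def model_similarity(m1, m2, alpha=0.5):
    """Weighted average of the Jaccard similarities on permitted and forbidden sets."""
    p1, f1 = to_constraints(m1)
    p2, f2 = to_constraints(m2)
    return alpha * jaccard(p1, p2) + (1 - alpha) * jaccard(f1, f2)


m1 = [("->", "A", "B"), ("->", "B", "C")]
m2 = [("->", "A", "B"), ("||", "B", "C")]
similarity = model_similarity(m1, m2)
```

Here the permitted sets share 2 of 3 pairs and the forbidden sets 1 of 2, so the similarity is (2/3 + 1/2) / 2 = 7/12.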
Model-to-log Metric
Declare constraint $\pi$ and a trace $\sigma$: healthiness measures
Activation sparsity: $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$
Fulfillment ratio: $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$
Violation ratio: $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$
Conflict ratio: $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$
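Given the counts for one constraint on one trace, the four healthiness measures are plain ratios. The sketch below assumes the counts are already available (counting activations, fulfillments, violations, and conflicts depends on the constraint's semantics and is not shown); returning zero ratios when there are no activations is a convention chosen here, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Healthiness:
    activation_sparsity: float
    fulfillment_ratio: float
    violation_ratio: float
    conflict_ratio: float


def healthiness(n_events, n_activations, n_fulfillments, n_violations, n_conflicts):
    """Compute the four measures from counts for one trace and one constraint."""
    if n_events <= 0:
        raise ValueError("the trace must contain at least one event")
    sparsity = 1 - n_activations / n_events
    if n_activations == 0:
        # Assumed convention: with no activations, all three ratios are zero.
        return Healthiness(sparsity, 0.0, 0.0, 0.0)
    return Healthiness(
        sparsity,
        n_fulfillments / n_activations,
        n_violations / n_activations,
        n_conflicts / n_activations,
    )


# A 10-event trace with 4 activations: 2 fulfilled, 1 violated, 1 in conflict.
h = healthiness(10, 4, 2, 1, 1)
```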
Overview: Process Extension
Multiperspective Mining
Given
Log with information on originators
Process model
Assumption
Roles are characterized by a consistent set of originators
We add roles to the model
1. Dependencies as handover of roles
2. Remove dependencies below threshold; connected components are candidate roles
3. Merge candidate roles if the similarity of their user sets is above threshold
Entropy-based metric to tune thresholds
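The steps above can be sketched as follows. The weight on a dependency is taken here to be the Jaccard similarity of the two activities' originator sets, so that removing low-weight dependencies separates activities performed by different people; the slides do not define the weight, so this choice, the thresholds, and the function names are assumptions. The entropy-based tuning of thresholds is not shown.

```python
def jaccard(x, y):
    """Jaccard similarity of two sets; empty sets are treated as dissimilar."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def candidate_roles(dependencies, originators, cut=0.5, merge=0.5):
    """dependencies: (a, b) pairs; originators: activity -> set of users."""
    # Steps 1-2: weight each dependency and drop those below the threshold.
    adjacency = {activity: set() for activity in originators}
    for a, b in dependencies:
        if jaccard(originators[a], originators[b]) >= cut:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Connected components of the remaining graph are candidate roles.
    roles, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        roles.append(component)

    def users(role):
        return set().union(*(originators[a] for a in role))

    # Step 3: merge candidate roles whose user sets are similar enough.
    merged = True
    while merged:
        merged = False
        for i in range(len(roles)):
            for j in range(i + 1, len(roles)):
                if jaccard(users(roles[i]), users(roles[j])) >= merge:
                    roles[i] |= roles.pop(j)
                    merged = True
                    break
            if merged:
                break
    return sorted(sorted(role) for role in roles)


originators = {
    "register": {"ann", "bob"},
    "check": {"ann", "bob"},
    "approve": {"carl"},
    "archive": {"ann", "bob"},
}
dependencies = [("register", "check"), ("check", "approve"), ("approve", "archive")]
roles = candidate_roles(dependencies, originators)
```

In this toy model "archive" is cut off from "register" and "check" by the approval step, and the merge step reunites them because the same users perform all three.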
Overview: Stream Control-flow Mining
Stream Context
Stream Mining Peculiarities
Cannot store the entire stream: approximation
Backtracking not feasible: one pass over data
Variable system condition: e.g. fluctuating stream rates
Adapt the model to new data: concept drifts
Principle
Recent observations are more important than older ones
Three versions of Heuristics Miner
Based on Sliding Window
Based on Lossy Counting
Based on Budget Lossy Counting
Completely new problems!
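As a rough illustration of the Lossy Counting variant, the sketch below applies the standard Lossy Counting scheme (Manku and Motwani) to direct-follows pairs read from an event stream. It is not the streaming Heuristics Miner itself: the class and function names are hypothetical, and the per-case map of last activities is left unbounded here for brevity, whereas a real stream miner would have to bound it as well.

```python
import math


class LossyCounter:
    """Lossy Counting over a stream of hashable items."""

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.seen = 0
        self.entries = {}  # item -> [count, maximum undercount]

    def add(self, item):
        self.seen += 1
        bucket = math.ceil(self.seen / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.seen % self.width == 0:
            # Bucket boundary: forget items that cannot be frequent.
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def count(self, item):
        return self.entries.get(item, [0, 0])[0]


def stream_direct_follows(events, epsilon=0.25):
    """events: iterable of (case_id, activity), consumed in a single pass."""
    counter = LossyCounter(epsilon)
    last_activity = {}  # unbounded in this sketch
    for case_id, activity in events:
        previous = last_activity.get(case_id)
        if previous is not None:
            counter.add((previous, activity))
        last_activity[case_id] = activity
    return counter


events = [("c1", "A"), ("c2", "A"), ("c1", "B"), ("c2", "B"), ("c1", "C"), ("c2", "D")]
counter = stream_direct_follows(events)
```

After the first bucket closes, only the pair (A, B), seen twice, survives; the pairs seen once are dropped, which is the approximation that keeps memory bounded.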
Extra: Processes and Logs Generator
Companies are reluctant to share their data
Researchers need to do tests
(No BPI challenges at that time)
Processes and Logs Generator
Stochastic context-free grammar generates random processes
Rules to simulate a process and produce an event log
Reference model used for evaluating control-flow mining algorithms
[Figure: grammar production rules and an example generated process running from a_start to a_end.]
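The generator idea can be sketched with a toy stochastic grammar. The productions (activity, sequence, exclusive choice, parallel split), their probabilities, and all names below are illustrative assumptions and do not reproduce the grammar on the slide.

```python
import random


def random_process(rng, depth=0, max_depth=3, counter=None):
    """Return a nested tuple: ("act", name) or (kind, left, right), kind in seq/xor/and."""
    counter = counter if counter is not None else [0]
    # Stop with an activity at the depth limit, or with probability 0.4.
    if depth >= max_depth or rng.random() < 0.4:
        counter[0] += 1
        return ("act", f"t{counter[0]}")
    kind = rng.choices(["seq", "xor", "and"], weights=[0.5, 0.25, 0.25])[0]
    left = random_process(rng, depth + 1, max_depth, counter)
    right = random_process(rng, depth + 1, max_depth, counter)
    return (kind, left, right)


def simulate(node, rng):
    """Play out one trace of the process as a list of activity names."""
    kind = node[0]
    if kind == "act":
        return [node[1]]
    if kind == "xor":
        # Exclusive choice: execute exactly one branch.
        return simulate(rng.choice(node[1:]), rng)
    left, right = simulate(node[1], rng), simulate(node[2], rng)
    if kind == "seq":
        return left + right
    # "and": random interleaving that preserves each branch's internal order.
    merged = []
    while left or right:
        source = left if (left and (not right or rng.random() < 0.5)) else right
        merged.append(source.pop(0))
    return merged


rng = random.Random(42)
process = random_process(rng)
log = [simulate(process, rng) for _ in range(5)]

fixed = ("seq", ("act", "a"), ("and", ("act", "b"), ("act", "c")))
fixed_trace = simulate(fixed, random.Random(1))
```

Because the generating process is known, a model mined from `log` can be compared against it, which is the role of the reference model in the evaluation.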
Detailed Map of Performed Activities
Legacy, Process-unaware Information Systems
Process Mining Capable Event Stream
Random Process Generator
Process Mining Capable Event Logs
Control-flow Mining Algorithm Exploiting More Data
User-guided Discovery Algorithm Configuration
Stream Control-flow Mining Framework
Data Preparation
Automatic Algorithm Configuration
Event Logs Generator
Process Representation (e.g. Dependency Graph, Petri Net)
Model-to-model Metric
Extension of Process Models with Organizational Roles
Model-to-log Metric
Model Evaluation (wrt Log / Original Model)
Thanks!
Doing the Ph.D. has been amazing!
A huge thank you to:
My supervisor, Alessandro Sperduti
Siav S.p.A. and Roberto Pinelli
My internal examiners: Tullio Vardanega, Paolo Baldan
My external examiners: Barbara Weber, Diogo Ferreira
All the process mining community!