Please use the following format when citing this chapter:
Britos, P., Dieste, O. and García-Martínez, R., 2008, in IFIP International Federation for Information Processing, Volume 274; Advances in Information Systems Research, Education and Practice; David Avison, George M. Kasper, Barbara Pernici, Isabel Ramos, Dewald Roode; (Boston: Springer), pp. 139–150.
Requirements Elicitation in Data Mining for Business Intelligence Projects
Paola Britos¹, Oscar Dieste² and Ramón García-Martínez³
¹ Software and Knowledge Engineering Center, Buenos Aires Institute of Technology, AR
² Empirical Software Engineering Research Group, Polytechnic University of Madrid, ES
³ Intelligent Systems Laboratory, Engineering School, University of Buenos Aires, AR
{pbritos, rgm}@itba.edu.ar, [email protected]
Abstract: There are data mining methodologies for business intelligence (DM-BI) projects that highlight the importance of planning an ordered, documented, consistent and traceable requirements elicitation throughout the entire project. However, the classical software engineering approach is not completely suitable for DM-BI projects because it neglects the requirements specification aspects of projects. This article focuses on identifying, from DM-BI field experience, the concepts needed to understand the DM-BI project domain, including how requirements can be educed by a proposed DM-BI project requirements elicitation process and how they can be documented by a set of templates.
1. Introduction
A Data Mining for Business Intelligence (DM-BI) methodology seeks to organize the pattern discovery process in the data warehouse of an organization. These methodologies consider requirements specification as one of the early activities of the project (Chapman et al. 2000; Pyle 2003). Similarly, requirements engineering is an important phase in software engineering methodologies (IEEE, 1993; Winter & Strauch 2004; Maiden et al. 2004, 2007; Solheim et al. 2005; Jiang & Eberlein 2007).
Several authors (Winter & Strauch 2002; Silva & Freire 2003; Yang and Wu, 2006) have addressed the need to improve DM-BI methodologies, but they focus on DM-BI goal definition and DM-BI task specification (such as exploratory data analysis) and on developing tools for DM-BI process documentation, model building, and pattern finding. The DM-BI community has neglected the requirements specification aspects of projects: it has neither identified techniques to elicit the necessary knowledge nor suggested templates for the systematic documentation of requirements.
To explore how to minimize the impact of these problems, this research focuses on an approach based on understanding the DM-BI project's domain, knowing the DM-BI project's data domain, understanding the DM-BI project's scope, identifying the needed human resources, and selecting the appropriate DM-BI tool. The approach also seeks to specify documentation tools for the information required in DM-BI projects.
In this paper we review related research on DM methodologies addressing the problem (section 2), introduce a solution approach (section 3), develop a proposed method for requirements engineering in DM-BI projects focusing on process and products (section 4), present examples of the use of the templates in a real case (section 5), discuss the strengths of the proposed process (section 6), and draw some conclusions (section 7).
2. Current Methodologies for DM-BI
The DM-BI literature on requirements elicitation identifies concepts related to how to extract, transform, aggregate, and discover business patterns in organization data. Moreover, these activities should be performed based on a concise dimensional schema. In this context, stakeholders and requirements engineers work together to identify what to look for, and where, within the organization's data sources, in order to provide the basis for discovering business patterns for business improvement. The requirements elicitation process is addressed by the most commonly used data mining (DM) methodologies (Chapman et al., 2000; Pyle, 2003; SAS, 2008). DM methodologies state the necessity of business understanding as the starting point for any DM project.
The CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology (Chapman et al., 2000) consists of four levels of abstraction, hierarchically organized from general tasks to specific cases. The process is divided into six phases, each comprising several second-level generic tasks (sub-phases). Generic tasks are mapped onto specialized ones, which describe the actions to be carried out in specific situations. For example, the second level contains a generic task "cleaning data"; the third level then contains the tasks to be carried out in a specific case, such as "cleaning numerical data" or "cleaning categorical data". The fourth level records the actions, decisions and results of the specific data mining project. The CRISP-DM methodology provides two documents to assist during the development of a data mining project: the reference model and the user's guide. The reference model describes, in general terms, the phases, generic tasks and outputs of a data mining project. The user's guide provides detailed information about the practical application of the reference model to specific data mining projects; it also gives advice and checklists for each phase's tasks.
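The four-level hierarchy can be pictured as a nested data structure. The sketch below is our own illustration, not part of the CRISP-DM documents: the six phase names are those of CRISP-DM 1.0, only the "cleaning data" branch from the example above is filled in, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SpecializedTask:
    """Level 3: a generic task specialized for a concrete situation."""
    name: str
    # Level 4: process instances, i.e. the actions, decisions and results
    # actually recorded for one specific data mining project.
    instances: list[str] = field(default_factory=list)


@dataclass
class GenericTask:
    """Level 2: a task general enough to cover all data mining situations."""
    name: str
    specialized: list[SpecializedTask] = field(default_factory=list)


@dataclass
class Phase:
    """Level 1: one of the six CRISP-DM phases."""
    name: str
    generic_tasks: list[GenericTask] = field(default_factory=list)


CRISP_DM_PHASES = (
    "Business Understanding", "Data Understanding", "Data Preparation",
    "Modeling", "Evaluation", "Deployment",
)


def build_reference_model() -> list[Phase]:
    """Create the six phases and fill in the one branch used as an example."""
    phases = [Phase(name) for name in CRISP_DM_PHASES]
    cleaning = GenericTask("cleaning data", [
        SpecializedTask("cleaning numerical data"),
        SpecializedTask("cleaning categorical data"),
    ])
    # "Cleaning data" belongs to the Data Preparation phase.
    phases[CRISP_DM_PHASES.index("Data Preparation")].generic_tasks.append(cleaning)
    return phases


def record_instance(phases, phase, generic, specialized, note):
    """Attach a level-4 process instance under phase/generic/specialized."""
    for p in phases:
        for g in p.generic_tasks:
            for s in g.specialized:
                if (p.name, g.name, s.name) == (phase, generic, specialized):
                    s.instances.append(note)
                    return s
    raise KeyError(f"no path {phase} / {generic} / {specialized}")
```

Keeping level 4 as a list on each specialized task reflects the idea that the first three levels are reusable across projects, while only the fourth is project-specific.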
The P3TQ (Product, Place, Price, Time, and Quantity) methodology consists of two parts (Pyle, 2003). [a] Modeling (PI) provides a step-by-step guide to developing and building a model that addresses a business problem or opportunity. Modeling depends very much on the business circumstances that prompted it in the first place, as indicated by the five different entry scenarios to PI. Broadly, PI provides lists of actions that must be completed, depending on the circumstances. [b] Data Mining (PII) provides a step-by-step guide to mining the data to produce the required model, as identified in PI. Data Mining consists of a series of stages that have to be completed in order: unlike modeling, in which several tasks may take place at the same time, mining has to proceed from stage to stage. Each part is based on four types of "activity boxes": action boxes indicate one or more required "next steps" to take; discovery boxes provide exploratory actions needed to decide what to do next; technique boxes provide supplemental information about the recommended steps described in the action or discovery boxes; and example boxes give a detailed description of how to use a specific technique, along with pointers to an Excel worksheet.
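The contrast between the two parts, freely ordered modeling actions versus strictly sequential mining stages, can be sketched as follows. This is our illustration, not Pyle's: the class names are hypothetical, and the stage and action names passed to them are placeholders, since the text does not list them.

```python
from enum import Enum


class ActivityBox(Enum):
    """The four kinds of activity boxes each P3TQ part is built from."""
    ACTION = "one or more required next steps"
    DISCOVERY = "exploratory actions to decide what to do next"
    TECHNIQUE = "supplemental information about recommended steps"
    EXAMPLE = "detailed description of how to use a technique"


class ModelingPart:
    """PI: the listed actions may be completed in any order."""

    def __init__(self, actions):
        self.pending = set(actions)

    def complete(self, action):
        # KeyError if the action is unknown or was already completed.
        self.pending.remove(action)

    @property
    def done(self):
        return not self.pending


class MiningPart:
    """PII: stages must be completed strictly one after another."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.completed = []

    def complete(self, stage):
        nxt = len(self.completed)
        expected = self.stages[nxt] if nxt < len(self.stages) else None
        if stage != expected:
            raise ValueError(f"cannot complete {stage!r}; next stage is {expected!r}")
        self.completed.append(stage)
```

A set is enough for PI because only membership matters; PII needs an ordered list and a guard that rejects any stage other than the next one.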
SEMMA (Sample, Explore, Modify, Model and Assess) is a methodology oriented to selecting, exploring and modeling large amounts of data in order to discover business patterns in the data (SAS, 2008). The process begins with the extraction of the sample data to which the analysis is going to be applied. Once the sample is selected, the methodology proposes exploring the data in order to simplify the model. The third phase involves modifying the data to prepare it for the DM tool. The fourth phase involves running the DM tool on the selected data. The last phase consists of evaluating the results by contrasting the model with statistical models or with new samples.
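The five SEMMA phases can be illustrated as a chain of functions. This is only a toy, self-contained sketch and not how SAS Enterprise Miner implements SEMMA: the records (with hypothetical fields `usage` and `churn`), the mean imputation in Modify, and the one-split threshold rule standing in for the DM tool's model are all our assumptions.

```python
import random
from statistics import mean


def sample(population, fraction, seed=0):
    """Sample: draw the subset of records the analysis will be applied to."""
    rng = random.Random(seed)
    k = max(1, int(len(population) * fraction))
    return rng.sample(population, k)


def explore(rows):
    """Explore: summarize the data (size, missing values, central tendency)."""
    present = [r["usage"] for r in rows if r["usage"] is not None]
    return {"n": len(rows), "missing": len(rows) - len(present),
            "mean_usage": mean(present)}


def modify(rows, summary):
    """Modify: prepare the data for the modeling step (impute missing usage)."""
    return [dict(r, usage=summary["mean_usage"]) if r["usage"] is None else dict(r)
            for r in rows]


def accuracy(threshold, rows):
    """Share of rows where the rule 'churn if usage < threshold' is right."""
    return sum((r["usage"] < threshold) == r["churn"] for r in rows) / len(rows)


def fit(rows):
    """Model: pick the smallest threshold that maximizes accuracy on the rows."""
    candidates = sorted({r["usage"] for r in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))


def semma(population, holdout, fraction=0.5, seed=0):
    """Run the phases in order; Assess evaluates the rule on another sample."""
    rows = sample(population, fraction, seed)
    summary = explore(rows)
    prepared = modify(rows, summary)
    threshold = fit(prepared)
    return summary, threshold, accuracy(threshold, holdout)
```

Assessing on `holdout` rather than on the training sample mirrors the last phase's "contrast with new samples".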
One assumption behind approaches to requirements engineering in DM-BI is that sufficient knowledge of the requirements already exists. It is well known that, in normal situations, customers and users "speak a different language" from the development team (Maiden et al. 2007). The task of translating customers' and users' ideas into the development team's language is done by requirements engineers and business analysts using different notations (Jiang & Eberlein 2007). However, this assumption is increasingly flawed because of the breadth of expertise needed to specify complex systems and the number of people who may be involved in the process. Thus, current requirements elicitation methodologies fall short: they neither provide adequate coverage of the concepts needed to elicit requirements nor support the corresponding documentation or cross-referencing.
3. Framework for Requirements Elicitation in DM-BI
The need to adapt the traditional requirements engineering process for DM-BI systems rests on the premise that requirements analysis for these systems differs substantially from requirements analysis for conventional information systems. Evidence for this assertion is found in a wide range of DM-BI project domains: mobile telephony (Grosser et al., 2005), health policies (Felgaer et al., 2006), agro-industry (Cogliati et al., 2006), and criminal intelligence (Valenga et al., 2008). In each of these cases, the DM-BI methodologies had difficulty dealing with some common requirements problems, such as: the customer did not understand the technical lexis used by the DM-BI group; the customer was not clear about the goals and capabilities of the DM-BI project or what it could achieve; or the models defined by the DM-BI group differed from the ones the customer envisioned. A complete list of the identified problems is shown in Fig. 1. This field experience has taught the authors the necessity of defining a list of concepts to be educed during the business understanding phase. The list of needed concepts and their relation to the detected problems is also shown in Fig. 1.
PROBLEM → CONCEPTS TO BE EDUCED
[a] The customer doesn't understand the technical lexis used by the DM-BI group.
[b] The DM-BI group can't understand the lexis of the customer's information domain.
[c] The DM-BI group found it hard to understand how they could help the customer because they didn't know the project domain.
→ Definitions, acronyms and abbreviations
[d] The customer was not sure what the DM-BI project could do or achieve.
[e] Models defined by the DM-BI group were different from the ones the customer envisioned.
→ Project objectives; Successful criteria of the project; Project expectations; Project suppositions
[f] The customer was an unpredictable group (not very concerned with the project).
→ Human resource involved
[g] The customer did not know the needed organizational information and its condition.
→ Project restrictions; Project risks; Contingency planning
[h] Data identified by requirements were not the right ones.
→ Requirements goal; The requirement information or data source; Attributes related on requirements
[i] When the DM-BI project was in the modeling phase (requirements solutions) and the DM-BI group detected problems in the data (i.e., the data identified by requirements were not the right ones), it was necessary to redefine requirements.
→ Requirement results suppositions; Requirement restrictions; Requirement risks; Requirement contingencies plan
[j] Misunderstandings of the DM-BI project requirements resulted in the selection of the wrong modeling tool.
→ Evaluating DM-BI tools
Fig. 1. Relation among identified problems in the field and the concepts needed to be educed
To solve these problems, we needed to educe specific information in each DM-BI project. This information can be modeled by the list of concepts below:
Definitions, acronyms and abbreviations: It is necessary to identify definitions, acronyms and abbreviations in order to establish a lexis shared among all persons involved in the DM-BI project. This addresses problems [a], [b] and [c] (see Fig. 1).
Project objectives: It is necessary to identify the objective of the DM-BI project and its motivation in order to characterize what the customer needs. This addresses problems [d] and [e].
Successful criteria of the project: It is necessary to identify the criteria that make the project a successful one. The criteria must be described in terms of the expected achievements of the DM-BI project. This addresses problems [d] and [e].
Project expectations: It is necessary to identify what the DM-BI project is expected to achieve and to confirm that this fulfills the customer's expectations. The expectations must be aligned with the objectives and the project's success criteria. This addresses problems [d] and [e].
Project's suppositions: It is important to identify the suppositions that must be assumed to be true in order to start the DM-BI project. The project's suppositions become the starting point of the requirements elicitation process. This addresses problems [d] and [e].
Project restrictions: To specify the DM-BI project context, it is necessary to identify the limits previously established for the project: those related to the organization (political, legal and data quantity); those related to the data (access to information sources, and data quality); those related to human and technical resources (the size of the data sources relative to the hardware and software that handle them, hardware and software limitations, and human resources); and those related to the project (activities that affect the project and its security, such as access to project documentation without any possibility of a backup). This addresses problem [g].
Project risks: Identify risks for the DM-BI project by continuously looking at what might go wrong in the organization (in relation to the DM-BI project) and determining which risks are important to address. Risk identification is needed to define the contingency plans to be applied to mitigate risks. This addresses problem [g].
Contingency plans: It is necessary to define contingency plans to be applied to offset risks. This addresses problem [g].
Human resource involved: It is important to identify the different roles in the DM-BI project and the human resources that will fill these roles. The roles are in the areas of data exploration and business domain expertise. This addresses problem [f].
Requirement goals: The project's objectives are decomposed into requirement goals. The requirement goals are needed, together with the project's suppositions, to define the DM-BI processes to be applied. This addresses problem [h].
The requirement information or data source: It is necessary to establish which information or data sources are going to be used, and where they are located, in order to accomplish a requirement's goal. This addresses problem [h].
Requirement results suppositions: It is necessary to identify the suppositions about requirement results in order to guide the actions needed to accomplish the requirement goal. They must be consistent with the project goal, its expectations and its suppositions. This addresses problem [i].
Requirements restrictions: To specify the context of each DM-BI project requirement, it is necessary to identify requirement limits, which must be consistent with those described in other parts of the elicitation document: those related to data (access to information sources, and data quality); those related to human and technical resources (the size of the data sources relative to the hardware and software that handle them, hardware and software limitations, and human resources); and those related to project security. This addresses problem [i].
Attributes related on requirements: Establish which attributes are going to be used to accomplish a requirement goal. This addresses problem [h].
Requirement risks: It is important to identify DM-BI project requirement risks by continuously looking for what might go wrong in the requirement (in relation to the DM-BI project) and determining which risks are important to address. Requirement risk identification is needed to define the contingency plans to be applied when needed. This addresses problem [i].
Requirement contingency plans: It is necessary to define contingency plans to be applied when an occurrence warrants it. This addresses problem [i].
Evaluating DM-BI tools: It is necessary to evaluate the available DM-BI tools to establish which ones are best suited to accomplish the project's objectives. This addresses problem [j].
Based on these concepts, and in order to address the problems identified in Fig. 1, we next propose a method for the DM-BI project requirements elicitation process.
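Because every concept is justified by the problems it addresses, the pairing in Fig. 1 can be written as a lookup table and checked mechanically, for instance to confirm that no field problem is left without a concept. A minimal sketch; the dictionary is our transcription of Fig. 1, and the function names are hypothetical.

```python
# Concepts to be educed, mapped to the Fig. 1 problems they address.
CONCEPT_PROBLEMS = {
    "Definitions, acronyms and abbreviations": {"a", "b", "c"},
    "Project objectives": {"d", "e"},
    "Successful criteria of the project": {"d", "e"},
    "Project expectations": {"d", "e"},
    "Project's suppositions": {"d", "e"},
    "Project restrictions": {"g"},
    "Project risks": {"g"},
    "Contingency plans": {"g"},
    "Human resource involved": {"f"},
    "Requirement goals": {"h"},
    "The requirement information or data source": {"h"},
    "Requirement results suppositions": {"i"},
    "Requirements restrictions": {"i"},
    "Attributes related on requirements": {"h"},
    "Requirement risks": {"i"},
    "Requirement contingency plans": {"i"},
    "Evaluating DM-BI tools": {"j"},
}

PROBLEMS = set("abcdefghij")


def uncovered(problems, concept_map):
    """Problems that no concept addresses (should be empty)."""
    covered = set().union(*concept_map.values())
    return sorted(problems - covered)


def concepts_for(problem, concept_map):
    """All concepts that must be educed to address one problem."""
    return sorted(c for c, ps in concept_map.items() if problem in ps)
```

Running `uncovered(PROBLEMS, CONCEPT_PROBLEMS)` after editing the table is a cheap regression check when concepts are added or merged.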
4. Proposed Method
The proposed five-step method is described in section 4.1; the process products and their relation to the process steps are shown in section 4.2.
4.1. Process
Once the needed concepts have been identified, it is necessary to establish the steps to educe them. The proposed structure is similar to those proposed in software engineering, allowing progress over the needed concepts while maintaining their natural order. For the business understanding phase of any DM-BI methodology, we propose a five-step DM-BI project requirements elicitation process, shown in Fig. 2.
Fig. 2. Process of requirements elicitation
The purpose of the step "understand the project's domain" is to establish communication channels in ordinary language among the persons involved in the DM-BI project. The purpose of the step "know the project's data domain" is to establish the project's requirements, the data needed for those requirements and its location, the risks involved in the data and in the development of the requirements, the data and requirements restrictions, and finally their suppositions. The purpose of the step "understand the project's scope" is to identify the DM-BI project's objective, its limitations, expectations and risks. The purpose of the step "identify the human resources needed skills" is to obtain the list of human resources involved, together with their restrictions, risks and responsibilities. The purpose of the step "select the correct DM-BI tool" is to select an adequate tool according to the information obtained in the earlier steps.
To know the project's data domain (in terms of requirements goal, the requirement's information or data source, requirements results suppositions, requirements restrictions, attributes involved in requirements, risks and contingency plans), it is necessary to understand the project's domain in terms of definitions, acronyms and abbreviations. To understand the project's scope (in terms of project objectives, successful criteria of the project, project expectations, project suppositions, restrictions, risks and contingency plans), it is necessary to know the project's data domain in terms of the requirement-level concepts just listed. To identify the human resources needed (in terms of defining the human resources involved), it is necessary to understand the project's scope in terms of the project-level concepts just listed. Likewise, to identify the human resources needed skills in terms of defining the human resources involved, it is necessary to select the correct DM-BI tool in terms of tools evaluation.
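The "it is necessary to ... first" statements in the paragraph above form a directed graph, so an admissible order for the steps can be computed with a topological sort. A sketch using Python's standard `graphlib` module (Python 3.9+); the step names follow Fig. 2, and the graph encodes only the dependencies stated in that paragraph, as we read them.

```python
from graphlib import TopologicalSorter

DOMAIN = "understand the project's domain"
DATA = "know the project's data domain"
SCOPE = "understand the project's scope"
SKILLS = "identify the human resources needed skills"
TOOL = "select the correct DM-BI tool"

# step -> steps whose products it needs, as stated in the text
NEEDS = {
    DATA: {DOMAIN},
    SCOPE: {DATA},
    SKILLS: {SCOPE, TOOL},
}


def elicitation_order(needs):
    """Return one order in which every step follows the steps it needs."""
    return list(TopologicalSorter(needs).static_order())


def respects(order, needs):
    """Check that each step appears after all of its prerequisites."""
    pos = {step: i for i, step in enumerate(order)}
    return all(pos[d] < pos[s] for s, deps in needs.items() for d in deps)
```

If a circular dependency were added (for example, making tool selection also depend on the skills step), `static_order` would raise `graphlib.CycleError`, which is a useful sanity check when the dependencies in Fig. 3 are revised.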
The conceptual dependencies among the needed concepts are shown in Fig. 3.
Fig. 3. Cross references of educed concepts represented by the templates
4.2. Products
We have defined a set of templates, each associated with one concept. These templates give a detailed description of the concepts to be educed (see examples in section 5) and allow each concept to evolve through the requirements elicitation process. The relation between the educed concepts, as products, and the steps of the proposed process (see section 4.1) that generate them is shown in Fig. 4.
[Figure: matrix with one row per process step (Understand the project's domain; Know the project's data domain; Understand the project's scope; Identify the human resources needed skills; Select the correct DM-BI tool) and one column per product, i.e. concept to be educed (Definitions, acronyms and abbreviations; Project's objectives; Successful criteria of the project; Project's expectations; Project's suppositions; Project's restrictions; Project's risks; Contingencies plan; Human resource involved; Requirement's goal; The requirement's information or data source; Requirement's results suppositions; Requirements restrictions; Attributes related on requirements; Requirement's risks; Requirement's contingencies plan; Evaluating DM-BI tools). A mark in a cell indicates that the step generates that product.]
Fig. 4. Relation among products (educed concepts) and process steps
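The paper does not prescribe a machine-readable format for the templates. As an assumption-laden sketch, each filled-in template entry can be treated as a record with outgoing cross-references, which makes checks such as "every referenced concept has actually been educed" mechanical. The class, field and identifier names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateEntry:
    """One entry of a filled-in template (a report for one educed concept)."""
    entry_id: str      # hypothetical identifier, e.g. "goal-1"
    concept: str       # template name, e.g. "Requirements Goal"
    text: str          # the educed content itself
    refs: list[str] = field(default_factory=list)  # entry_ids this entry needs


def dangling_references(entries):
    """Pairs (entry_id, ref) whose referenced entry has not been educed yet."""
    known = {e.entry_id for e in entries}
    return [(e.entry_id, r) for e in entries for r in e.refs if r not in known]


def referenced_by(entries, entry_id):
    """Reverse lookup: which entries depend on the given one."""
    return sorted(e.entry_id for e in entries if entry_id in e.refs)
```

Storing references by identifier rather than by object keeps partially filled template sets valid while elicitation is still in progress; `dangling_references` then lists what remains to be educed.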
5. Examples of Real Case Based Use of Templates
In this section we present a set of template examples based on a real DM-BI project within the telecommunications industry. The case centers on a company that closely scrutinizes its customer service, and the project objective is to show the relation between customer satisfaction (especially customer fidelity) and the company's product and quality improvement initiatives. The products of the DM-BI project requirements elicitation and the concept cross-references are captured by filling in the different templates and by the interactions among them (for the real case example, see Fig. 5 to Fig. 8).
Fig. 5 shows how the requirements objective "causal evidence detection of the wide band service sign-off" (see Template "Report - Requirements Goal") needs supposition 1, "to identify causes of wide band service sign-off" (see Template "Report - Requirement's Results Supposition"); restriction 1, "amount of available identified wide band service sign-off cases" (see Template "Report – Requirement's Restrictions"); the attribute "Service Sign-Off" (see Template "Report - Attributes Related On Requirements"); and contingency 1, "to identify the most important attributes for every requirement by means of brainstorming" (see Template "Report – Requirement's Contingencies Plan"). The information origin for the attribute "Service Sign-Off" (see Template "Report - Attributes Related on Requirements") is the "Database of sign-off products and services transactions" (see Template "Report - The Requirement's Information of Data Source").
Fig. 5. Set of templates needed to define Requirements Goal
The information origin for contingency 2, "more interviews with the organization's clients were carried out to detect their satisfaction" (see Template "Report – Requirement's Contingencies Plan"), is risk 2: "there exists a small amount of data, and the sample seems to be unrepresentative" (see Template "Report – Requirement's Risk"). The definition of the concept "go-on-buying attitude" (see Template "Report - Definitions, Acronyms and Abbreviations") is used to understand the meaning of that attribute in Template "Report - Attributes Related on Requirements". The definition of the concept "wide band service sign-off" (see Template "Report - Definitions, Acronyms and Abbreviations") is used to understand the requirements objective "causal evidence detection of the wide band service sign-off" (see Template "Report - Requirements Goal").
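The cross-references described for Fig. 5 can be followed automatically to list every template entry that a requirements goal depends on, directly or transitively. A minimal sketch: the "Template: entry" strings and the dictionary encoding are our own, transcribed (and abbreviated) from the description of Fig. 5, and `trace` is a plain breadth-first search.

```python
from collections import deque

GOAL = "Requirements Goal: causal evidence detection of the wide band service sign-off"
ATTRIBUTE = "Attributes Related on Requirements: Service Sign-Off"
SOURCE = ("The Requirement's Information of Data Source: "
          "database of sign-off products and services transactions")

# template entry -> entries it needs, as described for Fig. 5
FIG5_NEEDS = {
    GOAL: [
        "Requirement's Results Supposition: supposition 1",
        "Requirement's Restrictions: restriction 1",
        ATTRIBUTE,
        "Requirement's Contingencies Plan: contingency 1",
        "Definitions, Acronyms and Abbreviations: wide band service sign-off",
    ],
    ATTRIBUTE: [SOURCE],
}


def trace(needs, start):
    """Breadth-first walk collecting every entry reachable from start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in needs.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

For the goal above, the walk returns the goal, its five directly referenced entries, and finally the data source reached through the "Service Sign-Off" attribute.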
Fig. 6 shows that the project's objective "to determine indicators of correlation between investment and actions for quality improvement" (see Template "Report - Project's Objectives") needs criterion 1, "sign-up and sign-off customer causes identification related to the described satisfaction measures" (see Template "Report – Successful Criteria of the Project"); expectation 1, "to identify variables which affect sign-up and sign-off of the customer related to the described satisfaction measures" (see Template "Report – Project's Expectations"); supposition 1, "sign-off customer causes because they are not satisfied measures" (see Template "Report – Project's Suppositions"); restriction 1, "Database with customer satisfaction measures covering 5% of customer population" (see Template "Report – Project's Restrictions"); and contingency 1, "brief training on DM-BI" (see Template "Report – Contingencies Plan"). The information origin for contingency 1 is risk 1: "inexpert personnel of the organization in DM-BI" (see Template "Report – Project's Risk").
Fig. 6. Set of templates needed to define Project's Objectives
Fig. 7 shows that evaluating DM-BI tools needs the requirements objective "causal evidence detection of the wide band service sign-off" (see Template "Report - Requirements Goal").
Fig. 7. Set of templates needed to define Evaluating DM-BI Tools
Fig. 8 shows how the human resources involved, "experts in business' domain", "leader of the project" and "data mining experts", among others (see Template "Report - Human Resource Involved"), need objective 1, "to determine indicators of correlation between investment and actions for quality improvement" (see Template "Report - Project's Objectives"), and the requirements objective "causal evidence detection of the wide band service sign-off" (see Template "Report - Requirements Goal").
Fig. 8. Set of templates needed to define Human Resource Involved
6. Discussion
Current DM-BI methodologies fail to educe all the concepts (see section 3) needed during the business understanding phase of DM-BI (shown in Fig. 9). CRISP-DM educes one set of concepts, P3TQ another and SEMMA yet a third. In general, these methodologies attend to concepts related to determining business objectives and assessing the situation (the latter in at least one methodology), whereas concepts related to determining data mining goals and producing the project plan are not attended to. In this context, the proposed methodology is more robust than current ones, because it educes all the concepts necessary to model the DM-BI project's requirements.
Our consulting engagements in DM-BI projects have allowed us to test our ideas in the field, but we recognize the need for a formal research approach. The next step will therefore be to carry out experiments comparing the proposed DM-BI project requirements elicitation process with existing ones. The first step of the experiment will be to build a set of test DM-BI project cases, in which each case $i$ includes a case description $K_i$ and the list $R_i$ of the requirements of the case to be educed. Then two development groups will be considered: one trained in the existing DM-BI business understanding phase (control group) and the other trained in our approach (testing group). Both groups will be asked to identify and document the DM-BI project requirements for the previously defined DM-BI project cases (the set of $K_i$). The lists of requirements $R_i$ will not be shown to either group. Referees will compare the number of correctly identified requirements from the control group with those identified by the testing group, and the differences between the groups will be assessed with statistical tests. We expect to validate experimentally that the number of correct requirements educed by the testing group (using the proposed process) is significantly greater than the number educed by the control group (using current methodologies).
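The paper does not name the statistical test the referees will use. As one possible choice, sketched here under our own assumptions, each group's result on case $i$ can be scored as the share of $R_i$ it identified correctly, and the groups compared with an exact one-sided permutation test on the difference in mean scores. The function names are hypothetical and no experimental data is implied.

```python
from itertools import combinations
from statistics import mean


def recall(identified, required):
    """Share of a case's required requirements R_i that were correctly identified."""
    required = set(required)
    return len(set(identified) & required) / len(required)


def permutation_p_value(testing, control):
    """Exact one-sided permutation test for mean(testing) > mean(control).

    Enumerates every way of relabeling the pooled scores into groups of the
    original sizes and counts how often the difference in means is at least
    as large as the observed one.
    """
    pooled = list(testing) + list(control)
    n = len(testing)
    observed = mean(testing) - mean(control)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so ties in floating-point means count as hits.
        if mean(a) - mean(b) >= observed - 1e-12:
            hits += 1
    return hits / total
```

Because both groups work on the same cases, a paired analysis would also be reasonable. Full enumeration is only practical for a handful of cases; for larger samples a random-permutation approximation or a standard test such as the Mann-Whitney U test would be the usual choice.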
7. Conclusion
This paper presents an approach to educing the requirements of DM-BI projects that addresses identified weaknesses in current data mining methodologies. The approach is based on a list of the concepts that have to be educed for DM-BI project requirements, a set of templates to document their elicitation, and the associated process.
The proposed process and set of templates have been tuned in field cases, where their effectiveness has been demonstrated. To further verify the effectiveness of the proposed approach, a formal experiment is planned for the second semester of 2008 with a population of advanced students in the Software Engineering Bachelor Program, University of Buenos Aires.
The focus on documenting DM-BI project requirements makes it possible to achieve requisite, consistent and traceable requirements specifications over the entire project. This documentation allows modeling activities to begin from a common lexis and from cross-referenced concepts related to the target business domain requirements.
8. References
Chapman P, Clinton J, Kerber R, Khabaza T, Reinartz T, Shearer C, Wirth R (2000) CRISP-DM 1.0: Step-by-step data mining guide. SPSS. http://www.crisp-dm.org/CRISPWP-0800.pdf. Accessed 14 September 2006
Cogliati M, Britos P, García-Martínez R (2006) Patterns in Temporal Series of Meteorological Variables Using SOM & TDIDT. In: Bramer M (ed) Artificial Intelligence in Theory and Practice. Springer, Boston, IFIP Series 217:305-314
Felgaer P, Britos P, García-Martínez R (2006) Prediction in Health Domain Using Bayesian Network Optimization Based on Induction Learning Techniques. Int. J. of Mod. Ph. C 17(3):447-455
Grosser H, Britos P, García-Martínez R (2005) Detecting Fraud in Mobile Telephony Using Neural Networks. LNAI 3533:613-615
IEEE (1993) Standard IEEE 830-1993: Recommended Practice for Software Requirements Specifications. Institute of Electrical and Electronics Engineers Press
IEEE (2004) Guide to the Software Engineering Body of Knowledge. IEEE Computer Society Press
Jiang L, Eberlein A (2007) Selecting Requirements Engineering Techniques based on Project Attributes - A Case Study. 14th Annual IEEE ECBS: 269-278
Maiden N, Robertson S, Gizikis A (2004) Provoking Creativity: Imagine What Your Requirements Could be Like. IEEE Software 21(5):68-75
Maiden N, Ncube C, Robertson S (2007) Can Requirements Be Creative? Experiences with an Enhanced Air Space Management System. Proceedings 29th ICSE: 632-641
Pyle D (2003) Business Modeling and Data Mining. Morgan Kaufmann
SAS (2008) SAS Enterprise Miner: SEMMA. http://www.sas.com/technologies/analytics/datamining/miner/semma.html. Accessed 29 February 2008
Silva F, Freire J (2003) DWARF: An Approach for Requirements Definition and Management of Data Warehouse Systems. RE'03: 75-84
Solheim H, Lillehagen F, Petersen S, Jorgensen H, Anastasiou M (2005) Model-driven visual requirements engineering. Proceedings RE'05: 421-428
Valenga F, Fernández E, Merlino H, Rodríguez D, Procopio C, Britos P, García-Martínez R (2008) Minería de Datos Aplicada a la Detección de Patrones Delictivos en Argentina [Data Mining Applied to the Detection of Criminal Patterns in Argentina]. VII JIISIC'08: 31-39
Winter R, Strauch B (2002) A Method for Demand-driven Information Requirements Analysis in Data Warehousing Projects. HICSS-36: 231-239
Yang Q, Wu X (2006) 10 Challenging Problems in Data Mining Research. Int. J. Inf. Tech. & Decis. Mak. 5(4):597-604
doc_527663548.pdf
There are data mining methodologies for business intelligence (DM-BI) projects that highlight the importance of planning an ordered.
Please use the following format when citing this chapter:
Britos, P., Dieste, O. and García-Martínez, R., 2008, in IFIP International Federation for Information Processing, Volume 274; Advances in Information Systems Research, Education and Practice; David Avison, George M. Kasper, Barbara Pernici, Isabel Ramos, Dewald Roode; (Boston: Springer), pp. 139–150.
Requirements Elicitation in Data Mining for Business Intelligence Projects
Paola Britos¹, Oscar Dieste² and Ramón García-Martínez³
¹ Software and Knowledge Engineering Center. Buenos Aires Institute of Technology, AR
² Empirical Software Engineering Research Group, Polytechnic University of Madrid, ES
³ Intelligent Systems Laboratory. Engineering School. University of Buenos Aires, AR
{pbritos, rgm}@itba.edu.ar, [email protected]
Abstract: There are data mining methodologies for business intelligence (DM-BI) projects that highlight the importance of planning an ordered, documented, consistent and traceable requirements elicitation throughout the entire project. However, the classical software engineering approach is not completely suitable for DM-BI projects because it neglects the requirements specification aspects of such projects. This article focuses on identifying the concepts needed to understand the DM-BI project domain, drawing on DM-BI field experience, including how requirements can be educed through a proposed DM-BI project requirements elicitation process and how they can be documented with a set of templates.
1. Introduction
A Data Mining for Business Intelligence (DM-BI) methodology seeks to organize the pattern discovery process in an organization’s data warehouse. These methodologies consider requirements specification to be one of the early activities of the project (Chapman et al. 2000; Pyle 2003). Similarly, requirements are an important phase in software engineering methodologies (IEEE 1993; Winter & Strauch 2002; Maiden et al. 2004, 2007; Solheim et al. 2005; Jiang & Eberlein 2007).
Several authors (Winter & Strauch 2002; Silva & Freire 2003; Yang & Wu 2006) have addressed the need to improve DM-BI methodologies, but they focus on defining DM-BI goals, specifying DM-BI tasks such as exploratory data analysis, and developing tools for DM-BI process documentation, model building and pattern finding. The DM-BI community has neglected the requirements specification aspects of projects, failing to identify any technique to elicit the necessary knowledge or to suggest any template for the systematic documentation of requirements.
To explore how to minimize the impact of these problems, this research focuses on an approach based on understanding the DM-BI project’s domain, knowing the DM-BI project’s data domain, understanding the DM-BI project’s scope, identifying the needed human resources, and selecting the appropriate DM-BI tool. The approach also seeks to specify documentation tools for the information that DM-BI projects require.
In this paper we review related research on DM methodologies that addresses the problem (section 2); introduce a solution approach (section 3); develop a proposed method for requirements engineering in DM-BI projects, focusing on process and products (section 4); present an example of the use of the templates in a real case (section 5); discuss the strengths of the proposed process (section 6); and draw some conclusions (section 7).
2. Current Methodologies for DM-BI
The DM-BI literature on requirements elicitation identifies concepts related to how to extract, transform, aggregate, and discover business patterns in organizational data. Moreover, these activities should be performed on the basis of a concise dimensional schema. In this context, stakeholders and requirements engineers work together to identify what to look for, and where, within the organization’s data sources, in order to lay the groundwork for discovering business patterns that support business improvement. The requirements elicitation process is addressed by the most commonly used data mining (DM) methodologies (Chapman et al. 2000; Pyle 2003; SAS 2008), all of which state that business understanding is the necessary starting point for any DM project.
The CRISP-DM (CRoss-Industry Standard Process for Data Mining) methodology (Chapman et al. 2000) consists of four levels of abstraction, hierarchically organized from general tasks to specific cases. The process is divided into six phases, each containing several second-level generic tasks, or sub-phases. Generic tasks are mapped onto specialized tasks, which describe the actions to be carried out in specific situations. For example, the second level contains the generic task “cleaning data”; the third level then contains the tasks to be carried out for a specific case, such as “cleaning numerical data” or “cleaning categorical data”. At the fourth level, the actions, decisions and results of the specific data mining project are recorded. CRISP-DM provides two documents to assist in the development of a data mining project: the reference model and the user guide. The reference model describes, in general terms, the phases, generic tasks and outputs of a data mining project. The user guide provides detailed information on the practical application of the reference model to specific data mining projects, together with advice and checklists for each phase’s tasks.
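As a minimal sketch of the four-level breakdown just described, the following Python models phases, generic tasks, specialized tasks and project-level records as nested dataclasses. The six phase names are the standard CRISP-DM phases; the class and field names, and the recorded decision, are illustrative assumptions and not part of the CRISP-DM documents.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Level 1: the six CRISP-DM phases, in process order.
PHASES = (
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
)


@dataclass
class ProcessInstance:
    """Level 4: actions, decisions and results recorded for one project."""
    actions: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)


@dataclass
class SpecializedTask:
    """Level 3: a generic task refined for a specific situation."""
    name: str
    instance: ProcessInstance = field(default_factory=ProcessInstance)


@dataclass
class GenericTask:
    """Level 2: a task that belongs to one of the six phases."""
    name: str
    phase: str
    specialized: list[SpecializedTask] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject tasks attached to a phase that CRISP-DM does not define.
        if self.phase not in PHASES:
            raise ValueError(f"unknown CRISP-DM phase: {self.phase!r}")


# The "cleaning data" example from the text, placed under Data Preparation.
cleaning = GenericTask(
    name="cleaning data",
    phase="Data Preparation",
    specialized=[
        SpecializedTask("cleaning numerical data"),
        SpecializedTask("cleaning categorical data"),
    ],
)

# Hypothetical level-4 entry for one concrete project.
cleaning.specialized[0].instance.decisions.append(
    "replace missing numeric values with the column median"
)

for task in cleaning.specialized:
    print(f"{cleaning.phase} > {cleaning.name} > {task.name}")
```

Each specialized task gets its own `ProcessInstance` through `default_factory`, so decisions recorded for one task do not leak into another.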
The P3TQ (Product, Place, Price, Time, and Quantity) methodology (Pyle 2003) consists of two parts. [a] Modeling (PI) provides a step-by-step guide to developing and building a model that addresses a business problem or opportunity. Modeling depends very much on the business circumstances that prompted it in the first place, as indicated by the five different entry scenarios to PI; largely, PI provides lists of actions that must be completed, depending on the circumstances. [b] Data Mining (PII) provides a step-by-step guide to mining the data in order to produce the model identified in PI. Data mining consists of a series of stages that must be completed in order: unlike modeling, in which several tasks may take place at the same time, mining has to proceed from stage to stage. Each part is based on four types of “activity boxes”: action boxes indicate one or more required “next steps” to take; discovery boxes provide the exploratory actions needed to decide what to do next; technique boxes provide supplemental information about the recommended steps described in the action or discovery boxes; and example boxes give a detailed description of how to use a specific technique, along with pointers to an Excel worksheet.
SEMMA (Sample, Explore, Modify, Model and Assess) is a methodology oriented toward selecting, exploring and modeling large amounts of data in order to discover business patterns (SAS 2008). The process begins with the extraction of a data sample to which the analysis will be applied. Once the sample has been selected, the methodology proposes exploring the data in order to simplify the model. The third phase involves preparing (modifying) the data for the DM tool. The fourth phase involves running the DM tool on the selected data. The last phase consists of evaluating the results by comparing the model with statistical models or by testing it on new samples.
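The five SEMMA phases can be read as a linear pipeline. The sketch below is a toy illustration under stated assumptions: the records, the sign-off label and the one-attribute threshold "model" are all hypothetical, and exploration and modification are reduced to a summary and a missing-value filter. It is not how SAS Enterprise Miner implements SEMMA.

```python
import random
import statistics


def sample(rows, fraction, seed=0):
    """Sample: draw a reproducible random subset of the rows."""
    rng = random.Random(seed)
    return rng.sample(rows, k=max(1, int(len(rows) * fraction)))


def explore(rows):
    """Explore: summarise the sample before deciding how to model it."""
    present = [x for x, _ in rows if x is not None]
    return {
        "n": len(rows),
        "missing": len(rows) - len(present),
        "mean": statistics.mean(present),
    }


def modify(rows):
    """Modify: prepare the data for the DM tool (here, drop missing values)."""
    return [(x, y) for x, y in rows if x is not None]


def accuracy(threshold, rows):
    """Share of rows where the rule 'x >= threshold' predicts y correctly."""
    return sum((x >= threshold) == y for x, y in rows) / len(rows)


def model(rows):
    """Model: pick the single-attribute threshold with the best accuracy."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))


def assess(threshold, holdout):
    """Assess: evaluate the fitted rule on rows not used for fitting."""
    return accuracy(threshold, holdout)


# Hypothetical data: (complaints per month, signed off?), some values missing.
rows = [(None if i % 17 == 0 else i % 10, i % 10 >= 5) for i in range(200)]

s = sample(rows, 0.5)
summary = explore(s)
clean = modify(s)
half = len(clean) // 2
train, holdout = clean[:half], clean[half:]
threshold = model(train)
print(summary, threshold, assess(threshold, holdout))
```

Assessing on `holdout` rather than `train` mirrors the text's point that the model is evaluated against new samples, not the data it was fitted on.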
One assumption behind approaches to requirements engineering in DM-BI is that sufficient knowledge of the requirements already exists. It is well known that, in normal situations, customers and users ’speak a different language’ from the development team (Maiden et al. 2007). The task of translating customers’ and users’ ideas into the development team’s language is performed by requirements engineers and business analysts using different notations (Jiang & Eberlein 2007). However, this approach is increasingly flawed because of the breadth of expertise needed to specify complex systems and the number of people who may be involved in the process. Thus, current requirements elicitation methodologies fail in that they do not provide adequate coverage of the concepts needed to elicit requirements, nor do they support the corresponding documentation or cross-referencing.
3. Framework for Requirements Elicitation in DM-BI
The need to adapt the traditional requirements engineering process for DM-BI systems rests on the premise that requirements analysis for these types of systems differs substantially from requirements analysis for conventional information systems. Evidence for this assertion is found in a wide range of DM-BI project domains: mobile telephony (Grosser et al. 2005), health policies (Felgaer et al. 2006), agro-industry (Cogliati et al. 2006), and criminal intelligence (Valenga et al. 2008). In each of these cases, the DM-BI methodologies had difficulty dealing with some common requirements problems: for example, the customer did not understand the technical lexis used by the DM-BI group; the customer was not clear about the goals and capabilities of the DM-BI project or what it could achieve; or the models defined by the DM-BI group differed from the ones the customer envisioned. A complete list of the identified problems is shown in Fig. 1. This field experience has taught the authors the necessity of defining a list of concepts to be educed during the business understanding phase. The list of needed concepts and its relation to the detected problems is also shown in Fig. 1.
PROBLEM → CONCEPTS TO BE EDUCED
[a] The customer doesn’t understand the technical lexis used by the DM-BI group.
[b] The DM-BI group can’t understand the lexis of the customer’s information domain.
[c] The DM-BI group found it hard to understand how they could help the customer because they didn’t know the project domain.
→ Definitions, acronyms and abbreviations
[d] The customer was not sure what the DM-BI project could do or achieve.
[e] Models defined by the DM-BI group were different from the ones the customer envisioned.
→ Project objectives; Successful criteria of the project; Project expectations; Project suppositions
[f] The customer was an unpredictable group (not very concerned with the project).
→ Human resource involved
[g] The customer did not know the needed organizational information and its condition.
→ Project restrictions; Project risks; Contingency planning
[h] Data identified by requirements were not the right ones.
→ Requirements goal; The requirement information or data source; Attributes related on requirements
[i] When the DM-BI project was in the modeling phase (requirements solutions) and the DM-BI group detected problems in the data (i.e., the data identified by requirements were not the right ones), it was necessary to redefine requirements.
→ Requirement results suppositions; Requirement restrictions; Requirement risks; Requirement contingencies plan
[j] Misunderstandings of the DM-BI project requirements resulted in the selection of the wrong modeling tool.
→ Evaluating DM-BI tools
Fig. 1. Relation between the problems identified in the field and the concepts to be educed
To solve these problems, we have had to educe specific information in each DM-BI project. This information can be modeled as the following list of concepts to be educed:
Definitions, acronyms and abbreviations: It is necessary to identify definitions, acronyms and abbreviations in order to establish a lexis shared by everyone involved in the DM-BI project. This addresses problems [a], [b] and [c] (see Fig. 1).
Project objectives: It is necessary to identify the objective of the DM-BI project and its motivation in order to characterize what the customer needs. This addresses problems [d] and [e].
Successful criteria of the project: It is necessary to identify the criteria that make the project a successful one. The criteria must be described in terms of the expected achievements of the DM-BI project. This addresses problems [d] and [e].
Project expectations: It is necessary to identify what the DM-BI project is expected to achieve and to confirm that this fulfills the customer’s expectations. The expectations must be aligned with the project objectives and success criteria. This addresses problems [d] and [e].
Project’s suppositions: It is important to identify the suppositions that must be assumed to be true in order to start the DM-BI project. The project’s suppositions become the starting point of the requirements elicitation process. This addresses problems [d] and [e].
Project restrictions: In order to specify the DM-BI project context, it is necessary to identify the limits previously established for the project. These relate to the organization (political, legal and data quantity limits); to the data (access to information sources, and data quality); to human and technical resources (the size of the data sources relative to the hardware and software that handle them, hardware and software limitations, and human resources); and to the project itself (activities that affect the project and its security, e.g., access to project documentation without any possibility of a backup). This addresses problem [g].
Project risks: It is necessary to identify the risks of the DM-BI project by looking continuously at what might go wrong in the organization (in relation to the DM-BI project) and determining which risks are important to address. Risk identification is needed to define the contingency plans to be applied to mitigate risk. This addresses problem [g].
Contingency plans: It is necessary to define contingency plans to be applied to offset risk. This addresses problem [g].
Human resource involved: It is important to identify the different roles in the DM-BI project and the human resources that will fill these roles. The roles are in the areas of data exploration and business domain expertise. This addresses problem [f].
Requirement goals: The project’s objectives are decomposed into requirement goals. The requirement goals are needed, together with the project’s suppositions, to define the DM-BI processes to be applied. This addresses problem [h].
The requirement information or data source: It is necessary to establish which information or data sources are going to be used, and where they are located, in order to accomplish a requirement’s goal. This addresses problem [h].
Requirement results suppositions: It is necessary to identify the suppositions about the requirement results in order to guide the actions needed to accomplish the requirement goal. These suppositions must be consistent with the project goal, its expectations and its suppositions. This addresses problem [i].
Requirements restrictions: In order to specify the context of each DM-BI project requirement, it is necessary to identify the requirement’s limits, which must be consistent with those described in other parts of the elicitation document. These relate to the data (access to information sources, and data quality); to human and technical resources (the size of the data sources relative to the hardware and software that handle them, hardware and software limitations, and human resources); and to project security. This addresses problem [i].
Attributes related on requirements: It is necessary to establish which attributes are going to be used in order to accomplish a requirement goal. This addresses problem [h].
Requirement risks: It is important to identify the risks of each DM-BI project requirement by looking continuously for what might go wrong in the requirement (in relation to the DM-BI project) and determining which risks are important to address. Requirement risk identification is needed to define the contingency plans to be applied when needed. This addresses problem [i].
Requirement contingency plans: It is necessary to define contingency plans to be applied when an occurrence warrants it. This addresses problem [i].
Evaluating DM-BI tools: It is necessary to evaluate the available DM-BI tools in order to establish which ones are best suited to accomplish the project’s objectives. This addresses problem [j].
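The concept list above is in effect a mapping from concepts to the field problems they address, so it can be checked mechanically. A minimal sketch, assuming the letter labels of Fig. 1; the dictionary is a transcription of the list, and the helper names `concepts_for` and `uncovered` are hypothetical.

```python
# Concept -> labels of the Fig. 1 problems it addresses (transcribed from the list).
CONCEPT_PROBLEMS = {
    "Definitions, acronyms and abbreviations": {"a", "b", "c"},
    "Project objectives": {"d", "e"},
    "Successful criteria of the project": {"d", "e"},
    "Project expectations": {"d", "e"},
    "Project suppositions": {"d", "e"},
    "Project restrictions": {"g"},
    "Project risks": {"g"},
    "Contingency plans": {"g"},
    "Human resource involved": {"f"},
    "Requirement goals": {"h"},
    "Requirement information or data source": {"h"},
    "Requirement results suppositions": {"i"},
    "Requirements restrictions": {"i"},
    "Attributes related on requirements": {"h"},
    "Requirement risks": {"i"},
    "Requirement contingency plans": {"i"},
    "Evaluating DM-BI tools": {"j"},
}

PROBLEMS = set("abcdefghij")  # problems [a] to [j] from Fig. 1


def concepts_for(problem):
    """Concepts to educe when a given field problem is observed."""
    return sorted(c for c, ps in CONCEPT_PROBLEMS.items() if problem in ps)


def uncovered(problems, mapping):
    """Problems that no concept in the mapping addresses."""
    addressed = set().union(*mapping.values())
    return problems - addressed


print(uncovered(PROBLEMS, CONCEPT_PROBLEMS))  # set()
print(concepts_for("g"))
# ['Contingency plans', 'Project restrictions', 'Project risks']
```

An empty result from `uncovered` is the property the authors rely on: every problem observed in the field is addressed by at least one educed concept.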
Based on these concepts, and in order to address the problems identified in Fig. 1, we next propose a requirements elicitation process for DM-BI projects.
4. Proposed Method
The proposed five-step process is described in section 4.1; the process products and their relation to the process steps are shown in section 4.2.
4.1. Process
Once the needed concepts have been identified, it is necessary to establish the steps to educe them. The proposed structure is similar to those used in software engineering: it progresses through the needed concepts while maintaining their natural order. For the business understanding phase of any DM-BI methodology, we propose the five-step DM-BI project requirements elicitation process shown in Fig. 2.
Fig. 2. Process of requirements elicitation
The purpose of the step "understand the project’s domain" is to establish communication channels in ordinary language among the people involved in the DM-BI project. The purpose of the step "know the project’s data domain" is to establish the project’s requirements; the data needed for those requirements and its location; the risks involved in the data and in the development of the requirements; the data and requirements restrictions; and, finally, their suppositions. The purpose of the step "understand the project’s scope" is to establish the DM-BI project’s objective, its limitations, expectations and risks. The purpose of the step "identify the human resources needed skills" is to establish the list of human resources involved, their restrictions, risks and responsibilities. The purpose of the step "select the correct DM-BI tool" is to select an adequate tool according to the information obtained in the earlier steps.
To know the project’s data domain (requirements goal, requirements information or data source, requirements results suppositions, requirements restrictions, attributes involved in requirements, requirements risks and requirements contingency plans), it is first necessary to understand the project’s domain in terms of definitions, acronyms and abbreviations. To understand the project’s scope (project objectives, successful criteria of the project, project expectations, project suppositions, project restrictions, project risks and contingency plans), it is first necessary to know the project’s data domain in the terms just listed. To identify the human resources needed skills, in terms of defining the human resources involved, it is necessary both to understand the project’s scope and to select the correct DM-BI tool in terms of tools evaluation.
The conceptual dependencies among the needed concepts are shown in Fig. 3.
Fig. 3. Cross references of educed concepts represented by the templates
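The ordering constraints of section 4.1 form a small dependency graph, so a valid execution order for the five steps can be derived by topological sorting. The sketch below uses Python's standard `graphlib` module (3.9+). The prerequisites follow the text with one assumption: the text only says the tool is selected "according to the information obtained in the earlier steps", so here tool selection is assumed to depend on the data-domain step, where the requirements goal that Fig. 7 links to tool evaluation is educed.

```python
from graphlib import TopologicalSorter

# Step -> steps that must be completed first.
# The prerequisite of "select the correct DM-BI tool" is an assumption.
PREREQUISITES = {
    "understand the project's domain": set(),
    "know the project's data domain": {"understand the project's domain"},
    "understand the project's scope": {"know the project's data domain"},
    "select the correct DM-BI tool": {"know the project's data domain"},
    "identify the human resources needed skills": {
        "understand the project's scope",
        "select the correct DM-BI tool",
    },
}

# static_order() yields each step only after all of its prerequisites.
order = list(TopologicalSorter(PREREQUISITES).static_order())
for position, step in enumerate(order, start=1):
    print(position, step)
```

Because scope understanding and tool selection do not depend on each other in this sketch, `static_order()` may return them in either order; a `graphlib.CycleError` would signal contradictory prerequisites.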
4.2 Products
We have defined a set of templates, each associated with one concept. These templates give a detailed description of the concepts to be educed (see examples in section 5) and allow the concepts to evolve throughout the requirements elicitation process. The relation between the educed concepts, as products, and the steps of the proposed process (see section 4.1) that generate them is shown in Fig. 4.
[Figure: a matrix whose columns are the products (concepts to be educed): Definitions, acronyms and abbreviations; Project’s objectives; Successful criteria of the project; Project’s expectations; Project’s suppositions; Project’s restrictions; Project’s risks; Contingencies plan; Human resource involved; Requirement’s goal; The requirement’s information or data source; Requirement’s results suppositions; Requirements restrictions; Attributes related on requirements; Requirement’s risks; Requirement’s contingencies plan; Evaluating DM-BI tools. Its rows are the process steps: Understand the project’s domain; Know the project’s data domain; Understand the project’s scope; Identify the human resources needed skills; Select the correct DM-BI tool. A mark in a cell indicates that the step generates that product.]
Fig. 4. Relation among products (educed concepts) and process steps
5. Examples of Real Case Based Use of Templates
In this section we present a set of template examples based on a real DM-BI project (real case example) in the telecommunications industry. The case centers on a company that closely monitors its customer service, and the project objective is to show the relation between customer satisfaction (especially customer loyalty) and the company’s product and quality improvement initiatives. The DM-BI project requirements elicitation products and the concept cross-references are captured by filling in the different templates and by the interaction among them (for the real case example, see Fig. 5 to Fig. 8).
Fig. 5 shows how the requirements objective "causal evidence detection of the wide band service sign-off" (see Template "Report - Requirements Goal") needs supposition 1, "to identify causes of wide band service sign-off" (see Template "Report - Requirement’s Results Supposition"); restriction 1, "amount of available identified wide band service sign-off cases" (see Template "Report - Requirement’s Restrictions"); the attribute "Service Sign-Off" (see Template "Report - Attributes Related on Requirements"); and contingency 1, "to identify the attributes more important for every requirement by means of brainstorming" (see Template "Report - Requirement’s Contingencies Plan"). The information origin for the attribute "Service Sign-Off" (see Template "Report - Attributes Related on Requirements") is the "Database of sign-off products and services transactions" (see Template "Report - The Requirement’s Information of Data Source").
Fig. 5. Set of templates needed to define Requirements Goal
The information origin for contingency 2, "they were realized more interviews to the organizations clients to detect their satisfaction" (see Template "Report - Requirement’s Contingencies Plan"), is risk 2, "there exists a few amount of data, and the sample seem to be unrepresentative" (see Template "Report - Requirement’s Risk"). The definition of the concept "go-on-buying attitude" (see Template "Report - Definitions, Acronyms and Abbreviations") is used to understand the meaning of that attribute in Template "Report - Attributes Related on Requirements". The definition of the concept "wide band service sign-off" (see Template "Report - Definitions, Acronyms and Abbreviations") is used to understand the requirements objective "causal evidence detection of the wide band service sign-off" (see Template "Report - Requirements Goal").
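The cross-references in the Fig. 5 example form a small tree rooted at the requirements goal. A minimal sketch of holding filled-in templates as linked records and walking those links, assuming a hypothetical `Template` record with a kind, a text and outgoing references; the texts are transcribed from the case, while the record layout and the `trace` helper are illustrative, not the authors' tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Template:
    """A filled-in "Report - <kind>" template with links to related templates."""
    kind: str
    text: str
    refs: list[Template] = field(default_factory=list)


def trace(template, depth=0):
    """Walk the cross-references depth-first, yielding (depth, template)."""
    yield depth, template
    for ref in template.refs:
        yield from trace(ref, depth + 1)


# Contents transcribed from the real-case templates described for Fig. 5.
source = Template("The Requirement's Information of Data Source",
                  "Database of sign-off products and services transactions")
goal = Template(
    "Requirements Goal",
    "causal evidence detection of the wide band service sign-off",
    refs=[
        Template("Requirement's Results Supposition",
                 "to identify causes of wide band service sign-off"),
        Template("Requirement's Restrictions",
                 "amount of available identified wide band service sign-off cases"),
        Template("Attributes Related on Requirements", "Service Sign-Off",
                 refs=[source]),
        Template("Requirement's Contingencies Plan",
                 "to identify the attributes more important for every "
                 "requirement by means of brainstorming"),
    ],
)

for depth, t in trace(goal):
    print("  " * depth + f"Report - {t.kind}: {t.text}")
```

Printing the traversal gives an indented view of everything a single requirement goal depends on, which is the traceability the templates are meant to provide.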
Fig. 6 shows that the project’s objective "to determine indicators of correlation between investment and actions for quality improvement" (see Template "Report - Project’s Objectives") needs criterion 1, "sign-Up and Sign-Off customer causes identification related to the described satisfaction measures" (see Template "Report - Successful Criteria of the Project"); expectation 1, "to identify variables which affect sign-up and sign-off of the customer related to the described satisfaction measures" (see Template "Report - Project’s Expectations"); supposition 1, "sign-Off customer causes because they are not satisfied measures" (see Template "Report - Project’s Suppositions"); restriction 1, "Database with customer satisfaction measures covering 5% of customer population" (see Template "Report - Project’s Restrictions"); and contingency 1, "brief training on DM_BI" (see Template "Report - Contingencies Plan"). The information origin for contingency 1, "brief training on DM_BI" (see Template "Report - Contingencies Plan"), is risk 1, "inexpert personnel of the organization in DM-BI" (see Template "Report - Project’s Risk").
Fig. 6. Set of templates needed to define Project’s Objectives
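Each contingency plan in these examples records the risk it originates from, so broken links can be detected automatically. A minimal sketch, assuming templates are held in plain dictionaries keyed by hypothetical item ids; the plan and risk texts come from the case, except the last contingency, which is a deliberately broken, hypothetical entry added only to show what the check reports.

```python
# Risk register: item id -> risk text (texts from the case templates).
RISKS = {
    "project risk 1": "inexpert personnel of the organization in DM-BI",
    "requirement risk 2": "there exists a few amount of data, and the sample "
                          "seem to be unrepresentative",
}

# Contingency id -> (plan text, id of the risk it originates from).
CONTINGENCIES = {
    "project contingency 1": ("brief training on DM_BI", "project risk 1"),
    "requirement contingency 2": (
        "they were realized more interviews to the organizations clients "
        "to detect their satisfaction",
        "requirement risk 2",
    ),
    # Hypothetical entry pointing at a risk that was never registered.
    "hypothetical contingency 9": ("example of an untraced plan", "project risk 7"),
}


def untraced(contingencies, risks):
    """Return contingency ids whose originating risk is not registered."""
    return sorted(
        cid for cid, (_plan, risk_id) in contingencies.items() if risk_id not in risks
    )


def mitigations(risk_id, contingencies):
    """Return the contingency plans that originate from a given risk."""
    return [plan for plan, rid in contingencies.values() if rid == risk_id]


print(untraced(CONTINGENCIES, RISKS))  # ['hypothetical contingency 9']
print(mitigations("project risk 1", CONTINGENCIES))  # ['brief training on DM_BI']
```

Running the check in both directions (plans without a registered risk, and risks with no plan) is one way to keep the risk and contingency templates consistent as they evolve.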
Fig. 7 shows that evaluating DM-BI tools requires the requirements objective "causal evidence detection of the wide band service sign-off" (see Template "Report - Requirements Goal").
Fig. 7. Set of templates needed to define Evaluating DM-BI Tools
Fig. 8 shows how the human resources involved, among them "experts in business’ domain", "leader of the project" and "data mining experts" (see Template "Report - Human Resource Involved"), need objective 1, "to determine indicators of correlation between investment and actions for quality improvement" (see Template "Report - Project’s Objectives"), and the requirements objective "causal evidence detection of the wide band service sign-off" (see Template "Report - Requirements Goal").
Fig. 8. Set of templates needed to define Human Resource Involved
6. Discussion
Current DM-BI methodologies fail to educe all the concepts (see section 3) needed during the business understanding phase of DM-BI (shown in Fig. 9). CRISP-DM educes one set of concepts, P3TQ another and SEMMA yet a third. In general, these methodologies address concepts related to determining business objectives and assessing the situation (at least one methodology does), while concepts related to determining data mining goals and producing a project plan are not addressed. In this context, the proposed methodology is more robust than current ones, because it educes all the concepts necessary to model the DM-BI project’s requirements.
Our consulting engagements in DM-BI projects have allowed us to test our ideas in the field, but we recognize the need for a formal research approach. The next step will therefore be to carry out experiments comparing the proposed DM-BI project requirements elicitation process with existing ones. The first step in the experiment will be to build a set of test DM-BI project cases, in which each case includes a case description Kᵢ (description of case i) and the list Rᵢ of the requirements of case i to be educed. Then two groups will be considered, one trained in the existing DM-BI business understanding phase (control group) and the other trained in our approach (testing group); both groups will be asked to identify and document the DM-BI project requirements from the set of previously defined DM-BI project cases (the set of Kᵢ). The lists of requirements of the cases will not be shown to either group. Referees will compare the number of correctly identified requirements from the control group with the number identified by the testing group, and the differences between the groups will be assessed using statistical tests. We expect to validate experimentally that the number of correct requirements educed by the testing group (the one using the proposed process) is significantly greater than the number of correct requirements educed by the control group (current methodologies).
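The planned comparison needs a per-participant score and a test statistic. The text only says "statistical tests"; the sketch below assumes one possible choice, an exact one-sided permutation test on mean scores, where a participant's score is the number of identified requirements that appear in Rᵢ. The requirement ids and group results are hypothetical.

```python
from itertools import combinations
from statistics import mean


def correct_count(identified, reference):
    """Number of identified requirements that also appear in R_i."""
    return len(set(identified) & set(reference))


def permutation_p_value(testing, control):
    """Exact one-sided permutation test for mean(testing) > mean(control).

    Enumerates every relabelling of the pooled scores into groups of the
    original sizes and returns the fraction whose mean difference is at
    least as large as the observed one.
    """
    pooled = testing + control
    n = len(testing)
    observed = mean(testing) - mean(control)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(a) - mean(b) >= observed - 1e-12:  # tolerance for float noise
            extreme += 1
    return extreme / total


R1 = {"r1", "r2", "r3", "r4", "r5"}  # hypothetical requirements of case 1
testing = [correct_count(ids, R1) for ids in (
    ["r1", "r2", "r3", "r4", "r5"], ["r1", "r2", "r3", "r5"],
    ["r1", "r2", "r3", "r4", "r5", "x"], ["r2", "r3", "r4", "r5"])]
control = [correct_count(ids, R1) for ids in (
    ["r1", "r2"], ["r1", "r3", "r4"], ["r4", "r5"], ["r2", "r3", "r5"])]

p = permutation_p_value(testing, control)
print(testing, control, round(p, 4))  # [5, 4, 5, 4] [2, 3, 2, 3] 0.0143
```

Exact enumeration is only practical for small groups (here 70 relabellings); larger groups would call for random permutations or a rank-based test instead.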
7. Conclusion
This paper presents an approach to educing the requirements of DM-BI projects that addresses identified weaknesses in current data mining methodologies. The approach is based on a list of concepts that have to be educed for DM-BI project requirements, a set of templates to document their elicitation, and the associated process.
The proposed process and set of templates have been tuned in field cases, where they have proved effective. To further verify the effectiveness of the proposed approach, a formal experiment is planned for the second semester of 2008 with a population of advanced students in the Software Engineering Bachelor Program at the University of Buenos Aires.
The focus on DM-BI project requirements documentation enables requisite, consistent, and traceable requirements specifications to be achieved throughout the entire project. This documentation allows the modeling activities to begin from a common lexis and from cross-referenced concepts related to the target business domain requirements.
8. References
Chapman P, Clinton J, Kerber R, Khabaza T, Reinartz T, Shearer C, Wirth R (2000) CRISP-DM 1.0: Step-by-step data mining guide. SPSS. http://www.crisp-dm.org/CRISPWP-0800.pdf. Accessed 14 September 2006
Cogliati M, Britos P, García-Martínez R (2006) Patterns in Temporal Series of Meteorological Variables Using SOM & TDIDT. In: Bramer M (ed) Artificial Intelligence in Theory and Practice. Springer, Boston, IFIP Series 217: 305-314
Felgaer P, Britos P, García-Martínez R (2006) Prediction in Health Domain Using Bayesian Network Optimization Based on Induction Learning Techniques. Int. J. of Mod. Ph. C 17(3): 447-455
Grosser H, Britos P, García-Martínez R (2005) Detecting Fraud in Mobile Telephony Using Neural Networks. LNAI 3533: 613-615
IEEE (1993) Standard IEEE 830-1993: Recommended Practice for Software Requirements Specifications. Institute of Electrical and Electronics Engineers Press
IEEE (2004) Guide to the Software Engineering Body of Knowledge. IEEE Computer Society Press
Jiang L, Eberlein A (2007) Selecting Requirements Engineering Techniques based on Project Attributes - A Case Study. 14th Annual IEEE ECBS: 269-278
Maiden N, Robertson S, Gizikis A (2004) Provoking Creativity: Imagine What Your Requirements Could be Like. IEEE Software 21(5): 68-75
Maiden N, Ncube C, Robertson S (2007) Can Requirements Be Creative? Experiences with an Enhanced Air Space Management System. Proceedings 29th ICSE: 632-641
Pyle D (2003) Business Modeling and Data Mining. Morgan Kaufmann
SAS (2008) SAS Enterprise Miner: SEMMA. http://www.sas.com/technologies/analytics/datamining/miner/semma.html. Accessed 29 February 2008
Silva F, Freire J (2003) DWARF: An Approach for Requirements Definition and Management of Data Warehouse Systems. RE'03: 75-84
Solheim H, Lillehagen F, Petersen S, Jorgensen H, Anastasiou M (2005) Model-driven visual requirements engineering. Proceedings RE'05: 421-428
Valenga F, Fernández E, Merlino H, Rodríguez D, Procopio C, Britos P, García-Martínez R (2008) Minería de Datos Aplicada a la Detección de Patrones Delictivos en Argentina [Data Mining Applied to the Detection of Criminal Patterns in Argentina]. VII JIISIC'08: 31-39
Winter R, Strauch B (2002) A Method for Demand-driven Information Requirements Analysis in Data Warehousing Projects. HICSS-36: 231-239
Yang Q, Wu X (2006) 10 Challenging Problems in Data Mining Research. Int. J. Inf. Tech. & Decis. Mak. 5(4): 597-604