Business Intelligence: A Design Science Perspective

oneonone · Jan 23, 2016

Description
Design ... is concerned with how things ought to be, with devising artifacts to attain goals.

Business Intelligence:
A Design Science Perspective
Salvatore T. March
David K. Wilson Professor of Management
Owen Graduate School of Management
Vanderbilt University
http://SalMarch.com
Herb Simon
"The natural sciences are concerned with how things are."
"Design ... is concerned with how things ought to be, with devising artifacts to attain goals."
The Sciences of the Artificial
Agenda
Introduction and Overview
Case Studies
Data Warehouse Representations
Business Intelligence Tools
Reporting
OLAP
Data Mining
Predictive Analytics
Conclusions
Top 10 Business and Technology Priorities in 2010
Source: Gartner EXP (January 2010)
     Business Priorities                                 Technology Priorities
 1.  Business process improvement                        Virtualization
 2.  Reducing enterprise costs                           Cloud computing
 3.  Increasing the use of information/analytics         Web 2.0
 4.  Improving enterprise workforce effectiveness        Networking, voice and data communications
 5.  Attracting and retaining new customers              Business Intelligence
 6.  Managing change initiatives                         Mobile technologies
 7.  Creating new products or services (innovation)      Data/document management and storage
 8.  Targeting customers and markets more effectively    Service-oriented applications and architecture
 9.  Consolidating business operations                   Security technologies
10.  Expanding current customer relationships            IT management
Business Intelligence
Online Analytical Processing (OLAP)
• Reporting Tools
• Dashboards and Data Cubes
• Performance Measures
• Analysis Dimensions
Analytical Modeling
• Data Mining
• Statistical
• Artificial Intelligence
• Predictive Models
In 2008, the business intelligence (BI) tools market reached
$7.8 billion in software license and maintenance revenue.
The market growth of 10.6% in 2008 surpassed previous
IDC projections, as spending by organizations of all sizes
continued. Organizations are focusing on BI and analytics
projects that help reduce costs or retain customers. There is
growing evidence that more pervasive BI and analytics have
a direct impact on competitiveness. Better decision making
is more important when resources become restricted during
a recession, so BI and analytics projects will still appeal to
management. However, justifying large capital outlays for
software will be challenging unless short-term benefits can
be directly correlated with the investment. As more
incremental projects are undertaken, it will be important to
execute these projects within the long-term strategic plan of
organization-wide decision management.
Dan Vesset, Program Vice President, Business Analytics.
Do you want to run one of the world’s
largest data warehouses?
At Amazon.com data and analysis guide every business
decision and deliver enormous business value. Amazon’s
Business Intelligence team is responsible for a business
analytics platform that provides reporting, analysis, and
data mining to thousands of internal and external
customers worldwide. To help deliver these core business
metrics and decisions, we run one of the world’s largest
data warehouses. The BI team is recruiting for a Manager
of Database Administration to help deliver the next
generation of Amazon.com's data warehousing solution.
Business System Concepts
[Diagram labels: Goals; Analyze, Manage; Data, Work flow, Data flow, Decision flow; Input, Process, Output; Intelligence, Design, Choice]
Conceptual MIS Structure
[Diagram: management levels (Strategic, Tactical, Operational, Transaction) crossed with functional areas (Marketing, Production / Operations, Accounting, Finance, Information Management, Human Resources, Strategic Management). Other labels: Data Warehousing and Business Intelligence Systems; Transaction Processing Systems; Network Infrastructure; Corporate Databases]
Red Flags
• Businesspeople debating the "correct"
numbers coming from different reports.
• Businesspeople using spreadsheets to
"adjust" the numbers to be "correct."
• Data shadow systems or "spreadmarts."
• Multiple pockets of IT resources developing,
deploying, maintaining, upgrading and
growing skills in different BI tools.
• Debates among different IT groups as to
what data, metrics and algorithms should be
used for different reports.
SearchBusinessAnalytics.com
References
Houghton, R. et al. "Vigilant Information Systems for
Managing Enterprises in Dynamic Supply Chains:
Real-time Dashboards At Western Digital," MIS
Quarterly Executive, March 2004.
McAfee, A. "Business Intelligence Software at
SYSCO," Harvard Business School Case Study,
September 2006.
Wetherbe, J. C. "Executive Information Requirements:
Getting it Right," MIS Quarterly, March 1991.
Davenport, T. "Competing on Analytics," Harvard
Business Review, Jan. 2006.
References: Yogi Berra
• If you don't know where you are going, you
might wind up someplace else.
• When you come to a fork in the road, take it.
• You can observe a lot by just watching.
• If the world were perfect, it wouldn't be.
• If you ask me a question I don't know, I'm not
going to answer.
• It's tough to make predictions, especially
about the future.
Executive Information
Wetherbe (MISQ, March 1991)
• Articulate business objectives
• Analyze problems encountered
• Analyze decisions made
• Define CSFs (KPIs)
• Assess "Ends and Means"
Transform into information requirements
using Joint Application Design principles.
Then assess information importance and
availability.
Western Digital
Western Digital (WD) is a $3 billion global
designer and manufacturer of high-performance
hard drives for desktop personal computers,
corporate networks, enterprise storage, and
home entertainment applications. WD’s top five
business challenges:
1. Constantly changing customer requirements
2. A fiercely competitive global industry
3. Avoiding business disruption, product returns,
excess inventory, and bad scheduling
4. Short product lifecycles
5. The need for extremely high quality and reliability
[Houghton, et al.]
Factory Dashboards
Core requirements:
1. Show the health of the factory by providing
near-real-time, graphical views of KPIs.
2. Show when a KPI goes below 2 sigma of its
allowable value.
3. Give staff ways to drill down on each KPI to
find the source of a problem.
4. Automatically issue alerts to the individuals
responsible for a KPI so they can initiate
damage control.
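The 2-sigma requirement can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not define "allowable value," so the sketch treats the historical mean of the KPI as the reference and raises an alert when a new reading falls more than two sample standard deviations below it. The function name `kpi_alert` and the yield readings are hypothetical.

```python
from statistics import mean, stdev

def kpi_alert(history, latest, n_sigma=2.0):
    """True when `latest` sits more than n_sigma standard deviations below the historical mean."""
    return latest < mean(history) - n_sigma * stdev(history)

# Hypothetical hourly yield percentages for one production line.
yield_history = [96.0, 97.0, 95.0, 96.0, 97.0, 95.0, 96.0]

for reading in (95.5, 92.0):
    status = "ALERT" if kpi_alert(yield_history, reading) else "ok"
    print(f"yield {reading:.1f}% -> {status}")
```

A production dashboard would more likely compare against an engineering limit than a rolling mean; the comparison logic stays the same.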
Corporate Dashboards
1. Billings and returns
2. Backlog
3. Outlook
4. Finished goods inventory
5. Distributor inventory and sell-through
6. Point of sale
7. Planned shipments
8. Finished goods in transit
9. Revenue recognition
10. Customer/channel status
SYSCO
                       2009    2003
Sales                  $37B    $26B
Net Earnings           $1B     $0.78B
Employees              47K     46K
Operating Companies    140     100
[Diagram: several operating companies (OP Co), each with its own ERP and DB holding Customers, Orders, Deliveries, and Payments. Corporate Headquarters asks: "How are we doing?"]
[Diagram: the same operating companies feed a Data Warehouse through ETL. Corporate Headquarters asks: "How are we doing?"]
Data Warehouse Structure
Mirror the Operational Database
• Easy to load data (copy from operational)
• Maximum flexibility and data content for existing data
• Massive storage and processing requirements
• Difficult or impossible to develop BI applications (if
dimensions or performance measures are not included in
operational databases)
Dimensional Data Model (Star Schema)
• More difficult to load data (must transform data from the
operational databases and integrate with external data to
represent performance measures and dimensions)
• Potentially less flexibility and data detail
• Reduced storage and processing requirements
• Easy to develop BI applications
Business Objects
CSFs for Business Intelligence
1. Focus Your Efforts
2. Secure Executive Sponsorship
3. Build a Winning Project Plan
4. Make it Easy to Access Data
5. Make it Easy to Analyze Data
6. Make it Easy to Share Knowledge
7. Deliver Exceptionally Clean Data
8. Insist on Zero Client Administration
9. Implement Bullet-Proof Security
10. Plan for Growth
SYSCO BI Questions
• What additional products could we be
selling to each of our customers?
• Develop "typical purchase activity" by
customer profile (size, type, geography,
and so on)
• Which of our current customers are we
most likely to lose?
• Examine customers' ordering patterns
over time (reduced volume may indicate
high risk of loss)
SYSCO
[Diagram: several operating companies (OP Co), each with its own ERP and DB holding Customers, Orders, Deliveries, and Payments, feed a Data Warehouse through ETL. BI tools connect the Data Warehouse to Corporate Headquarters.]
Assess Costs and Benefits
Costs
• Hardware
• Software
• Personnel and Consulting
Benefits
• More and better information
• Improved decision-making
• Personnel savings
• Business process improvement
• Support for strategic objectives
Cost Scenario ($000)   Bare Bones   Middle of the Road   Volume Discount
Module
  Query/Analysis          $152          $191               $205
  Performance Mgt         $592          $759               $843
  Report Creation          $86          $154               $139
  Report View             $450          $455               $500
  Analytical                            $170               $205
  Supply Chain                                             $109
Total Software          $1,280        $1,729             $2,001
Consulting              $1,000        $1,000             $1,000
Maintenance               $256          $346               $400
Total Cost              $2,536        $3,075             $3,401
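The totals in the cost table can be re-derived with a short script. This is a sketch, not part of the case: the module prices come from the table, while the 20% maintenance rate is an assumption inferred from the figures (256 / 1,280, 346 / 1,729, 400 / 2,001) rather than something the slide states.

```python
# Module prices in $000, taken from the cost scenario table.
SOFTWARE = {
    "Bare Bones": [152, 592, 86, 450],
    "Middle of the Road": [191, 759, 154, 455, 170],
    "Volume Discount": [205, 843, 139, 500, 205, 109],
}
CONSULTING = 1000
MAINTENANCE_RATE = 0.20  # assumption: inferred from the table, not stated in it

def total_cost(module_prices):
    """Return (software total, maintenance, total cost) in $000."""
    software_total = sum(module_prices)
    maintenance = round(software_total * MAINTENANCE_RATE)
    return software_total, maintenance, software_total + CONSULTING + maintenance

for scenario, prices in SOFTWARE.items():
    software_total, maintenance, total = total_cost(prices)
    print(f"{scenario:20s} software={software_total:5d} maintenance={maintenance:4d} total={total:5d}")
```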
OLAP Cube
[Diagram: a grid with Column Dimension Values across the top and Row Dimension Values down the side]
Cells contain a cross-tabulation of
performance measure values for
each combination of Row and
Column Dimension Values. Row
and Column Dimensions typically
support "drill-down" and "roll-up"
capabilities.
What performance measures and dimensions would
help SYSCO address its two questions?
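A small cross-tabulation makes the cube idea concrete. The sketch below uses only the Python standard library; the sales facts, the dimension names (region, product line, quarter), and the helper `cross_tab` are all hypothetical and are not SYSCO data.

```python
from collections import defaultdict

# Hypothetical facts: (region, product_line, quarter, revenue)
facts = [
    ("East", "Produce", "Q1", 120),
    ("East", "Produce", "Q2", 130),
    ("East", "Dairy", "Q1", 80),
    ("West", "Produce", "Q1", 100),
    ("West", "Dairy", "Q1", 60),
    ("West", "Dairy", "Q2", 70),
]

def cross_tab(rows, row_key, col_key, measure):
    """Sum `measure` for each (row dimension value, column dimension value) cell."""
    cube = defaultdict(lambda: defaultdict(int))
    for row in rows:
        cube[row_key(row)][col_key(row)] += measure(row)
    return {r: dict(cols) for r, cols in cube.items()}

# Base view: region (rows) by product line (columns).
by_region_line = cross_tab(facts, lambda f: f[0], lambda f: f[1], lambda f: f[3])

# Drill-down: split the column dimension by quarter as well.
drill_down = cross_tab(facts, lambda f: f[0], lambda f: (f[1], f[2]), lambda f: f[3])

# Roll-up: collapse the column dimension to one total per region.
roll_up = {region: sum(cols.values()) for region, cols in by_region_line.items()}

print(by_region_line)
print(roll_up)
```

Drill-down adds a finer dimension level to a cell; roll-up aggregates it away again. A BI tool does the same thing against the warehouse rather than an in-memory list.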
Data Warehouse Structure
Mirror the Operational Database
• Easy to load data (copy from operational)
• Maximum flexibility and data content for existing data
• Massive storage and processing requirements
• Difficult or impossible to develop BI applications (if
dimensions or performance measures are not included in
operational databases)
Dimensional Data Model (Star Schema)
• More difficult to load data (must transform data from the
operational databases and integrate with external data to
represent performance measures and dimensions)
• Potentially less flexibility and data detail
• Reduced storage and processing requirements
• Easy to develop BI applications
Operational DB REA Model
[Diagram: Internal Agent, Give Event, Resource A; External Agent; Internal Agent, Get Event, Resource B]
Revenue Cycle Model
[Diagram: Sales Person, Sale, Inventory; Customer; AR Manager, Receive Cash, Cash Account]
Sam's Club data warehouse structure
mirrors Point-of-Sale transactions
Star Schema
Fact Table (Measures)
  Sales: Revenue, Cost, Quantity, FK PromotionID, FK ChannelID, FK SaleDate, FK CustomerID, FK ProductNo
Dimension Tables
  Time: SaleDate
  Product: ProductNo, ProductName, ProductLine, ProductType
  Channel: ChannelID, ChannelType
  Customer: CustomerID, CustomerName, Industry, Region, Salesperson, Size
  Promotion: PromotionID, PromotionType
http://www.dbmsmag.com/9708d15.html
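To show why a star schema makes BI queries easy, here is a minimal sketch using Python's built-in sqlite3 module. It builds only part of the schema (the Sales fact table plus the Product and Customer dimensions, with a reduced set of columns), and every row of data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductNo INTEGER PRIMARY KEY, ProductName TEXT, ProductLine TEXT, ProductType TEXT);
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Industry TEXT, Region TEXT);
CREATE TABLE Sales (
    SaleDate TEXT,
    CustomerID INTEGER REFERENCES Customer(CustomerID),
    ProductNo INTEGER REFERENCES Product(ProductNo),
    Revenue REAL, Cost REAL, Quantity INTEGER);
""")

# Hypothetical dimension and fact rows.
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    (1, "Romaine", "Produce", "Fresh"),
    (2, "Cheddar", "Dairy", "Chilled"),
])
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", [
    (10, "Cafe A", "Restaurant", "East"),
    (11, "School B", "Education", "West"),
])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", [
    ("2010-01-05", 10, 1, 100.0, 60.0, 10),
    ("2010-01-06", 10, 2, 50.0, 30.0, 5),
    ("2010-01-07", 11, 1, 80.0, 50.0, 8),
    ("2010-02-01", 10, 1, 40.0, 25.0, 4),
])

# One join per dimension, then group by the dimension attributes of interest.
rows = conn.execute("""
SELECT c.Region, p.ProductLine,
       SUM(s.Revenue) AS Revenue,
       SUM(s.Revenue - s.Cost) AS Margin
FROM Sales s
JOIN Customer c ON c.CustomerID = s.CustomerID
JOIN Product p ON p.ProductNo = s.ProductNo
GROUP BY c.Region, p.ProductLine
ORDER BY c.Region, p.ProductLine
""").fetchall()

for row in rows:
    print(row)
```

Every analysis follows the same pattern: join the fact table to the dimensions of interest and aggregate the measures.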
Customer Hierarchy
Geography Hierarchy
Fact Tables
MicroStrategy BI Interface
Create a Report
Choose Template, Builder, or Wizard
Drag and drop report elements
Customer Analysis Template
Agenda
Introduction and Overview
Case Studies
Data Warehouse Representations
Business Intelligence Tools
Reporting
OLAP
Data Mining
Predictive Analytics
Conclusions
Data Mining
Is:
• Extracting useful information from large datasets.
• Exploration and analysis, by automatic or
semiautomatic means, of large quantities of data in
order to discover meaningful patterns and rules.
• Discovering meaningful new correlations, patterns,
and trends by sifting through large amounts of
data, using pattern recognition technologies as well
as statistical and mathematical techniques.
Is not:
• Identifying random correlations.
Key Technologies
[Diagram: Data Mining at the intersection of Computing Power, Statistical & Learning Algorithms, and Data Management Capabilities]
Successful Applications
• Customer Segmentation
• Targeted Marketing (Predicting Response)
• Fraud Detection
• Predicting Customer Attrition
• Channel Optimization
• Predicting Loan Defaults
• Product Recommendations
Main Subdivisions
• Supervised Learning
The goal is to predict the value of an outcome
measure based on a number of input measures
(e.g., regression, logistic regression, discriminant
analysis, neural networks).
• Unsupervised Learning
No output measure; the goal is to describe
associations and patterns in the input measures
(e.g., association rules, principal components,
clustering).
Data Mining Challenges
• Personnel: domain experts, IT support,
modelers
• Methodology: project management,
problem definition, data acquisition,
model development, knowledge
deployment
• Technology Architecture: data
warehouse, analytical tools
Major Players
• SAS (Enterprise Miner)
• Oracle (Darwin, Hyperion)
• IBM (Cognos, SPSS Clementine)
• SAP (Business Objects, Crystal Reports)
• Teradata Partners (MicroStrategy, SAS,
SAP, Microsoft, etc.)
• Microsoft (SQL Server, Excel, SharePoint,
PowerPivot, Access)
Data Mining Process
1. develop an understanding of the problem
2. obtain the dataset
3. explore, clean and prepare the data (e.g.
missing values, outliers)
4. reduce the dimensionality if necessary
5. determine the data mining task (classification,
prediction, association rule discovery)
6. choose the data mining technique(s)
7. apply the technique(s), evaluate, compare and
refine
8. interpret the results, choose the best model
9. deploy the selected model
Data Mining Isn't a Good Bet For
Stock-Market Predictions
"… data-mined numbers can be so irresistible that
they are one of the leading causes of the
evaporation of money [in the stock market]."
Over a 13-year period annual butter production in
Bangladesh "explained" 75% of the variation in the
annual returns of the Standard & Poor's 500-stock
index. Tossing in U.S. cheese production and the
total population of sheep in both Bangladesh and
the U.S. improved the "explanation" to 99%.
Wall Street Journal, August 8, 2009
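The butter-in-Bangladesh effect is easy to reproduce. The sketch below is illustrative only: it generates 13 "annual returns" of pure noise, then searches 1,000 equally random candidate series for the one that "explains" the returns best. All data are simulated, and the seed and series counts are arbitrary choices.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

rng = random.Random(2009)  # fixed seed so the run is repeatable
YEARS = 13

# Simulated "annual returns" and 1,000 unrelated random candidate predictors.
returns = [rng.gauss(0, 1) for _ in range(YEARS)]
candidates = [[rng.gauss(0, 1) for _ in range(YEARS)] for _ in range(1000)]

scores = sorted(r_squared(series, returns) for series in candidates)
typical = scores[len(scores) // 2]
best = scores[-1]

print(f"median R^2 of a random predictor: {typical:.2f}")
print(f"best R^2 found by searching:      {best:.2f}")
```

With only 13 observations, the best of many unrelated series will look like a strong predictor even though every one of them is noise. That is the sense in which data mining "is not" identifying random correlations.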
[Scatter plot with fitted line: Y = 0.679 + 0.998 * X, R² = 0.70; vertical axis: Average of Y]
S&P 500 Returns
       1950-2007                            2008-present
                        95% Conf Int
       Mean   Std Dev   Lower   Upper      2008      2009      2010
Jan    1.36    4.67      0.13    2.59     -6.1%    -8.57%    -3.70%
Feb   -0.10    3.24     -0.95    0.75     -3.5%   -10.99%     2.85%
Mar    0.92    3.28      0.06    1.79     -0.6%     8.54%     5.88%
Apr    1.27    3.78      0.28    2.27      4.8%     9.39%
May    0.30    3.57     -0.64    1.24      1.1%     5.31%
Jun    0.19    3.35     -0.69    1.08     -8.6%     0.02%
Jul    0.75    4.02     -0.31    1.80     -1.0%     7.41%
Aug    0.00    4.67     -1.23    1.22      1.2%     3.36%
Sep   -0.61    4.22     -1.72    0.50     -9.1%     3.57%
Oct    0.93    5.14     -0.42    2.29    -16.9%    -1.98%
Nov    1.58    4.37      0.43    2.73     -7.5%     5.74%
Dec    1.61    3.18      0.78    2.45      0.8%     1.78%
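The 95% confidence interval columns appear to follow the usual formula, mean ± t × (std dev / √n). The sketch below checks that against the January and February rows. It rests on two assumptions not stated on the slide: n = 58 monthly observations (one per year, 1950 through 2007) and a two-sided t critical value of about 2.0025 for 57 degrees of freedom.

```python
from math import sqrt

N_YEARS = 58          # assumption: one observation per year, 1950 through 2007
T_CRITICAL = 2.0025   # assumption: approximate two-sided 95% t value, 57 degrees of freedom

def conf_interval(mean, std_dev, n=N_YEARS, t=T_CRITICAL):
    """Return (lower, upper) bounds of the confidence interval for the mean."""
    half_width = t * std_dev / sqrt(n)
    return mean - half_width, mean + half_width

# Mean and standard deviation for two months, copied from the table.
for month, mean, std_dev in [("Jan", 1.36, 4.67), ("Feb", -0.10, 3.24)]:
    lower, upper = conf_interval(mean, std_dev)
    print(f"{month}: {lower:.2f} to {upper:.2f}")
```

Because the table's means are themselves rounded, some other rows can differ from this calculation by 0.01.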
A $1000 Investment
        2008     2009     2010    |    2008     2009     2010
Jan     $939     $833   $1,026    |    $939     $641     $904
Feb     $939     $833   $1,026    |    $939     $641     $904
Mar     $933     $905   $1,086    |    $933     $696     $957
Apr     $978     $990             |    $978     $761
May     $978     $990             |    $988     $801
Jun     $978     $990             |    $903     $802
Jul     $978     $990             |    $894     $861
Aug     $978     $990             |    $905     $890
Sep     $978     $990             |    $905     $890
Oct     $978     $990             |    $752     $872
Nov     $904   $1,046             |    $695     $922
Dec     $912   $1,065             |    $701     $939
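The slide does not say how the two sets of balances were produced. The figures look consistent with two timing rules based on the 1950-2007 statistics: the first set stays invested only in months whose 95% confidence interval lies entirely above zero, and the second set stays invested in every month whose historical mean is not negative. The sketch below replays 2008 under those inferred rules; the rules are an assumption, and the helper `simulate` is hypothetical.

```python
# 1950-2007 statistics and 2008 monthly returns (%), copied from the S&P 500 table.
MEAN = [1.36, -0.10, 0.92, 1.27, 0.30, 0.19, 0.75, 0.00, -0.61, 0.93, 1.58, 1.61]
LOWER = [0.13, -0.95, 0.06, 0.28, -0.64, -0.69, -0.31, -1.23, -1.72, -0.42, 0.43, 0.78]
RETURNS_2008 = [-6.1, -3.5, -0.6, 4.8, 1.1, -8.6, -1.0, 1.2, -9.1, -16.9, -7.5, 0.8]

def simulate(start, returns, invested):
    """Month-end balances when the money is in the market only in flagged months."""
    balance = start
    path = []
    for pct, in_market in zip(returns, invested):
        if in_market:
            balance *= 1 + pct / 100
        path.append(round(balance))
    return path

# Inferred rule 1: invest only when the interval's lower bound is positive.
ci_rule = simulate(1000, RETURNS_2008, [lo > 0 for lo in LOWER])
# Inferred rule 2: invest whenever the historical mean is not negative.
mean_rule = simulate(1000, RETURNS_2008, [m >= 0 for m in MEAN])

print("CI rule:  ", ci_rule)
print("Mean rule:", mean_rule)
```

The replay lands within a dollar or two of the 2008 columns; small gaps are expected because the table's returns are rounded. Either way, a rule mined from 58 years of history lost money in 2008.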
Data Mining
[Diagram labels: Data Source, Domain Knowledge, Training Data, Validation Data, Data Mining Algorithm(s), Model, Model Evaluation, Knowledge]
What data and algorithms would
help SYSCO address its two questions?
Classification Techniques
Predict the best classification for an observation.
Example Business Tasks:
• Detect / Predict Fraud
• Predict Bankruptcy
• Predict Response to Marketing Promotion
Basic Techniques:
• Naïve Classifier (Predominant Class)
• Bayesian Classifier (Conditional Probability)
• K-Nearest Neighbors (Similarity)
• Classification Tree (Iterative Partitioning)
Cell Phone Insurance Claims
Carrier   Legitimate   Fraudulent   Total
A            174           19         193
B             79           22         101
C             26           15          41
D            135           61         196
E            522           86         608
Total        936          203       1,139
A sample of 1,139 out of 100,000 claims
processed in a single month were investigated to
determine if they were legitimate or fraudulent.
"Bayesian" Classifier
Carrier   Legitimate   Fraudulent   Total
A            90%          10%        100%
B            78%          22%        100%
C            63%          37%        100%
D            69%          31%        100%
E            86%          14%        100%
Total        82%          18%        100%
Pr(Fraudulent) = 0.18
Pr(Fraudulent | A) = 0.10
Pr(Fraudulent | B) = 0.22
Pr(Fraudulent | C) = 0.37
Pr(Fraudulent | D) = 0.31
Pr(Fraudulent | E) = 0.14
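The conditional probabilities follow directly from the counts in the claims table. A minimal sketch (the dictionary layout and the helper name `fraud_rate` are illustrative choices, not from the slides):

```python
# (legitimate, fraudulent) counts per carrier, from the claims sample.
claims = {
    "A": (174, 19),
    "B": (79, 22),
    "C": (26, 15),
    "D": (135, 61),
    "E": (522, 86),
}

def fraud_rate(legitimate, fraudulent):
    """Share of claims that were fraudulent."""
    return fraudulent / (legitimate + fraudulent)

total_legit = sum(legit for legit, _ in claims.values())
total_fraud = sum(fraud for _, fraud in claims.values())

prior = fraud_rate(total_legit, total_fraud)  # Pr(Fraudulent)
conditional = {carrier: fraud_rate(*counts) for carrier, counts in claims.items()}

print(f"Pr(Fraudulent) = {prior:.2f}")
for carrier, p in conditional.items():
    print(f"Pr(Fraudulent | {carrier}) = {p:.2f}")
```

The naïve classifier uses only the prior (predict "legitimate" for everyone); conditioning on the carrier shifts the estimate from 0.10 for Carrier A up to 0.37 for Carrier C.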
Claims Investigation
Suppose the cost to pay a legitimate claim is $30
and the cost to investigate a claim is $5. Then if
no information about the carrier is used the
conclusion is to investigate all claims.
Investigate: $5.00 + $24.65 = $29.65
Pay without investigating: $30.00
Claims Investigation
Investigate (Carrier A): $5.00 + $27.05 = $32.05
Pay without investigating: $30.00
Including the condition that the claim came from
Carrier A, the probability that the claim is
legitimate changes, resulting in a different
decision. What other factors change the
probability of legitimacy?
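The investigate-or-pay decision is an expected-cost comparison. The sketch below reproduces the $29.65 and $32.05 figures under one assumption the slide leaves implicit: a claim found to be fraudulent is not paid, so investigating costs $5 plus $30 times the probability that the claim is legitimate.

```python
PAY_COST = 30.0          # cost to pay a claim
INVESTIGATE_COST = 5.0   # cost to investigate a claim

# (legitimate, fraudulent) counts per carrier, from the claims sample.
claims = {"A": (174, 19), "B": (79, 22), "C": (26, 15), "D": (135, 61), "E": (522, 86)}

def expected_cost_if_investigated(p_legit):
    """Investigation fee plus the payout on claims that turn out legitimate."""
    return INVESTIGATE_COST + p_legit * PAY_COST

def should_investigate(p_legit):
    return expected_cost_if_investigated(p_legit) < PAY_COST

overall_p_legit = 936 / 1139
print(f"All carriers: ${expected_cost_if_investigated(overall_p_legit):.2f}")

decisions = {}
for carrier, (legit, fraud) in claims.items():
    p_legit = legit / (legit + fraud)
    decisions[carrier] = should_investigate(p_legit)
    print(f"Carrier {carrier}: ${expected_cost_if_investigated(p_legit):.2f} "
          f"-> {'investigate' if decisions[carrier] else 'pay'}")
```

Investigation pays off only when the probability of legitimacy is below 25/30, about 0.83.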
Credit Scoring
A bank would like to score loan applications
based on the likelihood that the loan will be
repaid. They plan to use two factors: FICO credit
score purchased from Fair, Isaac & Company
and a profitability index computed from factors
on the loan application such as the ratio of the
loan amount to income and the interest rate of
the loan. To develop the model they have
gathered this data for 1000 past completed
loans. Of these loans 700 have been paid in full
(Default = 0), 300 have defaulted (Default = 1).
K-Nearest Neighbors
A k-nearest neighbors (k-NN) approach classifies
a new observation according to the majority class
of its k "most similar" observations in the
training data set. "k" is a parameter
selected during the training
process. Small values of k are
very sensitive to local neighbors.
[Scatter plot of the training data with the classification boundary]
The 10-nearest neighbors procedure yields the
above "classification frontier" for new observations.
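A k-NN classifier fits in a few lines. The sketch below is not the tool used for the slides: the loan records are hypothetical, both factors are assumed to be normalized to a 0-1 scale, similarity is plain Euclidean distance, and k = 3 is used only to keep the example small.

```python
from collections import Counter
from math import dist

def knn_classify(training, query, k):
    """Majority class among the k training observations closest to `query`."""
    neighbors = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: ((normalized FICO, profitability index), Default)
loans = [
    ((0.90, 0.80), 0),
    ((0.85, 0.60), 0),
    ((0.80, 0.90), 0),
    ((0.70, 0.70), 0),
    ((0.30, 0.20), 1),
    ((0.25, 0.40), 1),
    ((0.40, 0.10), 1),
]

print(knn_classify(loans, (0.35, 0.30), k=3))
print(knn_classify(loans, (0.82, 0.75), k=3))
```

Normalizing matters: without it, whichever factor has the larger numeric range would dominate the distance.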
Classification Tree
Logistic Regression
p = 1 / (1 + exp(-7.4349 + 19.839 * FICO))
Linear Regression
R² = 0.6357
Comparison of Methods
10-Nearest Neighbors Error Report
Class     # Cases   # Errors   % Error
1           120        15       12.50
0           280         6        2.14
Overall     400        21        5.25
Classification Tree Error Report
Class     # Cases   # Errors   % Error
1           120        14       11.67
0           280         9        3.21
Overall     400        23        5.75
Logistic Regression Error Report
Class     # Cases   # Errors   % Error
1           120        15       12.50
0           280         9        3.21
Overall     400        24        6.00
Advanced Classifiers
Neural Networks: simulation of a hypothesized
model of human learning: interconnected
neurons (nodes) that interact to transform
inputs (factors) into an output response
(prediction or classification).
Discriminant Analysis: uses a linear function of
factors (similar to regression analysis) to
score observations, separating
(discriminating) observations based on
the "statistical distance" between an
observation and the centroid of a class.
Neural Networks
[Diagram: network of nodes arranged in Input, Hidden, and Output layers]
Neural Network Process
1. Randomly initialize a weighted linear response
function (weights w_{ki}) for each hidden and output node
k: S_k = w_{k0} + w_{k1} X_1 + … + w_{kn} X_n.
2. Compute the output for an observation: The
output from a node is its response function S_k
used in a selected transfer function, typically a
logistic / sigmoid function: O_k = 1 / (1 + e^{-S_k})
3. Adjust the weights for each node to reduce the
error in the prediction for that observation.
4. Repeat 2 and 3 for the training data.
5. Repeat 4 until: no improvement or acceptable
misclassification rate or maximum iterations.
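The five steps above can be sketched for a network with one hidden layer. This is a minimal illustration under stated assumptions, not the software behind the error reports: the weight adjustment in step 3 is ordinary gradient descent on squared error (backpropagation), the only stopping rule implemented is a maximum number of passes, and the training records are hypothetical.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, hidden_w, output_w):
    """Step 2: hidden_w[k] = [w_k0, w_k1, ..., w_kn]; output_w = [w_0, w_1, ..., w_h]."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in hidden_w]
    out = sigmoid(output_w[0] + sum(wi * hi for wi, hi in zip(output_w[1:], hidden)))
    return hidden, out

def train(data, n_hidden=2, rate=0.5, max_epochs=500, seed=0):
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    # Step 1: random initial weights for every hidden node and the output node.
    hidden_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    output_w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(max_epochs):            # Step 5: stop at maximum iterations
        for x, target in data:             # Step 4: repeat over the training data
            hidden, out = forward(x, hidden_w, output_w)
            # Step 3: move each weight against the gradient of the squared error.
            delta_out = (out - target) * out * (1 - out)
            delta_hidden = [delta_out * output_w[k + 1] * h * (1 - h)
                            for k, h in enumerate(hidden)]
            output_w[0] -= rate * delta_out
            for k, h in enumerate(hidden):
                output_w[k + 1] -= rate * delta_out * h
            for k, w in enumerate(hidden_w):
                w[0] -= rate * delta_hidden[k]
                for i, xi in enumerate(x):
                    w[i + 1] -= rate * delta_hidden[k] * xi
    return hidden_w, output_w

# Hypothetical training data: ((normalized FICO, profitability index), Default)
loans = [((0.90, 0.80), 0), ((0.85, 0.60), 0), ((0.70, 0.70), 0),
         ((0.30, 0.20), 1), ((0.25, 0.40), 1), ((0.40, 0.10), 1)]

hidden_w, output_w = train(loans)
for x, target in loans:
    print(x, target, round(forward(x, hidden_w, output_w)[1], 3))
```

The other two stopping rules in step 5 would need a validation check inside the outer loop.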
2-Hidden Node Neural Network Error Report
Class     # Cases   # Errors   % Error
1           120        20       16.67
0           280         5        1.79
Overall     400        25        6.25
3-Hidden Node Neural Network Error Report
Class     # Cases   # Errors   % Error
1           120        20       16.67
0           280         5        1.79
Overall     400        25        6.25
25-Hidden Node Neural Network Error Report
Class     # Cases   # Errors   % Error
1           120        18       15.00
0           280         7        2.50
Overall     400        25        6.25
Decile-Wise Lift Comparison (Testing Data)
          Class     Logistic   Neural    Discriminant
          Tree      Regr'n     Net       Analysis
Decile    Mean      Mean       Mean      Mean
1         0.9400    0.8100     0.9200    0.7800
2         0.0228    0.0800     0.0500    0.1300
3         0.0041    0.0400     0.0200    0.0600
4         0.0041    0.0400     0.0100    0.0200
5         0.0041    0.0100     0.0100    0.0100
6         0.0041    0.0200     0.0100    0.0100
7         0.0041    0.0100     0.0000    0.0100
8         0.0041    0.0100     0.0000    0.0000
9         0.0113    0.0000     0.0000    0.0000
10        0.0211    0.0000     0.0000    0.0000
Clustering Techniques
Find groups of observations with a high degree
of intra-group similarity and a low degree of
inter-group similarity. Understanding similarities
and differences among groups enables the
development of different strategies relative to
those groups.
Such grouping techniques are widely applied in
market segmentation, market structure analysis,
industry analysis, and portfolio analysis.
What is "Similarity?"
Quantitative (Normalized)
• Euclidean Distance (k-NN)
• Statistical Distance (Discriminant)
• Manhattan Distance (Square Block)
• Maximum Co-ordinate Distance
• Correlation (Inverse of Distance)
Qualitative
• Proportion of Matches (0 or 1)
• Proportion of Positive Matches (1 only)
Mixed Quantitative and Qualitative
• Gower's Similarity Measure
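Several of these measures are one-liners. The sketch below covers the Euclidean, Manhattan, and maximum co-ordinate distances plus the two matching proportions. One assumption to flag: "proportion of positive matches" is implemented here as the Jaccard coefficient (attributes where both are 1, divided by attributes where either is 1), which is one common reading of the term.

```python
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def max_coordinate(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def matching_proportion(a, b):
    """Share of binary attributes on which the two observations agree (0-0 or 1-1)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def positive_matching_proportion(a, b):
    """Assumed Jaccard form: 1-1 matches over attributes where at least one is 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0

p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q), max_coordinate(p, q))

u, v = (1, 0, 1, 1, 0), (1, 0, 0, 1, 1)
print(matching_proportion(u, v), positive_matching_proportion(u, v))
```

The positive-match form is preferred when a shared 0 (for example, two customers who both never bought an item) says little about similarity.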
Collaborative Filtering
Movie Rating    Gone With the Wind   Spaceballs   Star Wars   Men in Black
Sue Naumi              10               N/R          10            2
Jane Arnold             1                8            2            9
Joe Beans               8                2            9            2
Isabel Loud             2                9            1            8
Would Spaceballs be a good recommendation for Sue?
Collaborative Filtering
Movie Rating    Gone With the Wind   Star Wars   Spaceballs   Men in Black
Sue Naumi              10               10          N/R            2
Joe Beans               8                9           2             2
Jane Arnold             1                2           8             9
Isabel Loud             2                1           9             8
Permute rows and columns to identify "affinity groups" (similar
segments). Spaceballs is not a good recommendation for Sue!
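The same conclusion can be reached numerically. This sketch stands in for the visual permutation: it finds the rater most similar to Sue using Euclidean distance over the movies both have rated, then borrows that rater's Spaceballs score. The nearest-neighbor rule and the helper names are illustrative choices, not the method named on the slide.

```python
from math import sqrt

ratings = {
    "Sue Naumi": {"Gone With the Wind": 10, "Star Wars": 10, "Men in Black": 2},
    "Joe Beans": {"Gone With the Wind": 8, "Star Wars": 9, "Spaceballs": 2, "Men in Black": 2},
    "Jane Arnold": {"Gone With the Wind": 1, "Star Wars": 2, "Spaceballs": 8, "Men in Black": 9},
    "Isabel Loud": {"Gone With the Wind": 2, "Star Wars": 1, "Spaceballs": 9, "Men in Black": 8},
}

def distance(a, b):
    """Euclidean distance over the movies both people have rated."""
    shared = set(a) & set(b)
    return sqrt(sum((a[m] - b[m]) ** 2 for m in shared))

def predict(user, movie):
    """Return (most similar rater who has seen `movie`, that rater's score)."""
    candidates = [
        (distance(ratings[user], other), name)
        for name, other in ratings.items()
        if name != user and movie in other
    ]
    _, nearest = min(candidates)
    return nearest, ratings[nearest][movie]

nearest, predicted = predict("Sue Naumi", "Spaceballs")
print(f"Most similar rater: {nearest}; predicted rating: {predicted}")
```

Sue's nearest neighbor is Joe Beans, who rated Spaceballs 2, so the movie is a poor recommendation for her.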
Recommendations
[Diagram: Product by Customer matrix]
Amazon uses data mining (clustering) algorithms to define
affinity groups based on purchases and stated preferences.
Optimization Techniques
Select the set of values for specified decision
variables that optimizes (maximizes or
minimizes) an objective function subject to a
set of constraints. The objective function and
constraints are specified in terms of the
decision variables and constants.
• Calculus
• Linear and Nonlinear Programming
• Dynamic Programming
• Simulation and Numerical Methods
Extent Decisions
• Extent decisions deal with the allocation
of scarce resources that determine the
"production" level (output produced by an
activity).
• Of necessity the extent of one activity
impacts the extent of other activities
competing for those scarce resources.
• Scarce resources include money
(capital), people (labor), facilities
(equipment), raw materials, etc.
Beer or Ale?
Production Problem: At a price of $23 per barrel
for beer and $13 per barrel for ale a small brewery
can sell all of the beer and ale it can produce. A
barrel of beer requires 15 lbs of corn, 4 oz of hops,
and 20 lbs of barley malt. A barrel of ale requires 5
lbs of corn, 4 oz of hops, and 35 lbs of barley malt.
It currently has 480 lbs of corn, 160 oz of hops,
and 1,190 lbs of barley malt in raw material
inventory (assume perishable). The corn was
purchased for $100, the barley malt for $300, and
the hops for $80. Production capacity is 100
barrels. What quantities should be produced?
Beer or Ale?
Analysis
What are the decisions?
What is the objective (costs and benefits)?
What are the constraints?
Beer is "more profitable" than ale ($23 vs. $13).
Why would a firm ever produce ale?
Beer or Ale?
Analysis
What are the decisions?
Beer vs. Ale production
What is the objective (costs and benefits)?
Revenue = 23 Beer + 13 Ale
What are the constraints?
Corn: 15 Beer + 5 Ale
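The full problem can be solved with a few lines of code. The sketch below builds its constraints from the problem statement (corn, hops, barley malt, and the 100-barrel capacity), not from the truncated list above. It treats the raw-material purchase costs as sunk, so the objective is revenue, and it uses the fact that a linear program's optimum lies at a corner point: it intersects every pair of constraint lines and keeps the best feasible corner. A real model would use an LP solver; the enumeration is only for illustration.

```python
from itertools import combinations

# Each constraint (a, b, limit) means a*Beer + b*Ale <= limit.
CONSTRAINTS = [
    (15, 5, 480),    # corn, lbs
    (4, 4, 160),     # hops, oz
    (20, 35, 1190),  # barley malt, lbs
    (1, 1, 100),     # production capacity, barrels
    (-1, 0, 0),      # Beer >= 0
    (0, -1, 0),      # Ale >= 0
]
PRICE_BEER, PRICE_ALE = 23, 13

def feasible(beer, ale, tol=1e-9):
    return all(a * beer + b * ale <= limit + tol for a, b, limit in CONSTRAINTS)

def best_plan():
    """Return (revenue, beer, ale) at the best feasible corner point."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines never intersect
        # Cramer's rule for the intersection of the two constraint lines.
        beer = (c1 * b2 - c2 * b1) / det
        ale = (a1 * c2 - a2 * c1) / det
        if feasible(beer, ale):
            revenue = PRICE_BEER * beer + PRICE_ALE * ale
            if best is None or revenue > best[0]:
                best = (revenue, beer, ale)
    return best

revenue, beer, ale = best_plan()
print(f"Beer: {beer:.0f} barrels, Ale: {ale:.0f} barrels, Revenue: ${revenue:.0f}")
```

Under these assumptions the best plan mixes both products. Beer earns more per barrel but uses three times as much corn, so producing only beer runs out of corn while hops and malt go unused.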
