Introduction to Data Mining and Business Intelligence

Lecture 1/DMBI/IKI83403T/MTI/UI
Yudho Giri Sucahyo, Ph.D, CISA
Faculty of Computer Science, University of Indonesia
Objectives
• Motivation: Why data mining?
• What is data mining?
• Understand the drivers for BI initiatives in modern
  organizations
• Understand the structure, components, and process of BI
Motivation: Why data mining?
• Data explosion problem:
  - Automated data collection tools and mature database
    technology lead to tremendous amounts of data stored in
    databases, data warehouses, and other information repositories.
  - We are drowning in data, but starving for knowledge!
• Data mining:
  - Extraction of interesting knowledge (rules, regularities, patterns,
    constraints) from data in large databases [JH].
  - Analysis of the large quantities of data that are stored in
    computers [DO].
• Alternative names:
  - KDD (knowledge discovery in databases), knowledge extraction,
    data archeology, information harvesting, business intelligence, etc.
Data Rich but Information Poor
Databases are too big.
Data mining can help discover knowledge.
Evolution of Database Technology
• Data collection, database creation, network DBMS
• Relational data model, relational DBMS implementation
• RDBMS, advanced data models (extended-relational, OO,
  etc.) and application-oriented DBMS (spatial, scientific,
  engineering, etc.)
• Data mining and data warehousing, multimedia databases,
  and Web technology
Potential Applications
• See TSBD lecture notes – Data Mining
• See Chapter 1 of DO
• Retailing
• Banking
• Credit Card Management
• Insurance
• Telecommunications
• Telemarketing
• Human Resource Management
Data Mining Should Not be Used Blindly
• Data mining finds regularities in history, but history is
  not the same as the future.
• Association dictates neither trend nor causality.
• Some abnormal data could be caused by humans.
Another view of BI
• BI is a broad field, and different people view it
  differently.
• There is common agreement on the major components:
  - A centralized repository of data -> data warehouse
  - An end-user set of tools to create reports and queries from
    data and information and to analyze the data, information, and
    reports -> business analytics
  - Tools to find non-obvious relationships among large amounts of
    data -> data mining; for text -> text mining; for the Web -> web mining
  - Business Performance Management (BPM) to set goals as
    metrics and standards and to monitor and measure
    performance by using the BI methodology
Drivers of BI
• Organizations are being compelled to capture,
  understand, and harness their data to support
  decision making in order to improve business
  operations
• Business cycle times are now extremely compressed;
  faster, more informed, and better decision making is
  therefore a competitive imperative
• Managers need the right information at the right time
  and in the right place
• Case Study 1: BI success story at Toyota Motor
  Company (Chapter 1 ET pg. 4-6).
Business Value of BI
Data Mining Functionality
• Association
  - From association, correlation, to causality
  - Finding rules like "A -> B"
• Classification and prediction
  - Classify data based on the values in a classifying attribute
  - Predict some unknown or missing attribute values based on other
    information
• Cluster analysis
  - Group data to form new classes, e.g., cluster houses to find distribution
    patterns
• Outlier and exception data analysis
• Time series analysis (trend and deviation)
  - Trend and deviation analysis: regression, sequential patterns, similar
    sequences, e.g., stock analysis
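
As a minimal sketch of the "A -> B" rule idea above, the following computes support and confidence, the two standard measures for an association rule, over a small list of shopping baskets. The baskets, the item names, and the helper `rule_metrics` are illustrative assumptions and do not come from the lecture.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that contain both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

support, confidence = rule_metrics(baskets, {"bread"}, {"butter"})
print(f"bread -> butter: support={support:.2f}, confidence={confidence:.2f}")
```

A high confidence only says that the two items often appeared together in this history. It says nothing about whether one causes the other.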
Sarbanes-Oxley Act of 2002
(extracted from Gartner, Inc., 2004)
• The mandates of the Sarbanes-Oxley Act of 2002
  drove one firm to implement a new financial
  performance management system capable of
  meeting the new requirements to:
  - Perform flawless analysis and compilation of
    thousands of transactions and journal entries.
  - Balance more access to data with the need to
    control access to sensitive insider information.
  - Deliver reports to the SEC in less time.
Sarbanes-Oxley Act of 2002
(extracted from Gartner, Inc., 2004) ... continued
• Within the overarching goal of achieving financial-reporting
  compliance, the firm's objectives included the following:
  - Get more eyes on the data and KPIs, and build in strict security
    controls
  - Provide live reports that allow people to drill down to the lowest
    level of transaction detail
  - Proactively scour the financial databases for anomalies, using variance
    triggers
  - Gather all financial data into a cohesive database
  - Complement accounting and budgeting applications for flexible
    reporting, free-form investigation, and automated data analysis
• BI can proactively alert specific individuals whenever an
  anomaly is detected.
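
The variance-trigger idea can be sketched as a simple threshold check: flag any account whose actual figure deviates from its budget by more than a set fraction. The account names, amounts, the 10% threshold, and the function `variance_alerts` are all hypothetical; the Gartner extract does not say how the firm defined its triggers.

```python
def variance_alerts(entries, threshold=0.10):
    """Return (account, variance) pairs whose relative variance exceeds threshold.

    entries: iterable of (account, budget, actual) tuples.
    Variance is (actual - budget) / budget; zero-budget rows are skipped.
    """
    alerts = []
    for account, budget, actual in entries:
        if budget == 0:
            continue  # relative variance is undefined without a budget
        variance = (actual - budget) / budget
        if abs(variance) > threshold:
            alerts.append((account, variance))
    return alerts


# Invented figures for illustration only.
ledger = [
    ("travel", 1000.0, 1300.0),
    ("payroll", 50000.0, 50500.0),
    ("marketing", 8000.0, 6000.0),
]

for account, variance in variance_alerts(ledger):
    print(f"ALERT {account}: {variance:+.0%} against budget")
```

In a real BI system the alert would be routed to the responsible individual rather than printed.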
Now let us see some screenshots...
[Screenshots not reproduced: three dashboards and a financial reporting view]
Back to theory...
KDD Process
[Figure: Data -> Selection -> Target Data -> Preprocessing ->
Transformation -> Data Mining -> Interpretation/Evaluation -> Knowledge]
Steps of a KDD Process
• Learning the application domain:
  - Relevant prior knowledge and goals of the application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of the effort!)
• Data reduction and projection:
  - Find useful features, dimensionality/variable reduction, invariant
    representation
• Choosing the functions of data mining:
  - Summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Interpretation: analysis of results
  - Visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
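
The steps above can be sketched as a chain of small functions, one per stage. This is a toy illustration under stated assumptions: the records, field names, age bands, the 0.5 interest threshold, and every function name are invented here, and the "mining" step is only a per-group summary, not a real mining algorithm.

```python
from collections import defaultdict

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 23, "city": "Depok", "bought": True},
    {"id": 2, "age": 37, "city": "Jakarta", "bought": False},
    {"id": 3, "age": None, "city": "Bogor", "bought": True},
    {"id": 4, "age": 29, "city": "Jakarta", "bought": True},
    {"id": 5, "age": 41, "city": "Depok", "bought": False},
]

def select(records, fields):
    """Selection: keep only the attributes relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records):
    """Cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Reduction/projection: replace exact age with a coarse age band."""
    return [{"age_band": "under_30" if r["age"] < 30 else "30_plus",
             "bought": r["bought"]} for r in records]

def mine(records):
    """Mining (summarization): purchase rate per age band."""
    counts = defaultdict(lambda: [0, 0])  # band -> [buyers, total]
    for r in records:
        counts[r["age_band"]][0] += int(r["bought"])
        counts[r["age_band"]][1] += 1
    return {band: buyers / total for band, (buyers, total) in counts.items()}

def interpret(patterns, min_rate=0.5):
    """Interpretation: keep only patterns above an interest threshold."""
    return {band: rate for band, rate in patterns.items() if rate >= min_rate}

target = select(raw, ["age", "bought"])
patterns = mine(transform(clean(target)))
knowledge = interpret(patterns)
print(patterns)
print(knowledge)
```

Keeping each stage as its own function makes it easy to revisit an earlier step, which matters because the process is usually iterative in practice.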
Teradata Advanced Analytics Methodology
(similar to CRISP-DM)
Structure and Components of BI
Structure and Components of BI... continued
• Data Warehouse
  - Data flows from operational systems (e.g., CRM,
    ERP) to a DW, which is a special database or
    repository of data that has been prepared to
    support decision-making applications ranging from
    those for simple reporting and querying to complex
    optimization
• Business Analytics/OLAP
  - Software tools that allow users to create on-demand
    reports and queries and to conduct
    analysis of data
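
An on-demand report is, at its simplest, an aggregation of warehouse facts along chosen dimensions. The following is a minimal sketch, assuming a tiny in-memory fact table; the rows, the dimension names, and the helper `roll_up` are hypothetical stand-ins for what an OLAP tool would do against a real DW.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, sales_amount).
facts = [
    ("West", "Q1", "sedan", 120.0),
    ("West", "Q1", "truck", 80.0),
    ("West", "Q2", "sedan", 150.0),
    ("East", "Q1", "sedan", 90.0),
    ("East", "Q2", "truck", 60.0),
]
DIMENSIONS = ("region", "quarter", "product")

def roll_up(rows, by):
    """Sum the sales measure grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # last column is the measure
    return dict(totals)

by_region = roll_up(facts, by=("region",))
by_region_quarter = roll_up(facts, by=("region", "quarter"))
print(by_region)
print(by_region_quarter)
```

Moving from `by_region` to `by_region_quarter` is a drill-down: the same facts viewed at a finer level of detail.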
Structure and Components of BI... continued
• Data Mining
  - Data mining is a class of database information
    analysis that looks for hidden patterns in a
    group of data that can be used to predict future
    behavior
  - Used to replace or enhance human intelligence
    by scanning through massive storehouses of data
    to discover meaningful new correlations,
    patterns, and trends, by using pattern
    recognition technologies and advanced statistics
Structure and Components of BI... continued
• Business Performance Management (BPM)
  - Based on the balanced scorecard methodology,
    a framework for defining, implementing, and
    managing an enterprise’s business strategy by
    linking objectives with factual measures
• Dashboards
  - A visual presentation of critical data for
    executives to view. It allows executives to see
    hot spots in seconds and explore the situation.
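
Linking objectives with factual measures can be sketched as comparing each KPI's actual value with its target and assigning a status that a dashboard could highlight. The KPI names, the figures, the 90% warning band, and the function `kpi_status` are assumptions made for illustration; they are not taken from the balanced scorecard methodology or from the lecture.

```python
def kpi_status(actual, target, higher_is_better=True, warn_ratio=0.9):
    """Classify a KPI as 'green', 'yellow', or 'red' against its target.

    Assumes actual and target are both positive numbers.
    """
    # Normalise so that a ratio >= 1.0 always means the target is met.
    ratio = actual / target if higher_is_better else target / actual
    if ratio >= 1.0:
        return "green"
    if ratio >= warn_ratio:
        return "yellow"
    return "red"


# Hypothetical scorecard: name -> (actual, target, higher_is_better).
scorecard = {
    "revenue_growth_pct": (4.6, 5.0, True),
    "customer_satisfaction": (88.0, 85.0, True),
    "order_cycle_days": (12.0, 8.0, False),
}

statuses = {name: kpi_status(a, t, hib) for name, (a, t, hib) in scorecard.items()}
hot_spots = [name for name, status in statuses.items() if status == "red"]
print(statuses)
print("Hot spots:", hot_spots)
```

The `hot_spots` list corresponds to the items an executive would want to spot in seconds and then explore further.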
BI: Today and Tomorrow
• Recent industry analyst reports show that in the coming
  years, millions of people will use BI visual tools and
  analytics every day
• BI takes advantage of already developed and installed
  IT components, helping companies
  leverage their current IT investments and use valuable
  data stored in legacy and transactional systems
• Some issues:
  - Mining information from heterogeneous databases and global
    information systems
  - Handling relational and complex types of data
  - Efficiency and scalability of data mining algorithms
