Description
Foundations of Business Intelligence Databases and Information Management
Foundat i ons of Busi ness
I nt el l i genc e: Dat abases and
I nf or mat i on Management
IM (Spring 2015)
2
Cont ent
• Problems of managing data resources in a
traditional file environment
• Capabilities and value of a database management
system
• Database design principles
• Tools and technologies for accessing information
from databases
• The role of information policy, data
administration, and data quality assurance
IM (Spring 2015)
3
• Database: Group of related files
• File: Group of records of same type
• Record: Group of related fields
• Field: Group of characters as word(s) or number
– Describes an entity (person, place, thing on which we
store information)
– Attribute: Each characteristic, or quality, describing
entity
•E.g., Attributes Date or Grade belong to entity COURSE
Fi l e or gani zat i on c onc ept s
IM (Spring 2015)
4
THE DATA HIERARCHY
IM (Spring 2015)
5
Pr obl ems w i t h t he t r adi t i onal f i l e
envi r onment
• Files maintained separately by different
departments
– Data redundancy:
•Presence of duplicate data in multiple files
– Data inconsistency:
•Same attribute has different values
– Program-data dependence:
•When changes in program requires changes to data
accessed by program
– Lack of flexibility
– Poor security
– Lack of data sharing and availability
IM (Spring 2015)
6
TRADITIONAL FILE PROCESSING
IM (Spring 2015)
7
Dat abase and DBMS
• Database
– Serves many applications by centralizing data and controlling
redundant data
• Database management system (DBMS)
– Interfaces between applications and physical data files
– Separates logical and physical views of data
– Solves problems of traditional file environment
• Controls redundancy
• Eliminates inconsistency
• Uncouples programs and data
• Enables organization to centrally manage data and data security
IM (Spring 2015)
8
HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
IM (Spring 2015)
9
Rel at i onal DBMS
• Relational DBMS
– Represent data as two-dimensional tables called relations or files
– Each table contains data on entity and attributes
• Table: grid of columns and rows
– Rows (tuples): Records for different entities
– Fields (columns): Represents attribute for entity
– Key field: Field used to uniquely identify each record
– Primary key: Field in table used for key fields
– Foreign key: Primary key used in second table as look-up field to
identify records from original table
IM (Spring 2015)
10
RELATIONAL DATABASE TABLES
IM (Spring 2015)
11
RELATIONAL DATABASE TABLES (cont.)
IM (Spring 2015)
12
Oper at i ons of a Rel at i onal DBMS
• Three basic operations used to develop useful
sets of data
– SELECT: Creates subset of data of all records
that meet stated criteria
– JOIN: Combines relational tables to provide user
with more information than available in
individual tables
– PROJECT: Creates subset of columns in table,
creating tables with only the information specified
IM (Spring 2015)
13
THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS
IM (Spring 2015)
14
Ot her DBMS
• Object-Oriented DBMS (OODBMS)
– Stores data and procedures as objects
– Objects can be graphics, multimedia, J ava applets
– Relatively slow compared with relational DBMS for
processing large numbers of transactions
– Hybrid object-relational DBMS: Provide capabilities
of both OODBMS and relational DBMS
• Databases in the cloud
– Typically less functionality than on-premises DBs
– Amazon Web Services, Microsoft SQL Azure
IM (Spring 2015)
15
Capabi l i t i es of Dat abase Management
Syst ems
• Data definition capability: Specifies structure of
database content, used to create tables and define
characteristics of fields
• Data dictionary: Automated or manual file storing
definitions of data elements and their characteristics
• Data manipulation language: Used to add, change,
delete, retrieve data from database
– Structured Query Language (SQL)
– Microsoft Access user tools for generation SQL
• Many DBMS have report generation capabilities for
creating polished reports (Crystal Reports)
IM (Spring 2015)
16
MICROSOFT ACCESS DATA DICTIONARY FEATURES
IM (Spring 2015)
17
EXAMPLE OF AN SQL QUERY
IM (Spring 2015)
18
AN ACCESS QUERY
IM (Spring 2015)
19
Dat abase Desi gn
• Designing Databases
– Conceptual (logical) design: Abstract model from business
perspective
– Physical design: How database is arranged on direct-access storage
devices
• Design process identifies
– Relationships among data elements, redundant database elements
– Most efficient way to group data elements to meet business
requirements, needs of application programs
• Normalization
– Streamlining complex groupings of data to minimize redundant data
elements and awkward many-to-many relationships
IM (Spring 2015)
20
AN UNNORMALIZED RELATION FOR ORDER
IM (Spring 2015)
21
NORMALIZED TABLES CREATED FROM ORDER
IM (Spring 2015)
22
E-R Di agr am
• Entity-relationship diagram
– Used by database designers to document the data
model
– Illustrates relationships between entities
• Distributing databases: Storing database in more
than one place
– Partitioned: Separate locations store different parts of
database
– Replicated: Central database duplicated in entirety at
different locations
IM (Spring 2015)
23
AN ENTITY?RELATIONSHIP DIAGRAM
IM (Spring 2015)
24
Handl i ng Lar ge Dat abase
• Very large databases and systems require special
capabilities, tools
– To analyze large quantities of data
– To access data from multiple systems
• Three key techniques
1.Data warehousing
2.Data mining
3.Tools for accessing internal databases through the
Web
IM (Spring 2015)
25
Dat a War ehouse & Dat a Mar t s
• Data warehouse:
– Stores current and historical data from many core operational
transaction systems
– Consolidates and standardizes information for use across
enterprise, but data cannot be altered
– Data warehouse system will provide query, analysis, and
reporting tools
• Data marts:
– Subset of data warehouse
– Summarized or highly focused portion of firm’s data for use
by specific population of users
– Typically focuses on single subject or line of business
IM (Spring 2015)
26
COMPONENTS OF A DATA WAREHOUSE
IM (Spring 2015)
27
Busi ness I nt el l i genc e
• Tools for consolidating, analyzing, and providing
access to vast amounts of data to help users make
better business decisions
• E.g., Harrah’s Entertainment analyzes customers
to develop gambling profiles and identify most
profitable customers
• Principle tools include:
– Software for database query and reporting
– Online analytical processing (OLAP)
– Data mining
IM (Spring 2015)
28
Onl i ne anal yt i c al pr oc essi ng (OLAP)
• Supports multidimensional data analysis
– Viewing data using multiple dimensions
– Each aspect of information (product, pricing, cost,
region, time period) is different dimension
– E.g., how many washers sold in the East in J une
compared with other regions?
• OLAP enables rapid, online answers to ad hoc
queries
IM (Spring 2015)
29
MULTIDIMENSIONAL DATA MODEL
IM (Spring 2015)
30
Dat a Mi ni ng
• More discovery driven than OLAP
• Finds hidden patterns, relationships in large databases and
infers rules to predict future behavior
• E.g., Finding patterns in customer data for one-to-one
marketing campaigns or to identify profitable customers.
• Types of information obtainable from data mining
– Associations
– Sequences
– Classification
– Clustering
– Forecasting
IM (Spring 2015)
31
Pr edi c t i ve Anal ysi s & Tex t Mi ni ng
• Predictive analysis
– Uses data mining techniques, historical data, and
assumptions about future conditions to predict
outcomes of events
– E.g., Probability a customer will respond to an offer
• Text mining
– Extracts key elements from large unstructured data
sets (e.g., stored e-mails)
IM (Spring 2015)
32
Web Mi ni ng
• Discovery and analysis of useful patterns and
information from WWW
– E.g., to understand customer behavior, evaluate
effectiveness of Web site, etc.
• Web content mining
– Knowledge extracted from content of Web pages
• Web structure mining
– E.g., links to and from Web page
• Web usage mining
– User interaction data recorded by Web server
IM (Spring 2015)
33
Dat abases and t he Web
• Many companies use Web to make some internal
databases available to customers or partners
• Typical configuration includes:
– Web server
– Application server/middleware/CGI scripts
– Database server (hosting DBM)
• Advantages of using Web for database access:
– Ease of use of browser software
– Web interface requires few or no changes to database
– Inexpensive to add Web interface to system
IM (Spring 2015)
34
LINKING INTERNAL DATABASES TO THE WEB
IM (Spring 2015)
35
Est abl i shi ng an i nf or mat i on pol i c y
• Firm’s rules, procedures, roles for sharing, managing,
standardizing data
• Data administration:
– Firm function responsible for specific policies and procedures
to manage data
• Data governance:
– Policies and processes for managing availability, usability,
integrity, and security of enterprise data, especially as it relates
to government regulations
• Database administration:
– Defining, organizing, implementing, maintaining database;
performed by database design and management group
IM (Spring 2015)
36
Ensur i ng dat a qual i t y
• More than 25% of critical data in Fortune 1000
company databases are inaccurate or incomplete
• Most data quality problems stem from faulty
input
• Before new database in place, need to:
– Identify and correct faulty data
– Establish better routines for editing data once
database in operation
IM (Spring 2015)
37
Dat a Qual i t y
• Data quality audit:
– Structured survey of the accuracy and level of
completeness of the data in an information system
•Survey samples from data files, or
•Survey end users for perceptions of quality
• Data cleansing
– Software to detect and correct data that are incorrect,
incomplete, improperly formatted, or redundant
– Enforces consistency among different sets of data
from separate information systems
doc_560111096.pdf
Foundations of Business Intelligence Databases and Information Management
Foundat i ons of Busi ness
I nt el l i genc e: Dat abases and
I nf or mat i on Management
IM (Spring 2015)
2
Cont ent
• Problems of managing data resources in a
traditional file environment
• Capabilities and value of a database management
system
• Database design principles
• Tools and technologies for accessing information
from databases
• The role of information policy, data
administration, and data quality assurance
IM (Spring 2015)
3
• Database: Group of related files
• File: Group of records of same type
• Record: Group of related fields
• Field: Group of characters as word(s) or number
– Describes an entity (person, place, thing on which we
store information)
– Attribute: Each characteristic, or quality, describing
entity
•E.g., Attributes Date or Grade belong to entity COURSE
Fi l e or gani zat i on c onc ept s
IM (Spring 2015)
4
THE DATA HIERARCHY
IM (Spring 2015)
5
Pr obl ems w i t h t he t r adi t i onal f i l e
envi r onment
• Files maintained separately by different
departments
– Data redundancy:
•Presence of duplicate data in multiple files
– Data inconsistency:
•Same attribute has different values
– Program-data dependence:
•When changes in program requires changes to data
accessed by program
– Lack of flexibility
– Poor security
– Lack of data sharing and availability
IM (Spring 2015)
6
TRADITIONAL FILE PROCESSING
IM (Spring 2015)
7
Dat abase and DBMS
• Database
– Serves many applications by centralizing data and controlling
redundant data
• Database management system (DBMS)
– Interfaces between applications and physical data files
– Separates logical and physical views of data
– Solves problems of traditional file environment
• Controls redundancy
• Eliminates inconsistency
• Uncouples programs and data
• Enables organization to centrally manage data and data security
IM (Spring 2015)
8
HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
IM (Spring 2015)
9
Rel at i onal DBMS
• Relational DBMS
– Represent data as two-dimensional tables called relations or files
– Each table contains data on entity and attributes
• Table: grid of columns and rows
– Rows (tuples): Records for different entities
– Fields (columns): Represents attribute for entity
– Key field: Field used to uniquely identify each record
– Primary key: Field in table used for key fields
– Foreign key: Primary key used in second table as look-up field to
identify records from original table
IM (Spring 2015)
10
RELATIONAL DATABASE TABLES
IM (Spring 2015)
11
RELATIONAL DATABASE TABLES (cont.)
IM (Spring 2015)
12
Oper at i ons of a Rel at i onal DBMS
• Three basic operations used to develop useful
sets of data
– SELECT: Creates subset of data of all records
that meet stated criteria
– JOIN: Combines relational tables to provide user
with more information than available in
individual tables
– PROJECT: Creates subset of columns in table,
creating tables with only the information specified
IM (Spring 2015)
13
THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS
IM (Spring 2015)
14
Ot her DBMS
• Object-Oriented DBMS (OODBMS)
– Stores data and procedures as objects
– Objects can be graphics, multimedia, J ava applets
– Relatively slow compared with relational DBMS for
processing large numbers of transactions
– Hybrid object-relational DBMS: Provide capabilities
of both OODBMS and relational DBMS
• Databases in the cloud
– Typically less functionality than on-premises DBs
– Amazon Web Services, Microsoft SQL Azure
IM (Spring 2015)
15
Capabi l i t i es of Dat abase Management
Syst ems
• Data definition capability: Specifies structure of
database content, used to create tables and define
characteristics of fields
• Data dictionary: Automated or manual file storing
definitions of data elements and their characteristics
• Data manipulation language: Used to add, change,
delete, retrieve data from database
– Structured Query Language (SQL)
– Microsoft Access user tools for generation SQL
• Many DBMS have report generation capabilities for
creating polished reports (Crystal Reports)
IM (Spring 2015)
16
MICROSOFT ACCESS DATA DICTIONARY FEATURES
IM (Spring 2015)
17
EXAMPLE OF AN SQL QUERY
IM (Spring 2015)
18
AN ACCESS QUERY
IM (Spring 2015)
19
Dat abase Desi gn
• Designing Databases
– Conceptual (logical) design: Abstract model from business
perspective
– Physical design: How database is arranged on direct-access storage
devices
• Design process identifies
– Relationships among data elements, redundant database elements
– Most efficient way to group data elements to meet business
requirements, needs of application programs
• Normalization
– Streamlining complex groupings of data to minimize redundant data
elements and awkward many-to-many relationships
IM (Spring 2015)
20
AN UNNORMALIZED RELATION FOR ORDER
IM (Spring 2015)
21
NORMALIZED TABLES CREATED FROM ORDER
IM (Spring 2015)
22
E-R Di agr am
• Entity-relationship diagram
– Used by database designers to document the data
model
– Illustrates relationships between entities
• Distributing databases: Storing database in more
than one place
– Partitioned: Separate locations store different parts of
database
– Replicated: Central database duplicated in entirety at
different locations
IM (Spring 2015)
23
AN ENTITY?RELATIONSHIP DIAGRAM
IM (Spring 2015)
24
Handl i ng Lar ge Dat abase
• Very large databases and systems require special
capabilities, tools
– To analyze large quantities of data
– To access data from multiple systems
• Three key techniques
1.Data warehousing
2.Data mining
3.Tools for accessing internal databases through the
Web
IM (Spring 2015)
25
Dat a War ehouse & Dat a Mar t s
• Data warehouse:
– Stores current and historical data from many core operational
transaction systems
– Consolidates and standardizes information for use across
enterprise, but data cannot be altered
– Data warehouse system will provide query, analysis, and
reporting tools
• Data marts:
– Subset of data warehouse
– Summarized or highly focused portion of firm’s data for use
by specific population of users
– Typically focuses on single subject or line of business
IM (Spring 2015)
26
COMPONENTS OF A DATA WAREHOUSE
IM (Spring 2015)
27
Busi ness I nt el l i genc e
• Tools for consolidating, analyzing, and providing
access to vast amounts of data to help users make
better business decisions
• E.g., Harrah’s Entertainment analyzes customers
to develop gambling profiles and identify most
profitable customers
• Principle tools include:
– Software for database query and reporting
– Online analytical processing (OLAP)
– Data mining
IM (Spring 2015)
28
Onl i ne anal yt i c al pr oc essi ng (OLAP)
• Supports multidimensional data analysis
– Viewing data using multiple dimensions
– Each aspect of information (product, pricing, cost,
region, time period) is different dimension
– E.g., how many washers sold in the East in J une
compared with other regions?
• OLAP enables rapid, online answers to ad hoc
queries
IM (Spring 2015)
29
MULTIDIMENSIONAL DATA MODEL
IM (Spring 2015)
30
Dat a Mi ni ng
• More discovery driven than OLAP
• Finds hidden patterns, relationships in large databases and
infers rules to predict future behavior
• E.g., Finding patterns in customer data for one-to-one
marketing campaigns or to identify profitable customers.
• Types of information obtainable from data mining
– Associations
– Sequences
– Classification
– Clustering
– Forecasting
IM (Spring 2015)
31
Pr edi c t i ve Anal ysi s & Tex t Mi ni ng
• Predictive analysis
– Uses data mining techniques, historical data, and
assumptions about future conditions to predict
outcomes of events
– E.g., Probability a customer will respond to an offer
• Text mining
– Extracts key elements from large unstructured data
sets (e.g., stored e-mails)
IM (Spring 2015)
32
Web Mi ni ng
• Discovery and analysis of useful patterns and
information from WWW
– E.g., to understand customer behavior, evaluate
effectiveness of Web site, etc.
• Web content mining
– Knowledge extracted from content of Web pages
• Web structure mining
– E.g., links to and from Web page
• Web usage mining
– User interaction data recorded by Web server
IM (Spring 2015)
33
Dat abases and t he Web
• Many companies use Web to make some internal
databases available to customers or partners
• Typical configuration includes:
– Web server
– Application server/middleware/CGI scripts
– Database server (hosting DBM)
• Advantages of using Web for database access:
– Ease of use of browser software
– Web interface requires few or no changes to database
– Inexpensive to add Web interface to system
IM (Spring 2015)
34
LINKING INTERNAL DATABASES TO THE WEB
IM (Spring 2015)
35
Est abl i shi ng an i nf or mat i on pol i c y
• Firm’s rules, procedures, roles for sharing, managing,
standardizing data
• Data administration:
– Firm function responsible for specific policies and procedures
to manage data
• Data governance:
– Policies and processes for managing availability, usability,
integrity, and security of enterprise data, especially as it relates
to government regulations
• Database administration:
– Defining, organizing, implementing, maintaining database;
performed by database design and management group
IM (Spring 2015)
36
Ensur i ng dat a qual i t y
• More than 25% of critical data in Fortune 1000
company databases are inaccurate or incomplete
• Most data quality problems stem from faulty
input
• Before new database in place, need to:
– Identify and correct faulty data
– Establish better routines for editing data once
database in operation
IM (Spring 2015)
37
Dat a Qual i t y
• Data quality audit:
– Structured survey of the accuracy and level of
completeness of the data in an information system
•Survey samples from data files, or
•Survey end users for perceptions of quality
• Data cleansing
– Software to detect and correct data that are incorrect,
incomplete, improperly formatted, or redundant
– Enforces consistency among different sets of data
from separate information systems
doc_560111096.pdf