Datawarehousing and Business Intelligence

Description
Datawarehousing and Business Intelligence








Datawarehousing
and
Business Intelligence



Vannaratana (Bee) Praruksa
March 2001














Report for the course component
Datawarehousing and OLAP
MSc in Information Systems Development
Academy of Communication and Information Technology
HAN University of Applied Science

Final editing by Guido Bakema
September 2002

Vannaratana Praruksa Data Warehousing & Business Intelligence

1
Vannaratana Praruksa Data Warehousing & Business Intelligence

2
Data Warehousing and Business Intelligence

1. Introduction

In the current business environment, characterized by increased competition and rapidly
changing demands, improvement of the decision-support process can make the difference
between a business prospering or going under.

Information systems solutions need to be adapted in order to be able to comply with these
growing needs of decision-support. In addition to a new decision-support mentality that
must be implemented at all levels of the company, also new decision-support technology has
an active role to play. For achieving this, data warehousing nowadays is a vital element for
acquiring and analyzing business data, whether derived from internal production systems or
external sources, including call centers, help desks, web sites or market information sources.
This involves also new techniques for providing in a very flexible way the required
information that is needed for the decision process as accurate as possible, at the desired
level of aggregation and in the desired format. Decision-support then provides business
intelligence. Such new decision-support solutions are the key link in adding value to
ordinary transactional systems by transforming the transactional data into useful, intelligent
information.

The aim of this document is to show why new decision-support technology and business
intelligence is a major issue for companies. By giving some concrete examples, the
advantages that can be expected from a modern decision-support solution are illustrated.


2. Decision-support solutions

In many companies huge volumes of potentially valuable internal and external data exist,
but this information is hardly available for use to the manager who must make the decisions,
although many companies have done much effort and made costs to implement a decision-
support solution based on their current transactional systems. However, many companies
still have problems, for example monthly sales reports are not available in time or not
detailed enough, salesman can not immediately access information for prospective clients
and managers still lack needed information to support their business decisions. IS
departments have to dedicate people to service the ad hoc information needs – translating
business questions into database queries and formatting reports, and salesman and managers
spend their time on tedious, low-return activities: sifting through reports, re-keying data into
spreadsheets, manually manipulating charts, etc.

To provide adequate information systems to decision-support, some interrelated problems
must be overcome:
• Data is not information: Data is only one of the required ingredients in the recipe of
information. To deliver information, we must, at a minimum, provide relevance and
context in addition to data.
Vannaratana Praruksa Data Warehousing & Business Intelligence

3
• Information is not understanding: It is important to realize that merely exposing
decision persons to information does not necessarily provide the value that we want.
The available information has to make the journey into the mind of the decision
person and take up residence.
• Analysis is iterative: The decision person lacks the technical skills required to
interact directly with complex operational database systems, IS must act as the
intermediary to translate business questions into database queries, and query results
into reports and this process takes time depending on the organization and the
question.
• Business is dynamic: Business is competitive. Competition fuels ingenuity, and
ingenuity drives change. The data that represents your business is dynamic. It
changes as the business grows. It changes as the operational data changes. Change
makes the information needs of decision persons (manager) dynamic because what
is important to them changes with time.

The internal and external data becomes the kind of enterprise intelligence that provides
valuable guidance in the decision-making process, which adds considerable value to any
business. To reach the power of these data without the troubles as before, companies decide
for new decision-support solutions. Therefor companies can decide to adopt modern
decision-support solutions that make the information easily accessible by all the users who
need it. These new decision-support solutions may have different architectures, according to
the needs they are supposed to meet and the technologies available. Very important
nowadays is data warehousing. See figure 2.1.












Data warehouse and data marts:
• Data warehouse: An enterprise-wide data warehouse enables the centralized storage
of all data of a company. It is based on the adoption of a common data model for the
company’s activities and tasks, and their mutual coherence, unlike a data mart that
is confined to a single business function. The objective of data warehousing is to
derive maximum benefit from the data.
• Data mart: A data mart is a data warehouse dedicated to a given activity, such as
marketing, made available to users in a department or work-group. A data mart is
Enterprise-wide
data warehouse only
Independent
data marts
Enterprise-wide data warehouse
+ dependent data marts
Figure 2.1: Different architectures for decision-support systems
Vannaratana Praruksa Data Warehousing & Business Intelligence

4
not characterized by its size. Two types of data marts can be installed. A dependent
data mart is associated with the enterprise-wide data warehouse, from which it takes
data to meet the specific requirements of a particular department. An independent
data mart is sometimes called a departmental data warehouse, because it can be
conceived as just a small data warehouse for one department only.

A more sophisticated decision-support solution is data mining that consists of extracting
information from a set of data by finding unknown relationships within them. It calls upon
varied mathematical and statistical techniques that use tools such as decision trees, neural
networks, or knowledge-based systems. Data mining allows users to analyze large database
to solve business decision problems. For example, consider a catalog retailer who needs to
decide who to send a new catalog to. The information incorporated into the customer
relationship management process is the historical database of previous mailing and the
features associated with the potential customers, such as age, zip code, their response in the
past, etc. This information is used to build a model of customer behavior that could be used
to predict which customers would be likely to respond to the new catalog.

This leads to the following general characteristics of data mining:
• Discovering unknown associations. For example, beer buyers are likely to purchase
peanuts.
• Sequences, where one event leads to another later event. For example, customers
who purchase curtains are likely to come back to purchase rugs from the same store.
• Recognizing patterns which lead to classification, or new organization of data. For
example, certain profiles are established for customers based on what they purchase.
• Finding groups of facts not previously known. This process is known as event
clustering.
• Forecasting, or simply discovering patterns in the data that can lead to predictions
about the future.


3. Datawarehousing versus OLTP

A data warehouse contains in a single location all enterprise-data that could be useful for
decision-support. These data may be derived from production systems or from external
sources: market research, consumer behavior studies, or opinion surveys, etc. The data will
be transformed, cleaned and stored in a dedicated database that is independent of the
production databases. This separation is needed because the objectives and use of
production systems and decision-support systems are quite different.

Data warehouses differ in their purpose and design from production systems, also called
online transactional processing (OLTP) systems. Production systems are designed for
running pre-defined processes, the evolution of which is fairly limited. An OLTP system
contains data needed for running the day to day operations of a business and therefor must
in principle be up-to-date at any time.

Vannaratana Praruksa Data Warehousing & Business Intelligence

5
A data warehouse contains data that is used for analyzing the business. As a consequence a
data warehouse will contain time fixed snapshots of the enterprise data, because in general
decisions can best be based on overviews and comparisons of business data at moments that
are fixed in time: end of every week, month, quarter and year. An OLTP system is designed
and optimized for data entry and update transactions, whereas a data warehouse is optimized
for data retrieval and reporting: it is usually a read only system.

Apart from this time fixing aspect, for decision-support purposes historical information is
indispensable and must be as easily accessible as current data. Furthermore, a data
warehouse must provide the ability to access data at various levels (aggregate or detail) and
to launch complex ad hoc queries.

If we compare a data mart with a data warehouse, we will discover that they have similar
functions but the size of data mart is a lot smaller and the group of users is smaller as well.
In principle for any department a data mart can be designed that is adapted for the
department specific decision-support information needs only. The data mart can contain
additional domain specific information for the department. Data marts cost less time and
money to build and design can be much more flexible.

The main goal of data warehousing (i.e. by means of a data warehouse and/or data marts, the
smaller department level equivalents), is to improve the productivity of corporate decision
making. This is achieved by transformation and integration of operational data and
providing a consistent view of an enterprise at any complexity level, without affecting the
security and performance of existing systems. As a consequence, managers can analyze their
department’s operational costs more effectively and supervisors can better control their
supply chain.


4. On-line Analytical Processing

How is this achieved at the user level, i.e. by the decision-making managers and
supervisors? The answer is: On-line Analytical Processing (OLAP). OLAP-tools are
specialized tools for retrieving decision-support information from data warehouses and data
marts, to present the information in a desired format, and – very important – to change from
lower detail to a more global (i.e. aggregated) level and vice-versa in a very flexible way.

OLAP enables analyst, managers, and executives to gain insight into data through fast,
consistent, interactive access to a wide variety of possible views of information that can be
presented in different formats. OLAP transforms raw data to useful information so that it
reflects the real factors affecting or enhancing the line of business of the enterprise. OLAP
and data warehousing complement each other. The data warehouse / data mart stores and
manages the data, while OLAP converts the stored data into useful information. The main
strengths of OLAP-tools are their ability to dynamically present reports and to look at the
same kinds of information at different levels at the same time. OLAP techniques may range
from simple navigation and browsing of the data, to more serious analyses, such as time-
series and complex modeling.
Vannaratana Praruksa Data Warehousing & Business Intelligence

6
Nigel Pendse and Richard Creeth in their OLAP Report fedine OLAP applications as
applications that should deliver Fast Analysis of Shared Multidimensional Information
(FASMI):
• Fast: The user of these applications is an interactive user who expects the delivery
of the information they require at a fairly constant rate. Most queries should be
delivered to the user in five seconds or less.
• Analysis: OLAP applications should perform basic numerical and statistical analysis
of the data. These calculations could be pre-defined by the application developer, or
defined by the user as ad hoc queries.
• Shared: The data delivered by OLAP applications should be shared across a large
user population. This stresses the need of implementation of the security
requirements necessary for keeping the data confidential and safe.
• Multidimensional: OLAP applications are based on multidimensional databases,
which is an essential characteristic of OLAP. For example, in multi-dimensional
analysis, data entities such as products, geographies, time periods, and sales
channels may all represent different dimensions. Actual sales revenues and
forecasted revenues may represent measurements to analyze.
• Information: OLAP applications should be able to access all the data and
information necessary and relevant for the application. The data may be located in
different sources and be large in volume.

The importance of OLAP applications is the ability to provide information as needed. OLAP
applications always have the following key features:
• Business models are multidimensional in nature so multidimensional views (data
cubes) are inherently the best way of representing an actual business model. Data
cubes can have as many dimensions as needed for representing the business model
properly.
• Using OLAP applications, managers should be able to analyze potentially large
amounts data across any dimension, at any level of aggregation, with ease and very
fast response times. This is done by slicing and dicing through the data and drilling
down or rolling up through various dimensions as defined by the data structure. In
this way it is possible to quickly identify trends or problem areas that would
otherwise be missed.
• Calculation-intensive capabilities. Data cubes are used to collect and manage data,
while OLAP-tools are used to create information from the collected data that may
lead to new knowledge. It is the ability to perform complex calculations such as
Figure 4.1: A data cube

Each point in the cube represents a
particular combination of measures
for Product, Market and Time.
Vannaratana Praruksa Data Warehousing & Business Intelligence

7
moving averages and percent growth by the OLAP-applications that allows for
successful transfer of the raw data to information, and later to knowledge. OLAP
must be able to do more than simple aggregation. An example of complex
calculations that can be done with modern OLAP-tools is allocation and trend
analysis, computation of the forecast, etc.
• Time intelligence. Time is an integral component of almost any analytical
application. It used to compare and judge performance of a business process, like
‘year to date’ and ‘period over period comparisons’. For example, a manager might
ask to see the sales for August or the sales for the first five months of 1998. The
same manager might also ask to see the sales for the sport shoe but would never ask
to see the sales for the first five shoes.

Benefits from OLAP:
• OLAP-tools can improve the productivity of the whole organization by focusing on
what is essential for its growth, and by transferring the responsibility of the analysis
to the operational parts of the organization. For example, a product manager to view
sales figures for a given product at the national level, see them broken down by
division. He can drill down to see figures for territories within a division, check
sales numbers for each store in a territory and then compare them against sales of
stores from different territory-all at the push of a button of the OLAP application.
• Developers also benefit from OLAP. By using software specifically designed for
OLAP, developers can deliver applications to business users faster, providing better
service that in turn allows the developers to build more applications.
• OLAP will improve the OLTP systems performance, due to the reduced network
traffic and elimination of long queries to the OLTP database.

To illustrate the benefits of OLAP, we show the using OLAP for each department in an
organization:
• The finance and accounting department can use OLAP-tools for budgeting
applications, activity-based costing (allocation), financial performance analyses and
financial modeling. The financial department can determine the next year’s budget
to accurately reflect the expenses of the organization and avoid budget deficits. The
department can also use its analyses to reveal weak points in the business that
should be eliminated, and points of strength that should be given more focus.
• The sales department, on the other hand, can use OLAP-tools to build sales analysis
and forecasting applications. These applications help the sales department to realize
the best sales techniques and the products that will sell better than others.
• The marketing department may use OLAP-tools for market research analysis, sales
forecasting, promotion analysis, customer analysis, and market/customer
segmentation. These applications will reveal the best markets and the markets that
don’t yield good returns. They will also help device where a given product can be
marketed versus another product. For instance, it is wise to market products used by
a certain segment of society in areas where people belonging to this segment are
located.
• The production department can use the OLAP-application for production planning
and defect analysis.
Vannaratana Praruksa Data Warehousing & Business Intelligence

8
5. Data Transformation

ETL (Extraction, Transformation and Loading) tools retrieve data from different transaction
processing systems and send them to the data warehouse database. The data may originate
from varied systems that in the company can be found in various formats. In the first place
data from the company’s existing production systems, running on mainframes or on open
systems such as Unix or Windows NT. These systems, heterogeneous in most cases, host
business applications that were often developed years ago. Data may originate from any
other data source as well, like sequential tapes (especially historical data).











ETL can represent a large percentage of the implementation costs of a decision support
solution. ETL tools perform the following tasks:
• Modeling: This phase corresponds to the creation of the plan, which logically
structures the data.
• Extraction: Relevant data must be extracted from source systems.
• Description: To ensure the coherence of data and its readability, the data is given a
consistent description that is stored in the company repository.
• Transformation: Data need to be cleaned and converted to a common format.
• Validation: Data pass various checks to ensure coherence, to eliminate duplication,
etc.
• Loading: Following the various cleaning phases, data is loaded in the data
warehouse DBMS.

6. Proclarity

As Business Intelligence applications proliferate, organizations must consider the platforms
on which they are built. A Business Intelligence platform offers a complete set of tools for
the creation, deployment, support and maintenance of Business Intelligence applications.
These are data-rich applications, with custom end-user interfaces, organized around specific
business problems, with targeted analyses and models.
Extraction Integration Conversion Preparation Cleaning
Data Warehouse
Figure 5.1: The ETL process
Data sources
Data transformation
Data extraction
Data loading
Vannaratana Praruksa Data Warehousing & Business Intelligence

9

The Proclarity OLAP Client is a Business Intelligence application that can be used to reduce
development and deployment time of highly scalable custom analytical applications. It is a
dynamic business analysis tool developed around Microsoft SQL Server and Analysis
Services. ProClarity provides intuitive desktop access to powerful analytic and data-
visualization capabilities that significantly reduce the time required for decision-makers to
explore and understand the key metrics driving our business.

The advantages of Proclarity’s OLAP Client:
• Intuitive and interactive grid and suite of standard business charts. It makes users
feel less like they're working with analysis tools and more like they're connected
directly to the source of their business information.
• Requires no new technical knowledge. Users employ basic browser navigation skills
to surf their data, resulting in dramatically lower training cost.
• Quickly identifies trend patterns and exceptions while isolating causal relationships.
(See figures 6.2, 6.3).
• Provides control of the context and relevancy of business data, such as Decomp and
Perspective provide greater understanding of complex business information. To
analyze and compare the relationship between two sets of measures for a particular
dimension the perspective view is used. In contrast with most data analysis tools that
can work only with limited quantities of data, prospective enables analysis of large
volumes of data. You can see the relationships those data points have with each
other. (See figure 6.4)
• Integration of third-party visualization tools for specialized data visualization, geo-
spatial mapping and forecasting future improve data access.
• Browser-inspired interface augmented with multiple wizards reduces training time.
• The multi-dimensional, menu-driven approach enables setting up virtually any
combination of variables and exhibiting their effects on the dependent outcome.
• Users will customize reports to effectively communicate their findings because the
ProClarity application is closely developed with Microsoft, graphical outcomes are
particularly customizable within Office Applications.
• Radically simplifies the exploration of complex data.
• Decreases traditional reporting loads.
• Reduces implementation risk.
• Open component nature and reusable.
• Users are free to carry ‘live’ data and associated display techniques back and forth
between the Microsoft Office tools. For example additional queries are made real-
time from within the PowerPoint presentation to answer on-the-spot questions. Time
isn't wasted mastering complex graphical interfaces or relying on cut and paste
artwork to convey important business information. (See figure 6.1)
Vannaratana Praruksa Data Warehousing & Business Intelligence

10

Figure 6.1: An example of PowerPoint Presentation with Proclarity tool

Proclarity’s Decomposition Tree visualizes the entire drill-down path while assessing the
ranking and relevant of any data point in any direction. In this example, you realize that
North America is the biggest market of PC’s product and more than half of all products sold
to North America. When you want to see more detail of each item (rectangle, line graph)
you just point at that item. (See figure 6.2)
Vannaratana Praruksa Data Warehousing & Business Intelligence

11

Figure 6.2: An example of Decomposition Tree Graph

To spot trends and to make comparisons, you can choose Chart View for analyzing and
varying the display of data. (See figure 6.3).



Figure 6.3: An example of Chart View
Vannaratana Praruksa Data Warehousing & Business Intelligence

12


Figure 6.4: An example of Perspective


7. Conclusion

The growth of data warehousing is going to be enormous with new products and
technologies. A data warehouse environment can offer great benefits to the enterprise,
allowing it to reduce costs and build new business. It will more useful if approached in the
correct way and the planners and developers has a clear idea of their purpose and choose
strategies and methods that will provide the organization with performance and flexibility
for long-term information environments. Nevertheless the technology is worthless unless
there is the right management to guide it to success.


8. References

1. Ralph Kimball, The Data Warehouse Toolkit, John Wiley & Sons Inc., 1996.
2. Rob Mattison, Data Warehousing: Strategies, Tools and Techniques, McGraw-Hill Inc.,
1996.
3. Guido Bakema, Handouts from the course component Datawarehousing and OLAP,
HAN University, 2002.
4. Information from Internet, for instance:
http://www.olapreport.com/DatabaseExplosion.htm

Vannaratana Praruksa Data Warehousing & Business Intelligence

13
Appendix1: Glossary

Some terms used within the Data Warehousing community.

Aggregates
Pre-calculated and pre-stored summaries that are stored in the data warehouse to
improve query performance. For example, for every VAR there might be a pre-stored
summary detailing the total number of licenses, purchased per VAR, each month.

Business Intelligence Tools
Those client products which typically reside at client side that are the decision support
systems (DSS) to the warehouse. These products provide the user with a method of
looking and manipulating the data.

Data Extraction, Acquisition
The process of copying data from a legacy or production system in order to load it into
a warehouse.

Data Mart
Separate, smaller warehouses typically defined along organization's departmental
needs. This selectivity of information results in greater query performance and
manageability of data. A collection of data marts (functional warehouses) for each of
the organizations business functions can be considered as an enterprise warehousing
solution.

Data Mining
"...a collection of powerful analysis techniques for making sense out of very large
datasets." - R.Kimball

Data Modeling
The process of changing the format of production data to make it usable for heuristic
business reporting. It also serves as a road map for the integration of data sources into
the data warehouse.

Data Staging
"The data staging area is the data warehouse workbench. It is the place where raw data
is brought in, cleaned, combined, archived, and eventually exported to one or more
data marts." -R.Kimball

Data Warehouse
Architecture for putting data within reach of business intelligence systems. This is
data from a production system that now resides on a different machine, to be used
strictly for business analysis and querying, allowing the production machine to handle
mostly data input.

Vannaratana Praruksa Data Warehousing & Business Intelligence

14
Data Transformation
Performed when data is extracted from the operational systems, including integrating
dissimilar data types and processing calculations.

Database Gateway
Used to extract or pass data between dissimilar databases or systems. This middle
ware component is the front-end component prior to the transformation tools.

Drill Down
The process of navigating from a top-level view of overall sales down through the
sales territories, down to the individual sales person level. This is a more intuitive way
to obtain information at the detail level.

DSS (Decision Support Systems)
Business intelligence tools that are utilize data to form the systems that support the
business decision making process of the organization.

EIS (Executive Information Systems)
These are business intelligence tools that are aimed at less sophisticated users, who
want to look at complex information without the need to have complete manipulative
control of the information presented.

Metadata
Data that defines or describes the warehouse data separate from the actual warehouse
data, which is used to maintain, manage, and support the actual warehouse data.

OLAP (On-Line Analytical Processing)
Describes the systems used not for application delivery, but for analyzing the business,
e.g., sales forecasting, market trends analysis, etc. These systems are also move
conducive to heuristic reporting and often involve multidimensional data analysis
capabilities.

OLTP (On-Line Transactional Processing)
The activities and systems associated with a company's day-to-day operational
processing and data (order entry, invoicing, general ledger, etc.).

Query Tools and Queries
An application that sends native database commands, usually SQL, to extract
information from a database server. Queries can either browse the contents of a single
table or using the database's SQL engine perform join conditioned queries, that
produce result sets involving data from multiple tables that meet certain selection
criterion.

Scrubbing/ Transformation
The processes of altering data from its original form into a format suitable for business
analysis by non-technical staff.
Vannaratana Praruksa Data Warehousing & Business Intelligence

15
Star Schema
An OLAP oriented database design used by relational databases to model
multidimensional data. A Star Schema usually contains two types of tables: fact and
dimension. The fact table contains the measurement data, for example, the salary paid,
vacation earned, etc. The dimensions hold descriptive data, for example, name,
address, etc.


doc_709161389.pdf
 

Attachments

Back
Top