Journal of Computing and Information Technology - CIT 20, 2012, 4, 257–264
doi:10.2498/cit.1002081
Practical Implications of Real Time
Business Intelligence
Dale Rutz, Tara Nelakanti and Nayem Rahman
Intel Corporation, USA
The primary purpose of business intelligence is to improve the quality of decisions while decreasing the time it takes to make them. Because focus is required on internal as well as external factors, it is critical to decrease data latency, improve report performance and decrease system resource consumption. This article discusses the successful implementation of a BI reporting project run directly against an OLTP planning solver. The planning solver imports data concerning supply, demand, capacity, bill of materials, inventory and the like. It then uses linear programming to determine the correct product mix to produce at various factories worldwide. The article discusses the challenges faced and a working model in which real-time BI was achieved by providing data to a separate BI server in an innovative way, resulting in decreased latency, reduced resource consumption and improved performance. We demonstrate an alternative approach in which data for the BI application is hosted separately, with the BI and solver databases loaded at the same time, resulting in faster access to information.
Keywords: business intelligence, real-time business intelligence, OLTP, data warehouse, planning solver, replication tool
1. Introduction
In today’s environment businesses need to make informed decisions as conditions around them change. Companies typically create and collect data in operational data stores (order taking, accounting, procurement, and planning systems). This data is then posted to the enterprise data warehouse for various analytical needs. As the time frame in which companies must react to changes in the marketplace shrinks, there is a push to access information faster, often as soon as it is created. For decades we have known that reporting against operational data stores should be limited, as data layout and indexing needs differ between on-line transaction processing (OLTP), which writes, and business intelligence (BI) reporting, which reads.
Businesses are facing challenges in today’s environment. The volume and complexity of information in enterprises is increasing at the same time that business people are looking to take advantage of that information to gain a competitive edge. Companies face a tradeoff between the timeliness of information on one hand and cost and operational impact on the other. Generally, companies create data in On Line Transaction Processing (OLTP) systems. They then extract data from these OLTP systems and place it into the enterprise data warehouse (EDW) for various reporting and analytical needs. This framework allows separation of duties (creation vs. reporting) and also facilitates data cleansing efforts while ensuring that different parts of the business are looking at the same information, resulting in a single version of truth. There are, however, significant downsides to this traditional approach.
Generally, enterprise data warehouses are loaded via batch processes on a fixed schedule, often with 4 hours or even longer between batches. Data definitions must be negotiated across various business processes, and changes require significant planning and testing. The result is often a fairly rigid and bureaucratic process which may not serve all business functions. Additionally, because the environment is shared, the time required to implement changes is often significantly longer (measured in months or quarters) than business units can tolerate in a changing environment.
Alternatives, such as reporting directly against
OLTP systems, have been explored. Often
called real-time analytics, these efforts have their own challenges. OLTP systems are designed for data capture; hence, they are structured, indexed and coded for fast writes. Conversely, reporting and analytic applications require fast reads. Indexes which are created to enhance report performance may adversely affect inserts and updates.
A detailed analysis of the business requirement is needed to determine what level of timeliness best serves the business purpose. This is called right-time business intelligence (BI). In some cases, a delay of even a day will not adversely affect the quality of business decisions. In other cases, users may need to inspect and analyze information at various levels of aggregation before finalizing a plan, and little or no delay can be tolerated, resulting in a need for real-time BI.
In our planning application, business intelligence reports are designed to summarize excess and shortfall to demand across the planning horizon, as well as to identify changes which have occurred since the previous planning cycle. There is a wealth of reports which look at various angles and levels of aggregation of information in order to identify patterns and trends which are occurring and affecting supply and demand. These reports are used to judge the quality of the production plan and allow planners to communicate roadblocks to sales and manufacturing organizations before customers are impacted.
Initially, our BI implementation was planned to be on a separate, replicated database. However, due to concerns about latency as well as resource constraints, the initial implementation had both the solver and the BI reporting running on the same server. While this has been shown to work and is in production now, several shortcomings (indexes that would enhance BI performance but negatively impact the solver, processes blocking each other, tight coupling of OLTP and BI) have caused us to step back and reconsider. Experimentation has shown that loading both a BI database and a solver database at the same time does not take longer than simply loading the solver database, and updates can be replicated with only 20 or 30 seconds of latency.
By hosting the BI implementation separately, overall latency was actually reduced, because reports which were taking more than 10 minutes now take seconds by taking advantage of better indexing and precooked tables. I/O and CPU consumption for the solver in no way impact reporting performance, and vice versa. Locks or other server issues can be root-caused to one application only.
Hosting BI applications inside operational data stores is possible, but the benefits of hosting them separately have been shown to far outweigh the benefits of combining them. In fact, the largest perceived benefit, real-time BI, is actually better achieved by separating the two functions.
2. Literature Research
There is strong evidence of the importance of business intelligence [17]. Many business as well as academic publications describe different ways companies are using and benefiting from business intelligence [2, 17]. Data warehousing and data management have been identified as one of the six physical capability clusters of IT-infrastructure services required for strategic agility [19]. Data warehousing and business intelligence are closely connected to each other. Significant research has been done on different aspects of data warehousing [1, 6, 8, 11, 15, 20] and business intelligence [5, 7, 24] over the last decade. Previous work on business intelligence has focused on design issues [4, 9, 10, 25], BI tools [16, 22, 23, 24], BI data maintenance and collection strategies [12], and implementation issues and best practices [21].
Wu et al. [24] propose a service-oriented architecture for business intelligence that seamlessly integrates technologies into a coherent business intelligence environment. They suggest that this kind of architecture enables simplified data delivery and low-latency analytics. They compared their service-oriented approach with traditional business-oriented architectures and presented the advantages of the service-oriented paradigm. They assert that the proposed approach is the best way to reduce the total development and maintenance cost, and to minimize the risk and impact across an entire enterprise when introducing business intelligence solutions [24]. The Oracle [10] white paper addresses the business reasons to move to real-time data warehousing and describes some of the common data integration approaches,
with an emphasis on using real-time CDC capabilities [10]. Sandu [13] and Davis [3] provide an overview of operational and real-time BI and how they optimize the decision-making process by reducing or eliminating latency. Given that real-time BI is costly, they suggest that companies do not always need to reduce latency to zero and do not always need to make and implement decisions in real time. They argue that companies should define an optimum time frame, the right time for any decision process, an interval that should reflect the business needs and offer the best risk-cost ratio [13].
Watson et al. [18] assert that “to be successful with real-time BI, organizations must overcome both organizational and technical challenges.” They suggest that, on the technical side, new hardware and software must be acquired and implemented, and processes and procedures for supporting and managing real-time data feeds from source systems must be established. They also state that the purpose of real-time BI is to increase revenues and decrease costs, and that companies that successfully implement real-time BI can dramatically improve their profitability [18].
Ramakrishnan et al. [12] examine how external pressures influence the relationship between an organization’s business intelligence (BI) data collection strategy and the purpose for which BI is implemented. They provide managers with a mental model on which to base decisions about the data required to accomplish their goals for BI [12]. Steiger [14] suggests that BI techniques can be applied to knowledge creation as an enabling technology. The author proposes a business intelligence design theory for DSS as knowledge creation, and indicates how BI can be focused internally on the decision maker to discover and enhance his/her mental model and improve the quality of decisions [14]. The author also suggests that BI is an appropriate enabling technology for knowledge creation.
Most of the above research has focused on how BI can be used for different purposes, on best practices, and on the benefits businesses can achieve by using BI effectively. In this article, we focus on the back-end side of BI tools. We present a novel approach of hosting the BI application and the solver database separately to achieve maximum benefit in terms of efficiency, resource usage and performance. The article discusses how real-time BI can be achieved by providing data to a BI server in an innovative way, decreasing latency without impacting performance and resource consumption. We propose hosting data for the BI application separately, loading the BI and solver databases in parallel.
3. Using Replication Technology for Data Movement
Our initial approach of using the OLTP database to support BI functionality has been in production for nine months. It is fully functional, and users access the data in real time to support their business processes. Challenges have been encountered with processes blocking each other, causing response-time concerns for both the OLTP solver aspect of the application and the BI reports. Additionally, some reports run slower than they need to, because indices which could be used to speed up data retrieval would have the opposite effect on OLTP and slow those processes down. As the application matures and grows, solutions have been sought.
We initially considered using triggers to capture data changes and write them from the transactional database to another server for use in reporting. This approach is not ideal in that it would require custom code within the OLTP application. Embedding business rules in complex SQL (Structured Query Language) is never a good choice in production applications. Also, the performance implications of triggers are widely understood. The more ideas were discussed, the clearer it became that a method of data transfer integral to the database engine itself would provide the speed and flexibility needed while minimizing customization.
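To make the drawback concrete, the sketch below shows what such a capture trigger might have looked like. It is purely illustrative: the table, column and server names (dbo.SupplyDemand, BIServer, SupplyDemandChanges) are hypothetical, not taken from our production system, and the T-SQL assumes a linked server to the reporting instance.

    -- Hypothetical capture trigger: every OLTP write now also pays
    -- for a synchronous remote write across a linked server (BIServer).
    CREATE TRIGGER trg_SupplyDemand_Capture
    ON dbo.SupplyDemand
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Reporting logic ends up embedded in the OLTP schema, and
        -- the transaction cannot commit until the remote insert succeeds.
        INSERT INTO BIServer.PlannerBI.dbo.SupplyDemandChanges
            (ItemId, PlanDate, Quantity, CapturedAt)
        SELECT ItemId, PlanDate, Quantity, GETDATE()
        FROM inserted;
    END;

Every insert or update on the source table is taxed with this extra write, which is exactly the kind of custom, performance-sensitive code we wanted to keep out of the OLTP application.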
Based on our analysis and research, replication emerged as the clear choice to move data seamlessly. It avoids the issue of embedding business logic in SQL and entirely sidesteps the custom-code concerns, as it is made up of commercially available components. Replication technology moves data from server to server, from one instance to another, in a transactionally consistent state. Replication can run on a schedule or in real time, to a single instance or to multiple instances, using either a pull or a push method. Our case study involves real-time replication which moves data
using the push method to a single instance. Replication has a Publisher, a Distributor and a Subscriber. The Publisher runs on the transactional database instance. The Distributor and the distribution database can be on their own server, or can reside on either the Publisher or the Subscriber if resources allow. The Subscriber runs on the BI server.
Figure 1. Transactional replication.
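As a concrete illustration of this topology, the following minimal sketch configures a continuous push publication of a single table using system stored procedures. It assumes the platform is Microsoft SQL Server, whose transactional replication matches the Publisher/Distributor/Subscriber model described here; all database, publication, table and server names are hypothetical, and a production setup would additionally require distributor and Snapshot Agent configuration, omitted for brevity.

    -- Run on the Publisher, in the transactional (OLTP) database.
    USE PlannerDB;
    EXEC sp_replicationdboption
         @dbname = N'PlannerDB', @optname = N'publish', @value = N'true';

    -- Continuous (real-time) transactional publication.
    EXEC sp_addpublication
         @publication = N'PlannerToBI',
         @repl_freq   = N'continuous',
         @status      = N'active';

    -- Publish a single table article.
    EXEC sp_addarticle
         @publication   = N'PlannerToBI',
         @article       = N'SupplyDemand',
         @source_owner  = N'dbo',
         @source_object = N'SupplyDemand';

    -- Push subscription: the Distribution Agent delivers
    -- transactions to the BI server.
    EXEC sp_addsubscription
         @publication       = N'PlannerToBI',
         @subscriber        = N'BIServer',
         @destination_db    = N'PlannerBI',
         @subscription_type = N'Push';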
The Distributor uses the transaction log to propagate transactions to the Subscriber. In our implementation, as data is transmitted to the Subscriber, all data changes are processed in the order they were made on the Publisher. This ensures data consistency on both the transactional (OLTP) and BI instances. Our case study replicated only table articles. We made schema changes on the transactional database and observed that the changes were immediately replicated on to the BI server. Any data changes made on the transactional database (through the user interface) were picked up by the BI server in less than ten seconds.
We were concerned about the transaction log size. Without replication, a transaction log is almost always written to and rarely read from. When replication is configured, the transaction log file is not truncated until all the transactions have been replicated. We wanted to study this and ensure that log file growth would not become a bottleneck for the transactional OLTP database. We monitored the latency it takes the Log Reader to move transactions from the OLTP transaction log to the distribution database, and the latency it takes the Distribution Agent to move transactions from the distribution database to the Subscriber database. The total of these two figures is the amount of time it takes a transaction to get from the publication database to the subscriber database. The counters for these two processes are the DB Server Replication Log Reader: Delivery Latency counter and the DB Server Replication Distributor: Delivery Latency counter. We concluded that a significant increase in the latency of either process should be a signal to find out what new or different action has happened to cause it. We also used SQL queries to monitor log file growth.
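The sketch below shows one way such measurements can be taken with built-in tooling, again assuming a SQL Server platform and the hypothetical publication name used above. A tracer token travels through the Log Reader and Distribution Agent and records the latency of each leg, while DBCC SQLPERF reports transaction log utilization.

    -- Post a tracer token (run in the publication database).
    USE PlannerDB;
    DECLARE @token int;
    EXEC sp_posttracertoken
         @publication     = N'PlannerToBI',
         @tracer_token_id = @token OUTPUT;

    -- Once the token has been delivered, report the Log Reader and
    -- Distribution Agent latencies recorded for it.
    EXEC sp_helptracertokenhistory
         @publication = N'PlannerToBI',
         @tracer_id   = @token;

    -- Transaction log size and percent used, per database.
    DBCC SQLPERF(LOGSPACE);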
We also studied the I/O performance of the server. Transactional replication can cause I/O performance issues on databases that experience large numbers of transactions. To address this, we moved the transaction log of each database involved in transactional replication onto its own dedicated RAID 1 or RAID 10 disk array. This reduced the I/O impact of reads against the OLTP transaction log file.
We ran several transactional database performance tests to ensure that replication did not add overhead to the transactional database server. We monitored the health of the application and the database for two months during peak usage times. We did not observe any significant performance issues at the application level or at the database server level.
Figure 2. Replication impact on publisher.
Figure 3. Results – replication latency.
4. Business Context
Our usage model involves factory optimization and loading on a varied product mix. Many products are capacity constrained. Adding to the complexity, our bills of materials (BOMs) are many and varied. One assembly product can make literally thousands of different test items and, conversely, one test item can be made from many different assembly products. Due to these complex business scenarios, there is a significant level of difficulty involved in trying to analyze the factory loading levels to ensure optimal output. At the forefront of these challenges is the fact that traditional business measures, such as available to promise, require significant resources to calculate in real time. Let us take available to promise as an example and explore the challenges faced.
Available to promise (ATP) is defined by APICS (APICS Dictionary, 13th edition) as “the uncommitted portion of a company’s inventory and planned production maintained in the master schedule to support customer order processing.” The Dictionary defines order promising as “the process of making a delivery commitment.” Generally, for the near term this value consists of inventory on hand plus the Master Production Schedule (MPS) minus the sum of customer orders prior to the next MPS. However, once a certain time threshold is passed, the forecast of customer orders is substituted for actual orders. Because forecasts are only an estimate, smoothing must be applied to avoid wide variations in the signals given to the production floor. Further, different products have different complexities in terms of their bills of materials, yields, throughput times and so on. A variety of different subassemblies can be manufactured into the same finished good depending on the availability of materials, machinery and manpower.
So, the calculation for available to promise must take information at the lowest level (item/day) and work through all the various combinations and permutations of bills of materials to determine available supply as well as projected demand, and then aggregate these values across product families and weeks, months and quarters to create information which business people can use to plan production facilities, personnel and materials requirements, all in the interest of excellent supplier and customer relations.
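As a simplified illustration of the near-term arithmetic only, ignoring BOM explosion and forecast smoothing, the query below computes ATP at the item/day level; the table and column names (Supply, CustomerOrders and their columns) are hypothetical.

    -- Near-term ATP = on-hand inventory + MPS receipts
    --                 - customer orders due before the next MPS receipt.
    SELECT s.ItemId,
           s.PlanDate,
           s.OnHandQty + s.MpsQty
               - COALESCE(SUM(o.OrderQty), 0) AS AvailableToPromise
    FROM   Supply AS s
    LEFT JOIN CustomerOrders AS o
           ON  o.ItemId  = s.ItemId
           AND o.DueDate >= s.PlanDate
           AND o.DueDate <  s.NextMpsDate  -- orders prior to the next MPS
    GROUP BY s.ItemId, s.PlanDate, s.OnHandQty, s.MpsQty;

The real calculation then rolls these item/day values up through the BOM hierarchies to product families and to weeks, months and quarters.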
From this one example of one metric it becomes clear that a variety of stakeholders need to access this information at different levels of aggregation, using different filters, for a whole host of different business activities. The goal of business intelligence is to facilitate quality decisions in the least time possible. In some cases planners are creating production schedules and need reports to determine the quality of those schedules immediately, before making them the plan of record (POR). In other cases business unit managers need to examine product schedules to ensure their products are fully supported and to manage tradeoffs between their item groups. Marketing representatives need to understand what they can and cannot promise their clients, and factory managers need visibility into which products can be swapped for which others at various points in the process.
All of this means that data needs to be stored at the lowest possible level. Hierarchies must be created and maintained that allow users to traverse the data without risk of double counting or any other aggregation mistake which could give a wrong calculation result. Initially these requirements were taken into account and drove a decision that the data must be real time and reported against at the same level at which it is created. However, after this approach was implemented, our observations drove a different conclusion.
5. Performance Advantages of Separate OLTP vs. Reporting Databases
As previously discussed, the performance of both OLTP and BI reporting was compromised to some extent by the presence of the other. Initially users felt the performance hits were justified by the immediate access to information created by the application. However, after seeing how quickly data could be replicated to a separate server, we realized that users could have access to information even faster via replication than by running reports against the OLTP system.
To understand why this would be, first we must consider the nature of the database itself. The database size is roughly 120 GB. There are 110 tables, the largest of which holds over 32 million rows of data and the smallest of which holds half a dozen rows. All information is stored at the very lowest level of detail, which is individual line item and day. The breakdown is provided in Table 1:
Number of tables    Number of rows
        7           10M – 35M
       14           1M – 10M
       16           100K – 1M
        9           1K – 20K
       64           –

Table 1. Breakdown of the 110 tables by number of rows.