Business Intelligence for Small and Middle-Sized Enterprises

Oksana Grabova*
University of Lyon (ERIC Lyon2)
5 av P. Mendes-France
69676 Bron Cedex, France
[email protected]
lyon2.fr
Jerome Darmont
University of Lyon (ERIC Lyon2)
5 av P. Mendes-France
69676 Bron Cedex, France
jerome.darmont@univ-lyon2.fr
Jean-Hugues Chauchat
University of Lyon (ERIC Lyon2)
5 av P. Mendes-France
69676 Bron Cedex, France
jean-hugues.chauchat@univ-lyon2.fr
Iryna Zolotaryova
Kharkiv National University of Economics
9-a, pr. Lenina
61001 Kharkov, Ukraine
[email protected]
ABSTRACT
Data warehouses are the core of decision support systems, which are nowadays used by all kinds of enterprises worldwide. Although many studies have been conducted on the need for decision support systems (DSSs) in small businesses, most of them adopt existing solutions and approaches that are appropriate for large-scale enterprises but inadequate for small and middle-sized enterprises.
Small enterprises require cheap, lightweight architectures and tools (hardware and software) that provide online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing.
Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools) that are relevant for decision making in small and middle-sized enterprises.
1. INTRODUCTION
During the last decade, data warehouses (DWs) have become an essential component of modern decision support systems in most companies around the world. In order to be competitive, even small and middle-sized enterprises (SMEs) now collect large volumes of information and are interested in business intelligence (BI) systems [26]. SMEs are regarded as significantly important on a local, national or even global basis, and they play an important part in any national economy [34]. In spite of multiple advantages, existing DSSs frequently remain inaccessible or insufficient for SMEs because of the following factors:
• high price;
• high hardware infrastructure requirements;
• complexity for most users;
• irrelevant functionality;
• low flexibility to deal with a fast-changing, dynamic business environment [56];
• little attention to the differences in data access needs between SMEs and large-scale enterprises.
* This author also works at the Kharkiv National University of Economics, 9-a, pr. Lenina, 61001 Kharkov, Ukraine.
In addition, many projects fail due to the complexity of the development process. Moreover, as the work philosophies of small and large-scale enterprises are considerably different, it is not advisable to use tools intended for large-scale enterprises. In short, "one size does not fit all" [51]. Furthermore, identifying the information needs of potential users raises many problems in the process of building a data warehouse [7].
SIGMOD Record, June 2010 (Vol. 39, No. 2) 39
Thus, SMEs require lightweight, cheap, flexible, simple and efficient solutions. To achieve these features, we can take advantage of light clients with web interfaces. For instance, web technologies are used for data warehousing by large corporations, but there is an even greater demand for such systems among small and middle-sized enterprises. Using web technologies makes software cheaper, because it eliminates the need for numerous dispersed applications and for deploying and maintaining a corporate network, and it reduces training time. Web-based solutions are simple for end-users to use. In addition, a web-based architecture requires only lightweight software clients (i.e., web browsers).
Besides, there is a need for real-time data analysis, which raises memory and storage issues. Traditional OLAP (On-Line Analytical Processing) tools are often based on a cumbersome hardware and software architecture, so they require significant resources to provide high performance. Their flexibility is limited by data aggregation. At the same time, in-memory databases provide significant performance improvements. The absence of disk I/O operations permits fast query response times. In-memory databases do not require indexes, recalculation or pre-aggregation; thus, the system becomes more flexible, because analysis is possible down to a detailed level without having to predefine it. Moreover, according to analyst firms, "by 2012, 70% of Global 1000 organizations will load detailed data into memory as the primary method to optimize BI application performance" [47].
Thus, our objective is to propose an original BI solution adapted to SMEs. To this aim, as a first step, we review in this paper the existing research related to this issue.
The remainder of this paper is organized as follows. In Section 2, we first present and discuss web-based BI approaches, namely web data warehouses and web-based open source software for data warehousing. In Section 3, we review in-memory BI solutions (MOLAP, vector database-based BI software) and the technologies that can support them (in-memory and vector databases). We finally conclude this paper in Section 4 and provide our view on how the research and technologies surveyed in this paper can be enhanced to fit SMEs' BI needs.
2. WEB-POWERED BI
The Web has become the platform of choice for delivering business applications, for large-scale enterprises as well as for SMEs. Web warehousing is a recent approach that merges data warehousing and business intelligence systems with web technologies [52]. In this section, we present and discuss web data warehousing approaches, their features, advantages and possibilities, as well as their necessity and potential for SMEs.
2.1 Web warehousing
2.1.1 General information
There are two basic definitions of web warehousing. The first one simply states that web warehouses use data from the Web. The second concentrates on the use of web technologies in data warehousing. We focus on the second definition in this paper.
Web data warehouses inherit many characteristics from traditional data warehouses: data are organized around the major subjects of the enterprise; information is aggregated and validated; data are represented as time series rather than as current status. Web-based data warehouses nonetheless differ from traditional DWs. Web warehouses organize and manage the stored items, but do not collect them [52]. Web-based DW technology changes the way users access the DW: instead of accessing it through a LAN (Local Area Network), users access it via the Internet or an intranet [30].
Specific issues raised by web-based DWs include unrealistic user expectations, especially in terms of how much information users want to be able to access from the Web; security issues; and technical implementation problems related to peak demand and load [42].
Eventually, web technologies make data warehouses and decision support systems more user-friendly. They are often used in data warehouses only to visualize information [18]. At the same time, web technology opens up multiple information formats, such as structured, semi-structured and unstructured data, to end-users. This gives users many possibilities, but also creates a problem known as data heterogeneity management [19].
Another important issue is the necessity to view the Web as an enormous source of business data, without which enterprises lose many opportunities. Owing to the Web, business analysts can access large amounts of information external to the enterprise, and then study competitors' moves by analyzing their web site content, or analyze customer preferences and emerging trends [11]. Thus, e-business technologies are expected to allow SMEs to gain capabilities that were once the preserve of their larger competitors [34]. However, most of the information on the Web is unstructured, heterogeneous and hence difficult to analyze [26].
Among the web technologies used in data warehousing, we can single out web browsers, web services and XML. Using a web browser offers several advantages over traditional warehouse interface tools [19, 33]:
• cheapness and simplicity of web browser installation and use;
• reduction of system training time;
• elimination of problems posed by operating systems;
• low cost of deployment and maintenance;
• elimination of the need for numerous dispersed applications;
• possibility to open the data warehouse to business partners over an extranet.
Web warehouses can be divided into two classes: XML document warehouses and XML data warehouses. We present them in Sections 2.1.2 and 2.1.3, respectively. We also introduce OLAP on XML data (XOLAP) in Section 2.1.4. We then present the web-based paradigm known as cloud computing (Section 2.1.5). Finally, Section 2.2 presents web-based open source software for data warehousing and analysis.
2.1.2 XML document warehouses
An XML document warehouse is a software framework for analyzing, sharing and reusing unstructured data (texts, multimedia documents, etc.). Unstructured data processing takes an important place in enterprise life, because unstructured data are larger in volume than structured data, are more difficult to analyze, and are an enormous source of raw information.
Representing unstructured or semi-structured data with traditional data models is very difficult. For example, relational models such as star and snowflake schemas are semantically poor for unstructured data. Thus, Nassis et al. use object-oriented concepts to develop a conceptual model for XML document warehouses [35]. They use UML diagrams to build hierarchical conceptual views. By combining object-oriented concepts and XML Schema, they build the xFACT repository.
2.1.3 XML data warehouses
In contrast to XML document warehouses, XML data warehouses focus on structured data. XML data warehouses can be designed from XML sources [3]. In this case, XML data must be translated into a relational schema by means of XML Schema [3, 8]. Xyleme is one of the first projects aimed at XML data warehouse design [57]. It collects and archives web XML documents into a dynamic XML warehouse.
Some more recent approaches are based on classical warehouse schemas. Pokorny adapts the traditional star schema with explicit dimension hierarchies to XML environments by using Document Type Definitions (DTDs) [41]. Boussaïd et al. define data warehouse schemas via XML Schema in a methodology named X-Warehousing [8]. Golfarelli proposes a semi-automatic approach for building the conceptual schema of a data mart starting directly from XML sources [15]. This work elaborates on the concept of the Dimensional Fact Model. Baril and Bellahsene propose a view model for XML documents, implemented in the DAWAX (Data Warehouse for XML) system [4]. Its view specification mechanism allows filtering the data to be stored. Nørvåg introduces temporal XML data warehouses to query historical document versions and the changes between document versions [36]. Nørvåg et al. also propose TeXOR, a temporal XML database system built on top of an object-relational database system [37]. Finally, Zhang et al. propose an approach named X-Warehouse to materialize data warehouses based on frequent query patterns represented by Frequent Pattern Trees [58].
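To make the XML-to-relational translation step more concrete, here is a minimal Python sketch using only the standard library. The XML layout, the table names (dim_store, fact_sales) and the hand-written mapping are hypothetical; approaches such as [3, 8] derive the mapping from the XML Schema rather than hard-coding it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML source: each <sale> element becomes one row of a fact table.
XML_SOURCE = """
<sales>
  <sale date="2010-01-05" store="Lyon" product="Tea" amount="12.5"/>
  <sale date="2010-01-05" store="Bron" product="Tea" amount="7.0"/>
  <sale date="2010-01-06" store="Lyon" product="Coffee" amount="20.0"/>
</sales>
"""

def shred_to_star(xml_text: str) -> sqlite3.Connection:
    """Load XML sales into a minimal star schema: one store dimension, one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE fact_sales (sale_date TEXT, store_id INTEGER REFERENCES dim_store, "
        "product TEXT, amount REAL)"
    )
    root = ET.fromstring(xml_text)
    for sale in root.iter("sale"):
        name = sale.get("store")
        # Insert each dimension member once, then reuse its surrogate key.
        conn.execute("INSERT OR IGNORE INTO dim_store (name) VALUES (?)", (name,))
        (store_id,) = conn.execute(
            "SELECT store_id FROM dim_store WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (sale.get("date"), store_id, sale.get("product"), float(sale.get("amount"))),
        )
    conn.commit()
    return conn

conn = shred_to_star(XML_SOURCE)
rows = conn.execute(
    "SELECT d.name, SUM(f.amount) FROM fact_sales f JOIN dim_store d USING (store_id) "
    "GROUP BY d.name ORDER BY d.name"
).fetchall()
print(rows)  # [('Bron', 7.0), ('Lyon', 32.5)]
```

Once shredded, the XML data can be queried with ordinary SQL aggregates, which is what makes relational OLAP tools usable on XML sources.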
2.1.4 XOLAP
Some recent research attempts to perform OLAP analysis over XML data. In order to support OLAP queries and to construct complex analytic queries, some researchers extend the XQuery language with aggregation features [5]. Wiwatwattana et al. introduce an XQuery cube operator, X³ [55]. Hachicha et al. propose a similar operator, but based on TAX (Tree Algebra for XML) [17].
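The kind of result a cube operator over XML returns can be sketched in Python: every subset of the chosen dimensions becomes one grouping, and rolled-up dimensions are marked ALL. This is neither X³ nor the TAX-based operator; it computes the groupings in plain Python, not XQuery, over hypothetical sale elements.

```python
from collections import defaultdict
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical XML facts; child elements hold dimension values and the measure.
XML_FACTS = """
<sales>
  <sale><store>Lyon</store><product>Tea</product><amount>12.5</amount></sale>
  <sale><store>Bron</store><product>Tea</product><amount>7.0</amount></sale>
  <sale><store>Lyon</store><product>Coffee</product><amount>20.0</amount></sale>
</sales>
"""

def xml_cube(xml_text, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`; 'ALL' marks a rolled-up dimension."""
    facts = [
        ({d: sale.findtext(d) for d in dimensions}, float(sale.findtext(measure)))
        for sale in ET.fromstring(xml_text).iter("sale")
    ]
    cube = defaultdict(float)
    for k in range(len(dimensions) + 1):
        for grouped in combinations(dimensions, k):
            for coords, value in facts:
                cell = tuple(coords[d] if d in grouped else "ALL" for d in dimensions)
                cube[cell] += value
    return dict(cube)

cube = xml_cube(XML_FACTS, ["store", "product"], "amount")
print(cube[("ALL", "ALL")])   # 39.5
print(cube[("Lyon", "ALL")])  # 32.5
print(cube[("ALL", "Tea")])   # 19.5
```

With d dimensions the operator produces 2^d groupings, which is why XOLAP proposals push this computation into the query engine instead of materializing it by hand.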
2.1.5 Cloud computing
Another increasingly popular web-based solution is cloud computing. Cloud computing provides access to large amounts of data and computational resources through a variety of interfaces [38]. It is provided as services via the cloud (i.e., the Internet). These services, delivered through data centers, are accessible from anywhere. Moreover, they enable the rise of cloud analytics [2].
The main consumers of cloud computing are small enterprises and startups that do not have a legacy of IT investments to manage [50]. Cloud computing-based BI tools are rather cheap for small and middle-sized enterprises, because they require no hardware and software maintenance [1] and their prices grow with the required data storage. On the other hand, cloud computing does not allow users to physically possess their data storage. This causes user dependence on the cloud computing provider, loss of data control, and data security concerns. In conclusion, most cloud computing-based BI tools do not fit enterprise requirements yet.
2.1.6 Discussion
Data storage and analysis interface solutions should be easy to deploy in a small organization at low cost, and thus be based on web technologies such as XML and web services. Web warehousing is a rather recent but popular direction that provides many advantages, especially for data integration. Web-based tools provide light interfaces; yet their usage by small and middle-sized enterprises remains limited. Existing cloud-based BI tools are appropriate for small and middle-sized enterprises with respect to price and flexibility. However, they are not yet enterprise-friendly and need data security enhancements.
2.2 Web-based open source software
In this section, we focus on ETL (Extraction, Transformation, Loading) tools, OLAP servers and OLAP clients. Their characteristics are summarized in Table 1.
2.2.1 ETL
Free web-based ETL tools are in most cases ROLAP-oriented (Relational OLAP, discussed in Section 3.1.1). ROLAP-oriented ETL tools allow users to define and create data transformations in Java (JasperETL) or in TL (Clover.ETL¹). The only MOLAP-oriented (Multidimensional OLAP, discussed in Section 3.1.1) ETL tool, Palo, defines the ETL process either via web interfaces or, for experts, via XML structures. All studied ETL tools can be configured for heterogeneous data sources and complex file formats. They interact with different DBMSs (DataBase Management Systems). Some of these tools can also extract data from ERP (Enterprise Resource Planning) and CRM (Customer Relationship Management) systems [53].
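The extract, transform and load steps that these tools automate can be illustrated with a small Python sketch. The CSV layout and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the target warehouse; real ETL tools add scheduling, logging, graphical design and connectors to ERP/CRM sources.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
RAW_CSV = """order_id,customer,country,amount_eur
1, Dupont ,fr,120.50
2,Smith,UK,80
3,Dupont,FR,
"""

def extract(text):
    """Extract: read raw rows from the source file."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, normalize country codes, drop rows without an amount."""
    clean = []
    for row in rows:
        if not row["amount_eur"].strip():
            continue  # reject incomplete records
        clean.append((int(row["order_id"]), row["customer"].strip(),
                      row["country"].strip().upper(), float(row["amount_eur"])))
    return clean

def load(rows, conn):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
print(warehouse.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall())  # [('FR', 120.5), ('UK', 80.0)]
```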
2.2.2 OLAP
In this section, we review OLAP servers as well as OLAP clients. All studied OLAP servers use the MDX (Multi-Dimensional eXpressions) language to aggregate tables. They parse MDX into SQL to retrieve answers to dimensional queries. All reviewed OLAP servers exist for Java, but Palo also exists for .NET, PHP and C. Moreover, Palo is an in-memory multidimensional OLAP database server². Mondrian schemas are represented in XML files³. The Mondrian Pentaho Server is used by different OLAP clients, e.g., FreeAnalysis.
All studied OLAP clients are Java applications. They usually run on the client side, but some tools also run on web servers [53]. So far, only PocOLAP is a lightweight, open source OLAP solution.
¹ http://www.cloveretl.com
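As a rough illustration of how a ROLAP server turns a dimensional query into SQL, the sketch below translates one fixed MDX query shape into a GROUP BY over a star schema. The mapping dictionary and table names are hypothetical and only loosely mimic a Mondrian XML schema; no MDX is actually parsed.

```python
import sqlite3

# Hypothetical mapping from cube vocabulary to a relational star schema.
CUBE_MAPPING = {
    "fact_table": "fact_sales",
    "dimensions": {"Store": ("dim_store", "store_id", "name")},
    "measures": {"Amount": ("amount", "SUM")},
}

def to_sql(mapping, row_dimension, measure):
    """Translate the equivalent of the MDX query
    SELECT {[Measures].[<measure>]} ON COLUMNS, [<row_dimension>].Members ON ROWS FROM [Sales]
    into a SQL GROUP BY over the star schema."""
    table, key, label = mapping["dimensions"][row_dimension]
    column, agg = mapping["measures"][measure]
    fact = mapping["fact_table"]
    return (f"SELECT d.{label}, {agg}(f.{column}) FROM {fact} f "
            f"JOIN {table} d ON f.{key} = d.{key} GROUP BY d.{label} ORDER BY d.{label}")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (store_id INTEGER, amount REAL);
INSERT INTO dim_store VALUES (1, 'Lyon'), (2, 'Bron');
INSERT INTO fact_sales VALUES (1, 12.5), (2, 7.0), (1, 20.0);
""")
sql = to_sql(CUBE_MAPPING, "Store", "Amount")
print(sql)
print(conn.execute(sql).fetchall())  # [('Bron', 7.0), ('Lyon', 32.5)]
```

The point of the sketch is the division of labor: the schema mapping is declarative (in Mondrian's case, an XML file), and the SQL is generated from it at query time.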
2.2.3 Discussion
The industrial use of open source business intelligence tools is becoming increasingly common, but it is still not as widespread as for other types of software [53]. Moreover, freeware OLAP systems often propose simple web-based interfaces. In addition, some web-based open source BI tools work in memory.
Nowadays, there are three complete solutions that include both ETL and OLAP: Talend Open Studio, Mondrian Pentaho and Palo. Among ETL tools, only Palo is MOLAP-oriented. Not all of these tools provide a free graphical user interface. All three ETL tools presented support Java. They can be deployed on different platforms.
Free web-based OLAP servers are used by different OLAP clients. The most widespread is the Mondrian Pentaho Server, owing to its functionality. All studied OLAP clients are Java applications. Most of them can be used with XMLA (XML for Analysis)-enabled sources, but they lack documentation.
Generally, the studied web-based tools provide sufficient functionality, but they remain cumbersome due to traditional OLAP usage.
² http://www.jedox.com/en/products/palo olap server
³ http://mondrian.pentaho.org/
Table 1: Web-based open source software
Category | Tool | Platform | License | Particular features
ETL (ROLAP) | Clover.ETL | Java | LGPL | does not have an open source GUI; uses its own TL language for data transformations
ETL (ROLAP) | JasperETL | Java | GPL | generated code in Java or Perl; can use CRM systems as data sources
ETL (MOLAP) | Palo ETL Server | Java | GPL | does not have a GUI yet; parallel jobs are not supported
OLAP server | Mondrian | Java | CPL | ROLAP-based; data cubes via XML
OLAP server | Palo | Java | GPL | MOLAP-based; works in memory; data cubes via Excel add-in
OLAP client | FreeAnalysis | Java | MPL | works with servers that use XMLA, e.g., Mondrian
OLAP client | JPalo | Java | GPL | works with the Palo server
OLAP client | PocOLAP | Java | LGPL |
3. IN-MEMORY BI SOLUTIONS
In the late eighties, main memory databases were researched by numerous authors. Thereafter, they were rarely discussed because of the technological limits of the time, but nowadays they are regaining an important place in database technologies.
3.1 MOLAP
3.1.1 OLAP and MOLAP
Before studying existing MOLAP approaches, we review general OLAP principles and definitions. The OLAP concept was introduced in 1993 by Codd. OLAP is an approach to quickly answer multidimensional analytical queries [13]. In OLAP, a dimension is a sequence of analyzed parameter values. An important goal of multidimensional modeling is to use dimensions to provide as much context as possible for facts [21]. Each combination of dimension values defines a cube cell. A cube stores the results of different calculations and aggregations.
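The cell and aggregation notions above can be sketched with the array model used by MOLAP, assuming three small hypothetical dimensions: each combination of members is one slot of a dense array, addressed by a row-major offset, and a roll-up aggregates over every dimension the caller does not fix.

```python
import math
from itertools import product

# Hypothetical dimensions; a member's position in its list is its array coordinate.
DIMENSIONS = {
    "store": ["Lyon", "Bron"],
    "product": ["Tea", "Coffee"],
    "month": ["Jan", "Feb", "Mar"],
}
NAMES = list(DIMENSIONS)
SIZES = [len(DIMENSIONS[d]) for d in NAMES]

def offset(coords):
    """Row-major offset of a cell: the array model gives natural indexing, no separate index."""
    pos = 0
    for name, size in zip(NAMES, SIZES):
        pos = pos * size + DIMENSIONS[name].index(coords[name])
    return pos

cells = [0.0] * math.prod(SIZES)  # dense storage: one slot per combination of members

def store(coords, value):
    cells[offset(coords)] += value

def roll_up(**fixed):
    """Aggregate over every dimension not fixed by the caller (e.g. total per store)."""
    total = 0.0
    for members in product(*(DIMENSIONS[d] for d in NAMES)):
        coords = dict(zip(NAMES, members))
        if all(coords[d] == m for d, m in fixed.items()):
            total += cells[offset(coords)]
    return total

store({"store": "Lyon", "product": "Tea", "month": "Jan"}, 12.5)
store({"store": "Bron", "product": "Tea", "month": "Jan"}, 7.0)
store({"store": "Lyon", "product": "Coffee", "month": "Feb"}, 20.0)
print(roll_up(store="Lyon"))   # 32.5
print(roll_up(product="Tea"))  # 19.5
print(roll_up())               # 39.5
```

Only 3 of the 12 slots hold data here, which illustrates the poor storage utilization of MOLAP on sparse data listed in Table 2.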
There are three variants of OLAP: MOLAP, ROLAP and Hybrid OLAP (HOLAP). We compare these approaches in Table 2.
With respect to ROLAP and HOLAP, MOLAP provides faster computation and querying [48], because all required data are stored in the OLAP server. Moreover, it provides more space-efficient storage [40].
Since the purpose of MOLAP is to support decision making and management, data cubes must contain sufficient information to support decision making and meet every user expectation. In this context, researchers try to improve three main aspects: response time (through new aggregation algorithms [28] and new operators [46]), query personalization, and data analysis visualization [26].
3.1.2 Storage methods
Researchers interested in MOLAP focus a lot on storage techniques. Indeed, most researchers regard MOLAP as the most suitable OLAP technique for storage [31], although MOLAP requires significant storage capacity. According to Kudryavcev, there are three basic types of storage methods: semantic, syntactical and approximate [23]. Syntactical approaches transform only data storage schemas. Semantic storage techniques transform cube structures. Approximate storage techniques compress the initial data.
One semantic storage technique is the Quotient Cube. It performs semantic compression by partitioning the set of cells of a cube into equivalence classes, while preserving the cube's roll-up and drill-down semantics and lattice structure [25]. The main objective of approximate storage techniques such as wavelets is range-sum query optimization [29]. In the syntactical approach DWARF, a cube is compressed by deleting redundant information [49]. Data are represented as graphs with keys and pointers in graph nodes. Redundancy is reduced through improved addressing and data storage.
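The prefix sharing on which syntactical techniques such as DWARF rely can be sketched with nested dictionaries: cube cells that share leading coordinates share the same path, so each shared prefix is stored once. This sketch covers prefix coalescing only (DWARF also coalesces common suffixes), and the facts are hypothetical.

```python
from itertools import product

# Hypothetical base facts: (store, product, month) -> amount.
FACTS = {("Lyon", "Tea", "Jan"): 12.5, ("Bron", "Tea", "Jan"): 7.0, ("Lyon", "Coffee", "Feb"): 20.0}

def full_cube(facts):
    """All aggregate cells: every dimension is either kept or replaced by 'ALL'."""
    cube = {}
    for coords, value in facts.items():
        for keep in product([True, False], repeat=len(coords)):
            cell = tuple(c if k else "ALL" for c, k in zip(coords, keep))
            cube[cell] = cube.get(cell, 0.0) + value
    return cube

def build_prefix_tree(cells):
    """Store cells as paths in nested dicts so that shared prefixes are stored once."""
    root = {}
    for coords, value in cells.items():
        node = root
        for member in coords[:-1]:
            node = node.setdefault(member, {})
        node[coords[-1]] = value  # the last level holds the measure
    return root

def lookup(tree, coords):
    node = tree
    for member in coords:
        node = node[member]
    return node

def count_keys(node):
    """Number of stored keys in the shared-prefix representation."""
    if not isinstance(node, dict):
        return 0
    return len(node) + sum(count_keys(child) for child in node.values())

cube = full_cube(FACTS)
tree = build_prefix_tree(cube)
print(len(cube), 3 * len(cube))              # 18 54
print(count_keys(tree))                      # 29
print(lookup(tree, ("ALL", "Tea", "ALL")))   # 19.5
```

For this toy cube, the 18 cells would need 54 coordinate values in a flat table but only 29 keys once common prefixes are shared.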
3.1.3 Schema evolution
Many works address the problem of schema evolution, because working only with the latest version hides the existence of information that may be critical for data analysis. These studies can be classified into two groups: updating models (mapping data into the latest version) and history-tracking models (keeping track of schema evolution). Other approaches let users choose which presentation they want for query responses. For instance, Body et al. proposed a novel temporal multidimensional model that supports evolutions of multidimensional structures by introducing a set of temporal modes of presentation for dimensions in a star schema [6].
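The difference between the two groups of models can be sketched with a hypothetical store that changes region: an updating model maps every fact to the latest structure, while a history-tracking model keeps the structure that was valid when each fact occurred. This is a generic illustration, not the temporal model of Body et al.

```python
from datetime import date

# Hypothetical dimension history: the Bron store moves from region "East" to "Rhone".
# (store, region, valid_from, valid_to); valid_to=None marks the current version.
STORE_REGION_VERSIONS = [
    ("Lyon", "Rhone", date(2009, 1, 1), None),
    ("Bron", "East", date(2009, 1, 1), date(2010, 3, 1)),
    ("Bron", "Rhone", date(2010, 3, 1), None),
]
SALES = [("Bron", date(2010, 1, 15), 7.0), ("Bron", date(2010, 4, 2), 9.0),
         ("Lyon", date(2010, 4, 3), 12.5)]

def region_of(store, when):
    """Return the region a store belonged to on a given date."""
    for s, region, start, end in STORE_REGION_VERSIONS:
        if s == store and start <= when and (end is None or when < end):
            return region
    raise KeyError((store, when))

def sales_by_region(mode):
    """mode='latest' maps all facts to the current structure (updating model);
    mode='historical' uses the structure valid at each fact's date (tracking model)."""
    totals = {}
    for store, when, amount in SALES:
        region = region_of(store, date.max if mode == "latest" else when)
        totals[region] = totals.get(region, 0.0) + amount
    return totals

print(sales_by_region("latest"))      # {'Rhone': 28.5}
print(sales_by_region("historical"))  # {'East': 7.0, 'Rhone': 21.5}
```

Offering both answers, and letting the user pick, is the idea behind the presentation modes mentioned above.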
3.1.4 Discussion
Multidimensional OLAP is appropriate for decision making. It offers a number of advantages, including automatic aggregation, visual querying, and good query performance due to the use of pre-aggregation [39]. Besides, MOLAP may be a good solution for situations in which small to medium-sized DBs are the norm and application software speed is critical [45], because loading all data into the multidimensional format requires neither significant time nor disk space. Nevertheless, MOLAP systems have various problems, because cube rebuilding is complex, time-consuming and requires an expert. If the user wants to change dimensions, the whole deployment process needs to be redone (data mart schema, ETL process, etc.) [56]. Furthermore, the cost of MOLAP tools does not fit the needs of small and middle-sized enterprises. In addition, MOLAP-based systems may encounter significant scalability problems. Moreover, MOLAP requires a cumbersome architecture, i.e., substantial software and hardware, significant changes in the work process to generate substantial benefits [32], and considerable deployment time.
Table 2: Comparison of OLAP technologies
Criterion | MOLAP | ROLAP | HOLAP
Data storage | Multidimensional database | Relational database | Uses MOLAP technology to store higher-level summary data and a ROLAP system to store detailed data
Result sets | Stored in a MOLAP cube | No result sets stored | Stores result sets, but not all
Capacity | Requires significant capacity | Requires the least storage capacity |
Performance | The fastest performance | The slowest performance |
Dimensions | Minimum number | Maximum number | Compromise between performance, capacity, and permutations of dimensions available to a user
Vulnerability | Poor storage utilization, especially when the data set is sparse | Database designs recommended by ER diagrams are inappropriate for decision support systems |
Advantages | Fast query performance; automated computation of higher-level aggregates of the data; array model provides natural indexing | No limitation on data volume; leverages functionalities inherent in relational databases | Fast access at all levels of aggregation; compact aggregate storage; dynamically updated dimensions; easy aggregate maintenance
Disadvantages | Data redundancy; querying models with high-cardinality dimensions is difficult | Slow performance | Complexity: a HOLAP server must support MOLAP and ROLAP engines, plus tools to combine storage engines and operations; functionality overlap between storage and optimization techniques in ROLAP and MOLAP engines
3.2 Main Memory Databases
3.2.1 General information
Main Memory Databases (MMDBs) reside entirely in main memory [14] and only use a disk subsystem for backup [16]. The concept of managing an entire database in main memory has been researched for over twenty years, and the benefits of such approaches are well understood in certain domains, such as telecommunications, securities trading, applications handling heavy data traffic (e.g., routers), and real-time applications. However, only recently, with decreasing memory prices and the availability of 64-bit operating systems, have the size restrictions on in-memory databases been removed, making in-memory data management available for many applications [27, 54]. When the assumption of disk residency is removed, complexity is dramatically reduced. The number of machine instructions drops, buffer pool management disappears, extra data copies are not needed, index pages shrink, and their structure is simplified. Design becomes simpler and more compact, and queries are executed faster [54]. Consequently, main memory databases become advantageous in many cases: for hot data (frequent access, low data volume), for cold data (scarce access to voluminous data), and in applications requiring short access/response times.
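The "memory-resident, disk only for backup" arrangement can be approximated with Python's built-in sqlite3 module. SQLite is not one of the MMDBs surveyed here; it merely serves as a readily available stand-in. Connection.backup (Python 3.7+) copies the in-memory database to a file and back; the table and file names are hypothetical.

```python
import os
import sqlite3
import tempfile

# The working database lives entirely in RAM; the disk is touched only by the backup.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL)")
mem.executemany("INSERT INTO quotes VALUES (?, ?)", [("ACME", 10.5), ("INIT", 3.25)])
mem.commit()

# Snapshot the in-memory database to a file on disk.
backup_path = os.path.join(tempfile.mkdtemp(), "snapshot.db")
disk = sqlite3.connect(backup_path)
mem.backup(disk)
disk.close()

# After a failure, a fresh in-memory database is rebuilt from the last snapshot.
restored = sqlite3.connect(":memory:")
src = sqlite3.connect(backup_path)
src.backup(restored)
src.close()
print(restored.execute("SELECT symbol, price FROM quotes ORDER BY symbol").fetchall())
# [('ACME', 10.5), ('INIT', 3.25)]
```

Any change made after the last snapshot would be lost in a crash, which is why MMDBs also log transaction activity.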
A second wave of applications using MMDBs is currently appearing, e.g., FastDB, Dali from AT&T Bell Labs, and TimesTen from Oracle. These systems are already widely used in many applications, such as HP intellect web flat, the Cisco VoIP call proxy, the telecom systems of Alcatel and Ericsson, and so on [12]. The high demand for MMDBs is driven by the need for high reliability, strong real-time capacity and high information throughput [20].
MMDBs have several advantages, including short response times and good transaction throughput. MMDBs also leverage the decreasing cost of main memory. On the other hand, MMDB size is limited by the size of RAM (Random Access Memory). Moreover, since data in main memory can be directly accessed by the processor, MMDBs suffer from data vulnerability, i.e., the risk of data loss due to accidents such as software errors [14], hardware failures or other hazards.
3.2.2 MMDB issues
Although in-memory technologies provide high performance, scalability and flexibility to BI tools, there are still some open issues. Since MMDBs work in memory, the main problems and challenges are recovery, commit processing, access methods and storage.
There is no doubt that backups of memory-resident databases must be maintained on storage other than main memory in order to ensure data integrity. To protect against failures, it is necessary to keep a backup copy and a log of transaction activity [14]. In addition, recovery processing is usually the only MMDB component that deals with disk I/O, so it must be designed carefully [20]. Existing research works do not share a common view of this problem. Some authors propose to use a part of stable main memory to hold the log. This provides short response times, but causes problems when logs are large, so it is used for precommitted transactions. Group commits (e.g., a casual commit protocol [27]) allow several transactions to accumulate in memory before they are flushed to the log disk. Nowadays, commit processing is especially important in distributed database systems, where it is slow because disk logging takes place at several sites [27].
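The group commit idea can be sketched as a log that buffers commit records in memory and writes them out in batches, so that one slow flush serves several transactions. The class, batch size and record format are hypothetical, and an in-memory text buffer stands in for the log disk.

```python
import io

class GroupCommitLog:
    """Minimal group commit sketch: commit records are buffered and written in batches.
    A real system would acknowledge a transaction only after the flush that contains it."""

    def __init__(self, log_file, batch_size=3):
        self.log_file = log_file
        self.batch_size = batch_size
        self.pending = []   # committed transactions not yet on the log device
        self.flushes = 0

    def commit(self, txn_id, changes):
        self.pending.append(f"{txn_id}:{changes}\n")
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.log_file.write("".join(self.pending))  # one write for the whole group
        self.log_file.flush()  # on a real file, os.fsync would follow to reach stable storage
        self.flushes += 1
        self.pending.clear()

log = GroupCommitLog(io.StringIO(), batch_size=3)
for i in range(7):
    log.commit(f"T{i}", {"account": i, "delta": 10})
log.flush()  # e.g. triggered by a timer or at shutdown
print(log.flushes)                                  # 3
print(len(log.log_file.getvalue().splitlines()))    # 7
```

Seven transactions need only three log writes here; the trade-off is that a transaction waits in memory until its group is flushed.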
Several approaches to data storage exist for MMDBs. Initially, there were many attempts to use database partitioning techniques developed earlier for other types of databases. Gruenwald and Eich classify existing techniques as follows: horizontal partitioning, group partitioning, single vertical partitioning, group vertical partitioning, and mixed partitioning [16]. Only horizontal and single vertical partitioning are suitable for MMDBs and, as a result of this study, single vertical partitioning was chosen as the most efficient [10]. B-trees and hashing have also been identified as appropriate storage techniques for MMDBs. Hashing is not as space-efficient as a tree, so it is rarely used [43]. Finally, most researchers agree on T-trees (balanced index trees optimized for cases where both the index and the actual data are fully kept in memory) as the main storage technique [12, 14, 44]. T-trees indeed require less memory space and fewer CPU cycles than B-trees, so indexes are more economical.
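The core T-tree idea, binary-tree nodes that each hold a sorted run of keys so that a single comparison against a node's minimum and maximum decides where to go next, can be sketched in Python. This simplified version omits the AVL-style rotations that keep a real T-tree balanced, and the node capacity is an arbitrary, hypothetical choice.

```python
import bisect

class TNode:
    def __init__(self, key):
        self.keys = [key]   # sorted keys held by this node
        self.left = None
        self.right = None

class TTree:
    """Simplified T-tree without rebalancing."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.root = None

    def search(self, key):
        node = self.root
        while node:
            if key < node.keys[0]:
                node = node.left
            elif key > node.keys[-1]:
                node = node.right
            else:  # bounding node found: binary search inside it
                i = bisect.bisect_left(node.keys, key)
                return i < len(node.keys) and node.keys[i] == key
        return False

    def insert(self, key):
        if self.root is None:
            self.root = TNode(key)
            return
        node, parent = self.root, None
        while node:
            if node.keys[0] <= key <= node.keys[-1]:
                self._insert_into(node, key)  # key falls within this node's bounds
                return
            parent = node
            node = node.left if key < node.keys[0] else node.right
        # No bounding node: use the last node visited if it has room, else add a leaf.
        if len(parent.keys) < self.capacity:
            bisect.insort(parent.keys, key)
        elif key < parent.keys[0]:
            parent.left = TNode(key)
        else:
            parent.right = TNode(key)

    def _insert_into(self, node, key):
        bisect.insort(node.keys, key)
        if len(node.keys) > self.capacity:
            # Overflow: move the minimum to the greatest-lower-bound leaf of the left subtree.
            overflow = node.keys.pop(0)
            if node.left is None:
                node.left = TNode(overflow)
                return
            glb = node.left
            while glb.right:
                glb = glb.right
            if len(glb.keys) < self.capacity:
                glb.keys.append(overflow)  # larger than every key in the left subtree
            else:
                glb.right = TNode(overflow)

    def items(self):
        """In-order traversal: every key in sorted order."""
        def walk(node):
            if node:
                yield from walk(node.left)
                yield from node.keys
                yield from walk(node.right)
        return list(walk(self.root))

tree = TTree(capacity=3)
for k in [50, 20, 80, 30, 60, 10, 90, 40, 70, 25]:
    tree.insert(k)
print(tree.items())                        # [10, 20, 25, 30, 40, 50, 60, 70, 80, 90]
print(tree.search(40), tree.search(45))    # True False
```

Because each node holds several keys, the tree has fewer nodes and pointers than a binary tree over the same keys, which is the memory saving that motivates T-trees.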
The above-mentioned issues are important for BI environments: data coherence is strategic, and performance is fundamental for online operations such as OLAP. Choosing the right storage and recovery techniques is crucial, since a poor choice can damage data security and data integrity.
3.2.3 MMDB Systems
In this section, we give an overview of MMDB systems. We particularly focus our discussion on the most recent systems, such as Dali, FastDB, Kdb, IBM Cognos TM1 and TimesTen.
Among the studied systems, we can distinguish a storage manager (the Dali system [20]) and complete main memory database systems (FastDB, Kdb, TM1, TimesTen). Interfaces can be based on a zero-footprint Web client (IBM Cognos TM1⁴), standard SQL (TimesTen) [54] or C++ (FastDB) [22]. Most MMDBSs feature SQL or an SQL-like query language (FastDB, TimesTen). The Kdb system uses its own language, "q", for programming and querying [24].
IBM Cognos 8 BI and TimesTen are aimed at decision making in large corporations. The main disadvantages of these MMDBs are the absence of interprocess communication and high storage requirements (Dali system) [9], the limitation of server memory (TimesTen), and the lack of support for a client-server architecture (FastDB).
3.2.4 Discussion
The main benefits of using MMDBs are short access/response times and good transaction throughput. But MMDBs are hampered by data vulnerability and security problems. Memory is not persistent, which means data loss in case of a server failure. Security problems come from unauthorized access to data aimed at data corruption or theft. So far, MMDBs are mainly used in real-time applications and telecommunications, but not commonly for decision making.
⁴ www-01.ibm.com/software/data/cognos/products/tm1
In spite of considerable research on MMDBs, some issues remain unresolved, such as data security and safety, and data processing efficiency.
3.3 Vector Databases
3.3.1 General information
A vector table is built by transforming a file in the following way: every record represents all column values as a vector. Vector databases (VDBs) require neither indexes nor any complex database structure. Differences between vector and relational databases are summarized in table 2. To access data, relational DBMSs provide only sequential scans by columns and by rows, whereas VDBs provide fast data access. Moreover, relational DBs store large volumes of repeated values due to the nature of the data. For example, in a table of students, the nationality "French" may be repeated many times. In VDBs, by contrast, this value is stored only once, which yields significant data compression.
The main principles of vector databases are data associations and data access by pointers. Vector database implementations eliminate data redundancy, because every distinct piece of data is written once and never repeated. Metadata such as keys in the relational data model lose their interest in VDBs, because data associations are provided by pointers. Hence, VDBs do not consume as much space as relational DBs.
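The "each value stored once, records hold pointers" principle corresponds to what is commonly called dictionary encoding. The Python sketch below applies it to the student nationality example; the class and its methods are hypothetical and do not describe the internals of any particular VDB product.

```python
from array import array

class VectorColumn:
    """Hypothetical vector-style column: each distinct value is stored once in `values`,
    and each record holds only a small integer pointer into it."""

    def __init__(self):
        self.values = []            # distinct values, each stored once
        self.positions = {}         # value -> pointer (index into self.values)
        self.pointers = array("I")  # one pointer per record

    def append(self, value):
        ptr = self.positions.get(value)
        if ptr is None:
            ptr = len(self.values)
            self.values.append(value)
            self.positions[value] = ptr
        self.pointers.append(ptr)

    def __getitem__(self, row):
        return self.values[self.pointers[row]]

    def rows_equal(self, value):
        """Find matching records by comparing pointers rather than stored strings."""
        ptr = self.positions.get(value)
        if ptr is None:
            return []
        return [i for i, p in enumerate(self.pointers) if p == ptr]

nationality = VectorColumn()
for v in ["French", "French", "Ukrainian", "French", "German", "French"]:
    nationality.append(v)
print(nationality.values)                                # ['French', 'Ukrainian', 'German']
print(list(nationality.pointers))                        # [0, 0, 1, 0, 2, 0]
print(nationality[2], nationality.rows_equal("French"))  # Ukrainian [0, 1, 3, 5]
```

The string "French" is kept once however many students share it; the more repetitive a column is, the larger the saving.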
3.3.2 VDB-based BI
The main principle of vector database-based BI is that, instead of associating dimensions with an OLAP cube, associations are made between data. These associations are defined during the data load process by matching up table columns having the same name. Using vector databases differs from classical warehousing: there is no predefinition of what a dimension is. Any piece of data can serve as a dimension and any piece of data can serve as a measure. Hence, it is not necessary to reconstruct the data schema when dimensions change. As vector databases work in memory, VDB-based BI tools benefit from instant data access. However, enterprises frequently hesitate to use VDB-based BI because of its lack of interoperability with SQL tools.
One BI tool built on a vector database is QlikView (www.qlikview.com). QlikView provides integrated ETL and removes the need to pre-aggregate data. Analysis axes can be changed at any moment and at any level of query detail. Despite these capabilities, QlikView has some limitations and disadvantages, such as the lack of a unified metadata view and of predictive models (its statistical analysis features are less developed than those of other BI tools). Nor does it specialize in visualization: QlikView provides analysts with a clean interface, but it lacks advanced visualization features that would help them explore complicated data graphically. One of QlikView's features is its ability to connect tables automatically, but this can create problems. When fields in different tables represent the same thing but do not have the same name, they must be renamed before they can be connected. Conversely, when fields in different tables have the same name but not the same content and meaning, a meaningless connection is created. This connection must then be deleted, and all same-named fields must be reanalyzed to distinguish those with different meanings. Finally, QlikView allows end-users to use its integrated ETL and to build their data schema themselves, which often leads to unsatisfactory results.
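The two pitfalls of name-based table connection can be made concrete with a small, hypothetical check. The sketch below assumes a simple heuristic of our own choosing: same-named fields whose value sets barely overlap (Jaccard similarity below an arbitrary threshold) are flagged as likely meaningless links, and an explicit renaming map aligns equivalent fields that carry different names. The helper names and the 0.5 threshold are illustrative assumptions; QlikView does not expose such an API.

```python
from typing import Any, Dict, List

Table = List[Dict[str, Any]]


def rename_fields(rows: Table, mapping: Dict[str, str]) -> Table:
    """Rename fields so that equivalent fields share one name (and
    same-named but unrelated fields get distinct names)."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def suspicious_links(a: Table, b: Table, min_overlap: float = 0.5) -> List[str]:
    """Return the shared field names whose value sets overlap too little
    (Jaccard similarity below min_overlap) to be a plausible link.
    Assumes both tables are non-empty with uniform rows."""
    shared = set(a[0]) & set(b[0])
    flagged = []
    for f in sorted(shared):
        va = {r[f] for r in a}
        vb = {r[f] for r in b}
        if len(va & vb) / len(va | vb) < min_overlap:
            flagged.append(f)
    return flagged


customers = [{"cust_no": 7, "name": "Dupont"}, {"cust_no": 9, "name": "Petrenko"}]
orders = [{"customer_id": 7, "name": "Laptop"}, {"customer_id": 9, "name": "Printer"}]

# Same name, different meaning: 'name' would create a senseless link,
# while the real key (cust_no vs. customer_id) is not linked at all.
print(suspicious_links(customers, orders))  # ['name']

# Fix both problems by renaming, then check the links again.
fixed_orders = rename_fields(orders, {"customer_id": "cust_no", "name": "product_name"})
print(sorted(set(customers[0]) & set(fixed_orders[0])))  # ['cust_no']
print(suspicious_links(customers, fixed_orders))         # []
```

A value-overlap test like this only hints at problems; deciding whether two fields mean the same thing still requires the kind of manual analysis described above.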
3.3.3 Discussion
Vector databases share the advantages of other in-memory databases and are limited only by memory size.
VDB-based BI is a relatively new direction, but it is already rather popular thanks to its fast performance, strong analysis capabilities, unlimited numbers of dimensions, tables and measures, and ease of implementation.
However, some of the features offered by QlikView are debatable: automatic table connection and the possibility for end-users to create a data schema. These features do not cope with the variety of situations that arise from the complexity of aggregating data coming from different sources. Such data have different levels of detail, different field names, etc. Consequently, letting end-users create data schemas can lead to inadequate schemas and table connections, data loss, and spurious data in the database. Moreover, VDB-based BI tools are often black boxes, meaning that users cannot see what happens inside them. Such models also lack flexibility.
4. CONCLUSION
Nowadays, BI is becoming an essential part of any enterprise, even an SME. This need stems from the growing volume of data that is indispensable for decision making. Existing solutions and tools are mostly aimed at large-scale enterprises; hence, they are inaccessible or insufficient for SMEs because of their high price, redundant functionality, complexity, and high hardware and software requirements. SMEs require solutions with lightweight architectures that, moreover, are cheap and do not require additional hardware and software.
46 SIGMOD Record, June 2010 (Vol. 39, No. 2)
Table 3: Relational and vector database characteristics
Characteristic | Relational DB | Vector DB
Access to data | Sequential | Parallel
Data integrity | Foreign keys | Multi-dimensional
Data relations stored in | Keys | Vectors
Data reuse | Not available | Built-in
Metadata | System tables | None
Speed (high volume) | Slow | Fast
Uniqueness | User constraints | Built-in
This survey discusses the importance of data warehousing for SMEs and presents the main characteristics and examples of web-based data warehousing, MOLAP systems and MMDBs. Each of these approaches has significant drawbacks that prevent it from being chosen as the sole decision support system: cumbersome architecture and complexity in MOLAP, data vulnerability in MMDBs, lack of transparency and excessive power granted to users in VDB-based systems, and security issues in cloud computing systems.
In this context, our research objective is to design
BI solutions that are suitable for SMEs and avoid
the aforementioned disadvantages.
Our idea is to work toward a ROLAP system that operates in memory, i.e., to add OLAP operators on top of an SQL-based MMDB. This should considerably simplify the in-memory OLAP architecture compared with MOLAP. Choosing an open source MMDB system (such as FastDB) and using well-known ETL, modeling and analysis processes should also help avoid the "black box" issue of VDBs. Finally, storing business data as close to the user as possible mitigates the security issues of cloud BI. Some problems will remain, though (e.g., data vulnerability and the need for backup, or the design of in-memory indexes adapted to OLAP), but we are confident we can address them in our future research.
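The proposed direction, OLAP operators layered on top of an SQL-based in-memory database, can be sketched with Python's standard `sqlite3` module, whose `":memory:"` connection keeps the entire database in RAM. SQLite is only a readily available stand-in for an MMDB such as FastDB, and the star schema, table names, `HIERARCHY` list and the `roll_up` and `slice_` helpers are hypothetical. Each operator is translated into plain SQL in ROLAP style, so no separate multidimensional storage is needed.

```python
import sqlite3
from typing import List, Tuple

# In-memory SQL database standing in for the MMDB (assumption: SQLite, not FastDB).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE sales (store_id INTEGER REFERENCES store(store_id), amount REAL);
    INSERT INTO store VALUES (1, 'Lyon', 'France'), (2, 'Bron', 'France'),
                             (3, 'Kharkiv', 'Ukraine');
    INSERT INTO sales VALUES (1, 100.0), (1, 50.0), (2, 30.0), (3, 70.0);
""")

# Levels of the (hypothetical) store dimension, from finest to coarsest.
HIERARCHY = ["city", "country"]


def _check(level: str) -> None:
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if level not in HIERARCHY:
        raise ValueError(f"unknown level: {level}")


def roll_up(level: str) -> List[Tuple[str, float]]:
    """Aggregate the fact table at a hierarchy level; moving from
    'city' to 'country' is a roll-up."""
    _check(level)
    sql = (f"SELECT s.{level}, SUM(f.amount) FROM sales f "
           f"JOIN store s ON f.store_id = s.store_id "
           f"GROUP BY s.{level} ORDER BY s.{level}")
    return conn.execute(sql).fetchall()


def slice_(level: str, value: str) -> float:
    """Slice: total of the measure for one member of a dimension level."""
    _check(level)
    sql = (f"SELECT SUM(f.amount) FROM sales f JOIN store s "
           f"ON f.store_id = s.store_id WHERE s.{level} = ?")
    return conn.execute(sql, (value,)).fetchone()[0]


print(roll_up("city"))              # [('Bron', 30.0), ('Kharkiv', 70.0), ('Lyon', 150.0)]
print(roll_up("country"))           # [('France', 180.0), ('Ukraine', 70.0)]
print(slice_("country", "France"))  # 180.0
```

Because the data stay in relational tables, standard SQL tools can still query them, which addresses the interoperability concern raised about VDB-based BI; a real in-memory ROLAP engine would additionally need the OLAP-adapted in-memory indexes mentioned above.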
5. ACKNOWLEDGMENTS
The authors would like to thank the French Embassy in Ukraine for supporting this joint research work of the Kharkiv National University of Economics (Ukraine) and the University of Lyon 2 (France).
6. REFERENCES
[1] D. J. Abadi. Data Management in the Cloud:
Limitations and Opportunities. IEEE Data
Engineering Bulletin, 32(1):3–12, March 2009.
[2] M. Armbrust, A. Fox, R. Griffith, A. D.
Joseph, R. H. Katz, A. Konwinski, G. Lee,
D. A. Patterson, A. Rabkin, I. Stoica, and
M. Zaharia. Above the Clouds: A Berkeley
View of Cloud Computing. Technical Report
UCB/EECS-2009-28, EECS Department,
University of California, Berkeley, February
2009.
[3] M. Banek, Z. Skocir, and B. Vrdoljak.
Logical Design of Data Warehouses from
XML. In ConTEL ’05: Proceedings of the 8th
international conference on
Telecommunications, volume 1, pages
289–295, 2005.
[4] X. Baril and Z. Bellahsene. Designing and
Managing an XML Warehouse. In XML Data
Management: Native XML and XML-Enabled
Database Systems, chapter 16, pages 455–473.
Addison-Wesley Professional, 2003.
[5] K. Beyer, D. Chamberlin, L. S. Colby,
F. Özcan, H. Pirahesh, and Y. Xu. Extending
XQuery for analytics. In SIGMOD ’05:
Proceedings of the 2005 ACM SIGMOD
international conference on Management of
data, pages 503–514, New York, NY, USA,
2005. ACM.
[6] M. Body, M. Miquel, Y. Bédard, and
A. Tchounikine. A multidimensional and
multiversion structure for OLAP applications.
In DOLAP ’02: Proceedings of the 5th ACM
international workshop on Data Warehousing
and OLAP, pages 1–6, New York, NY, USA,
2002. ACM.
[7] M. Böhnlein and A. U. vom Ende. Business
Process Oriented Development of Data
Warehouse Structures, pages 3–21. Physica,
Heidelberg, 2000.
[8] O. Boussaïd, R. BenMessaoud, R. Choquet,
and S. Anthoard. X-Warehousing: an
XML-Based Approach for Warehousing
Complex Data. In ADBIS ’06: Proceedings of
the 10th East-European Conference on
Advances in Databases and Information
Systems, volume 4152 of Lecture Notes in
Computer Science, pages 39–54, Heidelberg,
Germany, September 2006. Springer.
[9] P. Burte, B. Aleman-meza, D. B. Weatherly,
R. Wu, S. Professor, and J. A. Miller.
Transaction Management for a Main-Memory
Database. The 38th Annual Southeastern
ACM Conference, Athens, Georgia, pages
263–268, January 2001.
[10] Y. C. Cheng, L. Gruenwald, G. Ingels, and
M. T. Thakkar. Evaluating Partitioning
Techniques for Main Memory Database:
Horizontal and Single Vertical. In ICCI ’93:
Proceedings of the 5th International
Conference on Computing and Information,
pages 570–574, Washington, DC, USA, 1993.
IEEE Computer Society.
[11] W. Chung and H. Chen. Web-Based Business
Intelligence Systems: A Review and Case
Studies. In G. Adomavicius and A. Gupta,
editors, Business Computing, volume 3,
chapter 14, pages 373–396. Emerald Group
Publishing, 2009.
[12] Y. Cui and D. Pi. SQLmmdb: An Embedded
Main Memory Database Management System.
Information Technology Journal,
6(6):872–878, 2007.
[13] E. F. Codd, S. Codd, and C. Salley. Providing
OLAP to User-Analysts: An IT Mandate,
1993.
[14] H. Garcia-Molina and K. Salem. Main
memory database systems: An overview.
IEEE Transactions on Knowledge and Data
Engineering, 4:509–516, 1992.
[15] M. Golfarelli, S. Rizzi, and B. Vrdoljak. Data
warehouse design from XML sources. In
DOLAP ’01: Proceedings of the 4th ACM
international workshop on Data warehousing
and OLAP, pages 40–47, New York, NY,
USA, 2001. ACM.
[16] L. Gruenwald and M. H. Eich. Choosing the
best storage technique for a main memory
database system. In JCIT ’90: Proceedings of
the 5th Jerusalem conference on Information
technology, pages 1–10, Los Alamitos, CA,
USA, 1990. IEEE Computer Society Press.
[17] M. Hachicha, H. Mahboubi, and J. Darmont.
Expressing OLAP operators with the TAX
XML algebra. In DataX-EDBT ’08: 3rd
International Workshop on Database
Technologies for Handling XML Information
on the Web, pages 61–66, March 2008.
[18] H. A. A. Hafez and S. Kamel. Web-Based
Data Warehouse in the Egyptian Cabinet
Information and Decision Support Center. In
R. Meredith, G. Shanks, D. Arnott, and
S. Carlsson, editors, DSS’04: The IFIP
International Conference on Decision Support
in an Uncertain and Complex World, pages
402–409. Monash University, Australia (CD
Rom), July 2004.
[19] C. Hsieh and B. Lin. Web-based data
warehousing: current status and perspective.
The Journal of Computer Information
Systems, 43:1–8, January 2002.
[20] H. V. Jagadish, D. F. Lieuwen, R. Rastogi,
A. Silberschatz, and S. Sudarshan. Dalí: A
High Performance Main Memory Storage
Manager. In VLDB ’94: Proceedings of the
20th International Conference on Very Large
Data Bases, pages 48–59, San Francisco, CA,
USA, 1994. Morgan Kaufmann Publishers Inc.
[21] R. Kimball and M. Ross. The Data Warehouse
Toolkit: the complete guide to dimensional
modeling. Wiley Computer Publishing, 2002.
[22] K. Knizhnik. FastDB Main Memory Database
Management System. Technical report,
Research Computer Center of Moscow State
University, Russia, March 1999.
[23] Y. Kudryavcev. Efficient algorithms for
MOLAP data storage and query processing.
In SYRCoDIS ’06: Proceedings of the 3rd
Spring Colloquium for Young Researchers in
Databases and Information Systems, page 5,
Moscow, Russia, 2006.
[24] Kx Systems. The kdb+ Database White
Paper. A unified database for streaming and
historical data, 2009. Retrieved September 1,
2010 from http://kx.com/papers.
[25] L. V. S. Lakshmanan, J. Pei, and J. Han.
Quotient cube: how to summarize the
semantics of a data cube. In VLDB ’02:
Proceedings of the 28th international
conference on Very Large Data Bases, pages
778–789. VLDB Endowment, 2002.
[26] G. Lawton. Users Take a Close Look at Visual
Analytics. Computer, 42(2):19–22, 2009.
[27] I. Lee, H. Y. Yeom, and T. Park. A New
Approach for Distributed Main Memory
Database Systems: A Causal Commit
Protocol. IEICE Transactions on Information
and Systems, E87-D(1):196–204, January
2004.
[28] Y.-K. Lee, K.-Y. Whang, Y.-S. Moon, and
I.-Y. Song. An aggregation algorithm using a
multidimensional file in multidimensional
OLAP. Information Sciences, 152(1):121–138,
June 2003.
[29] D. Lemire. Wavelet-Based Relative Prefix
Sum Methods for Range Sum Queries in Data
Cubes. In CASCON ’02: Proceedings of the
2002 conference of the Centre for Advanced
Studies on Collaborative research, page 6.
IBM Press, October 2002.
[30] Z. Luo, Z. Kaisong, X. Hongxia, and
Z. Kaipeng. The Data Warehouse Model
Based on Web Service Technology. Journal of
Communication and Computer, 2(1):26–31,
January 2005.
[31] E. Malinowski and E. Zimányi. Hierarchies in
a multidimensional model: from conceptual
modeling to logical representation. Data &
Knowledge Engineering, 59(2):348–377, 2006.
[32] M. McDonald. Light weight vs heavy weight
technologies, the difference matters. Gartner,
March 2010.
[33] A. Mehedintu, I. Buligiu, and C. Pirvu.
Web-enabled Data Warehouse and Data
Webhouse. Revista Informatica Economica,
1(45):96–102, 2008.
[34] R. Mullins, Y. Duan, D. Hamblin, P. Burrell,
H. Jin, G. Jerzy, Z. Ewa, and B. Aleksander.
A Web Based Intelligent Training System for
SMEs. The Electronic Journal of e-Learning,
5:39–48, 2007.
[35] V. Nassis, W. Rahayu, R. Rajugan, and
T. Dillon. Conceptual design of XML
document warehouses. In DaWak ’04:
Proceedings of the 6th International on Data
Warehousing and Knowledge Discovery, pages
1–14, 2004.
[36] K. Nørvåg. Temporal XML Data Warehouses:
Challenges and Solutions. In Proceedings of
Workshop on Knowledge Foraging for
Dynamic Networking of Communities and
Economies (in conjunction with
EurAsia-ICT’2002), October 2002.
[37] K. Nørvåg, M. Limstrand, and L. Myklebust.
TeXOR: Temporal XML Database on an
Object-Relational Database System. In
M. Broy and A. V. Zamulin, editors, Ershov
Memorial Conference, volume 2890 of Lecture
Notes in Computer Science, pages 520–530.
Springer, 2003.
[38] D. Nurmi, R. Wolski, C. Grzegorczyk,
G. Obertelli, S. Soman, L. Youseff, and
D. Zagorodnov. The Eucalyptus Open-Source
Cloud-Computing System. In CCGRID ’09:
Proceedings of the 2009 9th IEEE/ACM
International Symposium on Cluster
Computing and the Grid, pages 124–131,
Washington, DC, USA, 2009. IEEE Computer
Society.
[39] D. Pedersen, K. Riis, and T. B. Pedersen.
XML-Extended OLAP Querying. In SSDBM
’02: Proceedings of the 14th International
Conference on Scientific and Statistical
Database Management, pages 195–206,
Washington, DC, USA, 2002. IEEE Computer
Society.
[40] T. B. Pedersen and C. S. Jensen.
Multidimensional Database Technology.
Computer, 34(12):40–46, 2001.
[41] J. Pokorný. XML Data Warehouse: Modelling
and Querying. In Baltic DB&IS ’02:
Proceedings of the 5th International Baltic
Conference on Databases and Information
Systems, pages 267–280. Institute of
Cybernetics at Tallinn Technical University,
2002.
[42] D. J. Power and S. Kaparthi. Building
Web-based Decision Support Systems. Studies
in Informatics and Control, 11(4):291–302,
December 2002.
[43] J. Rao and K. A. Ross. Cache Conscious
Indexing for Decision-Support in Main
Memory. In VLDB ’99: Proceedings of the
25th International Conference on Very Large
Data Bases, pages 78–89, San Francisco, CA,
USA, 1999. Morgan Kaufmann Publishers Inc.
[44] R. Rastogi, S. Seshadri, P. Bohannon, D. W.
Leinbaugh, A. Silberschatz, and S. Sudarshan.
Logical and Physical Versioning in Main
Memory Databases. In VLDB ’97:
Proceedings of the 23rd International
Conference on Very Large Data Bases, pages
86–95, San Francisco, CA, USA, 1997.
Morgan Kaufmann Publishers Inc.
[45] P. Rob and C. Coronel. Database systems:
design, implementation, and management.
Cengage Learning, 2007.
[46] G. Sathe and S. Sarawagi. Intelligent Rollups
in Multidimensional OLAP Data. In VLDB
’01: Proceedings of the 27th International
Conference on Very Large Data Bases, pages
531–540, San Francisco, CA, USA, 2001.
Morgan Kaufmann Publishers Inc.
[47] K. Schlegel, M. A. Beyer, A. Bitterer, and
B. Hostmann. BI Applications Benefit From
In-Memory Technology Improvements.
Gartner, October 2006.
[48] F. Silvers. Building and Maintaining a Data
Warehouse. Auerbach Publications, 2008.
[49] Y. Sismanis, A. Deligiannakis,
N. Roussopoulos, and Y. Kotidis. DWARF:
shrinking the PetaCube. In SIGMOD ’02:
Proceedings of the 2002 ACM SIGMOD
international conference on Management of
data, pages 464–475, New York, NY, USA,
2002. ACM.
[50] J. Staten. Is cloud computing ready for the
enterprise? Forrester Research,
March 2008. Retrieved September 1, 2010 from http://www.forrester.com/rb/Research/is cloud
computing ready for enterprise/q/id/44229/t/2.
[51] M. Stonebraker and N. Hachem. The End of
an Architectural Era (It’s Time for a Complete
Rewrite). In VLDB’07: Proceedings of the
33rd international conference on Very Large
Data Bases, pages 1150–1160, 2007.
[52] X. Tan, D. C. Yen, and X. Fang. Web
warehousing: Web technology meets data
warehousing. Technology in Society,
25:131–148, January 2003.
[53] C. Thomsen and T. B. Pedersen. A Survey of
Open Source Tools for Business Intelligence.
International Journal of Data Warehousing
and Mining, 5(3):56–75, July-September 2009.
[54] The TimesTen Team. In-memory data
management for consumer transactions: the
TimesTen approach. SIGMOD Record,
28(2):528–529, 1999.
[55] N. Wiwatwattana, H. V. Jagadish, L. V. S.
Lakshmanan, and D. Srivastava. X^3: A Cube
Operator for XML OLAP. IEEE 23rd
International Conference on Data
Engineering, pages 916–925, 2007.
[56] G. Xie, Y. Yang, S. Liu, Z. Qiu, Y. Pan, and
X. Zhou. EIAW: Towards a Business-friendly
Data Warehouse Using Semantic Web
Technologies. In K. Aberer, K.-S. Choi,
N. Noy, D. Allemang, K.-I. Lee, L. J. B.
Nixon, J. Golbeck, P. Mika, D. Maynard,
G. Schreiber, and P. Cudre-Mauroux, editors,
ISWC/ASWC ’07: Proceedings of the 6th
International Semantic Web Conference and
2nd Asian Semantic Web Conference, volume
4825 of LNCS, pages 851–904, Berlin,
Heidelberg, November 2007. Springer Verlag.
[57] L. Xyleme. A dynamic warehouse for XML
data of the Web. IEEE Data Engineering
Bulletin, 24:40–47, 2001.
[58] J. Zhang, W. Wang, H. Liu, and S. Zhang.
X-warehouse: building query pattern-driven
data. In WWW ’05: Special interest tracks
and posters of the 14th international
conference on World Wide Web, pages
896–897, New York, NY, USA, 2005. ACM.