Five Key Lessons for Converting Social Media Data into Business Intelligence
White Paper
Transform real-time social media data into “noise-free,” actionable
intelligence
Businesses that make more timely, accurate and informed decisions gain a significant
competitive advantage. These decisions depend on relevant, real-time, quality data.
However, gathering the data, transforming it into understandable information and loading it into
business intelligence (BI) tools is easier said than done.
And today, there’s another issue at hand: how do you transform an exponentially expanding cloud of social media data into actionable intelligence?
The information exists. Data is refreshed in real time and readily available, offering accurate economic indicators and detailed views of reality. The key is to figure out how to easily extract real-time intelligence before your competitors do – and to find clever ways to analyze these new sources.
This special report delves deeper into these issues, explores the challenges involved with
setting up a robust, reliable social media/BI integration system, and offers insights and best
practices from a leader in this space – Kapow Software.
Timely, Accurate Information Improves Business Performance
Fortunately, we are at a critical juncture where social media can be effectively integrated into
enterprise Business Intelligence (BI) platforms. Real-time data offers a compelling alternative to
traditional sources like government reports and labor department statistics that take months to
compile. The technology is available and mature, and companies – including high profile brands
and lower-profile innovators – are re-defining their businesses by leveraging real-time social
media data.
With more intelligent, more accurate real-time Web data processing, new methods for working
with business information are possible. Business analysts and decision makers can then spend
their time extracting greater intelligence from the data and less time worrying about collecting or
accessing the data.
Think about the impact to your business if you could automatically add high-value Web data to
your market intelligence, pricing intelligence, financial intelligence or any other business
intelligence application. Until recently, this seemed like an impossible feat, or at least cost
prohibitive based on the man hours involved.
Social Media Data
- Twitter grew 1,444% year over year
- 50M tweets are sent every day
- Facebook has 400M+ active users
- 600 new blog posts are published every minute

Traditional BI Processes Are Not Built for Agile Data Access
While many businesses use BI dashboards to measure performance and comb through
analytics data, a key piece of the overall BI strategy often goes missing. BI tools do an adequate
job of analyzing data, but data quality is less of a priority overall. What's needed is BI
functionality that delivers agile data access and high quality underlying data.
Typically, businesses employ expensive implementation cycles for gathering back-office data
and feeding it into BI systems. Data moves back and forth between systems via tedious Extract
Transform Load (ETL) methods. Complex master-data management (MDM) schemes are
employed to make sense of everything. This process taxes internal IT departments, adds layers
of expense, and consumes a lot of time.
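To make the traditional cycle concrete, a miniature Extract-Transform-Load pipeline might look like the following Python sketch; the table and field names are invented, and an in-memory SQLite database stands in for both the back-office source and the BI target:

```python
import sqlite3

def extract(conn):
    """Extract: pull raw rows from a back-office source table."""
    return conn.execute("SELECT region, amount FROM raw_sales").fetchall()

def transform(rows):
    """Transform: aggregate raw rows into a BI-friendly summary."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals

def load(conn, totals):
    """Load: write the summary into the BI reporting table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_summary (region TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_summary VALUES (?, ?)", totals.items())

# Demo: an in-memory database stands in for the real source and target systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                 [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)])
load(conn, transform(extract(conn)))
print(conn.execute("SELECT region, total FROM sales_summary ORDER BY region").fetchall())
```

Even this toy version shows why the approach taxes IT: every new source needs its own extract query, transform logic and load schema.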
Other data comes from reports and Excel spreadsheets maintained by individual executives, and from “siloed” applications that require IT intervention to extract the information. When employees use “off-line” data for their reports, data quality suffers and synchronization with enterprise data becomes problematic. A better approach would be to leverage data where it resides.
More than 5 billion Web sites crank out a vast amount of useful information by the nanosecond. To quantify this
in physical terms, a May 2010 EMC-sponsored IDC
study found that the amount of digital information that will be created in 2010 alone will be 1.2
zettabytes, which equals 75 billion fully-loaded 16 GB Apple iPads! This astronomical growth is
fueled by social media data.
This relevant, timely data is available, but it's not going to have an API any time soon – if ever.
Data feeds from blogs, forums, Facebook, twitter and other social communities on the Web are
predominantly unstructured text. This fountain of value contains some noise, and most
enterprises view it as a fire hose. It’s interesting, but there’s no way to consume it in an
organized fashion and gain useful insights. Employing IT staff to make sense of the stream by
developing custom APIs would be unrealistic at best and foolhardy at worst.
So, valuable information exists, it needs to be easily accessed, and structure needs to be
applied to make it consumable by BI tools.
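As a sketch of what “applying structure” means in practice, the following hypothetical Python function turns one free-form post into a structured record a BI tool could ingest; the field names and keywords are illustrative, not any particular vendor’s schema:

```python
import re

def structure_post(raw_post, keywords):
    """Turn one unstructured social media post into a structured record
    that a BI tool could consume. Field names are illustrative."""
    text = raw_post["text"]
    return {
        "timestamp": raw_post["created_at"],
        "author": raw_post["user"],
        "mentions": re.findall(r"@(\w+)", text),
        "hashtags": re.findall(r"#(\w+)", text),
        "matched_keywords": [k for k in keywords if k.lower() in text.lower()],
    }

# Invented sample post and keywords, for illustration only.
post = {"created_at": "2010-05-12T21:03:00", "user": "fan42",
        "text": "Loved @kate tonight on #DWTS - best dance yet!"}
record = structure_post(post, ["DWTS", "dance"])
print(record["hashtags"], record["matched_keywords"])
```

Once posts become records like this, they can be loaded into a BI tool exactly like conventional row-and-column data.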
Social Media – A New Frontier for Leading Edge Data Analysts
Economists, academics, governments and organizations of every kind have been using Web
data services to automate access to accurate data and integrate it into analysis systems as
soon as it's available. The Wall Street Journal documented the trend in their April 8, 2010 article “New Ways to Read Economy – Experts Scour Oddball Data to Help See Trends Before Official Information Is Available.” The report provided numerous examples of how economists use unconventional data analytics to examine social and economic trends. They’re analyzing everything from diesel fuel consumption to jobless claims.

Wikipedia on “Text Analytics”: “... 80 percent of business-relevant information originates in unstructured form, primarily text. These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing.”
The next logical step is to mine social media data. It’s now easier for analysts and business
decision makers to tap real-time data because a new, emerging class of ETL technologies
takes the complexity out of the process, leaving more time for analysis and decision
making (as opposed to information gathering). Advertising analysts are already doing this to
track media consumption trends. Which brings us to the subject of primetime TV.
Extracting Noise-Free, Actionable Intelligence from Facebook and Twitter
If diesel fuel data and jobless claims data don't stir your imagination, think about what can be
learned from primetime TV. Kapow Software set up a novel Web data analytics system to do
just that. The tool, based on Kapow's Web Data Server (WDS), predicts the outcomes of
American Idol and Dancing with the Stars. It shows just how easy it is to gather real-time social
media data to make accurate predictions about future events based on the sentiment of online
conversations.

Real-time data gathered from Facebook, twitter, Google and the Dancing with the Stars
community is transformed via Kapow WDS and eventually delivered as rich analytic data
to a Facebook Fan page.

The simple ETL system uses core Kapow software to automatically collect, harvest, or scrape
real-time social media data. Automated scripts (robots) monitor a handful of keywords from
targeted Web sites (Facebook, twitter, search engines, forums and discussion groups). It took
only a few hours to build the robots for American Idol and Dancing with the Stars as Kapow's
powerful visual IDE requires no coding. Any data that can be seen in a Web browser can be
easily harvested without coding to APIs.
Search data based on keywords and contestants (e.g. DWTS and Kate) is extracted by Kapow Katalyst, where noise is surgically eliminated from the data. Semantic analysis tools are then used against the data to determine positive, negative or neutral sentiment. Analysts then review the data and post their predictions to a Facebook Fan page called Reality Buzz. So far, the
system has an 80% accuracy rate.

Kapow's Reality Buzz easily eliminates noise and filters data into easy to understand
graphs showing fan sentiment

Reality Buzz predicted DWTS eliminations with 80% accuracy
Reality Buzz is a compelling example of using real-time social media data to enhance predictive analytics, based on data that is only minutes to hours old. The entire process of extracting, filtering, and analyzing social media data to make a prediction can be accomplished within a few hours.
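The pipeline just described – collect posts, filter by keyword, score sentiment, aggregate per contestant – is built visually in Kapow’s IDE with no coding. Purely to illustrate its logic, a naive word-list sentiment tally might look like this (the word lists and sample posts are invented, and real semantic analysis is far more sophisticated):

```python
# Naive word-list sentiment scorer - an illustration of the pipeline's
# logic only, not Kapow's semantic analysis.
POSITIVE = {"love", "loved", "amazing", "best", "vote", "voted"}
NEGATIVE = {"hate", "awful", "worst", "boring"}

def sentiment(text):
    """Classify one post as positive, negative or neutral."""
    words = set(text.lower().split())
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def tally(posts):
    """Aggregate sentiment counts per contestant from (contestant, text) pairs."""
    counts = {}
    for contestant, text in posts:
        labels = counts.setdefault(contestant,
                                   {"positive": 0, "negative": 0, "neutral": 0})
        labels[sentiment(text)] += 1
    return counts

posts = [("Kate", "I love kate she is the best"),
         ("Kate", "kate is the worst dancer ever"),
         ("Nicole", "amazing routine tonight")]
print(tally(posts))
```

Ratios over counts like these are what feed the trend graphs and predictions described above.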

Kapow customers use similar solutions to measure retail trends, competitor pricing, customer
satisfaction, advertising campaign performance and much more. Other sophisticated customer
systems balance real-time Web data with internal information (sources include every imaginable
document, meta data, text from images and existing BI data). The Kapow platform goes beyond
simple Web scraping, converting unstructured data into structured data which your analysis
tools can recognize and utilize. This is a key Kapow differentiator. Other vendors offer customer
demos and Web site videos that only scratch the surface when it comes to data transformation.
Ultimately, they fail to deliver robust data quality and accuracy at this critical juncture.
5 Essential Lessons for Successfully Incorporating Social Media Data into
your Enterprise Intelligence
Kapow’s experience with the Reality Buzz project and a wide range of business customers has
uncovered several key insights. What follows are the five key lessons for working with and
making the most of social media data.
1. Data trumps gut feel. Data-driven decisions outweigh guesswork based on gut feel.
Dancing with the Stars offers a telling example. Even though Kate Gosselin racked up the most negative sentiment of all the contestants (at one point she accounted for 90% of the negative comments about everyone on the show), she also owned a huge portion of positive sentiment. She dominated the buzz early on, with 40% of all DWTS conversations focusing on Gosselin (the show started with 11 contestants). As a result, the 10% of her comments that were “positive” indicated a large fan base that would keep her on the show (people vote for whom they want to keep on the show; there is no vote to kick someone off). Most people would look at the 90% number and predict her immediate demise – the gut prediction. But that wasn't the case. Her 10% numbers kept her on the show for three more episodes even though she was clearly the worst dancer.
2. Timing is critical (and therefore real-time data is essential). On American Idol and Dancing with the Stars, post-performance data carries more weight than data from the prior six days of the week. This is when your sample is critical. This lesson applies when measuring sentiment in the business world, as well. If GM were to buy a full-page ad in the Wall Street Journal and then gauge sentiment a month later, the data would be worthless. Measuring sentiment immediately before and after the ad runs is the most reasonable approach.
3. Eliminate the noise. It’s easy to understand trends, changes in momentum, volume of
traffic, and ratio of positive to negative sentiment. However, popular shows with millions
of followers like American Idol and DWTS generate lots of collateral “noise.” The bigger
the show, product, issue, or scandal, the more noise. That's a given. Businesses need to
carefully evaluate the noise factor and establish business rules or filters for specific
scenarios. The way results are presented is crucial, as well. Packaging data as graphs,
charts and tables makes it easy for users to quickly consume and digest the information.

4. All social media sentiment is not created equal. Varying degrees of sentiment for
reality show contestants, for example, do not translate directly or equally to votes.
Consider the following social data points: “I just voted 100 times for Casey;” “My top 3 are Lee, Michael and Casey;” a link, video or article re-tweet about a particular contestant.
Depending on the issue or question you are trying to answer, not all data is necessarily
needed or equal in weight. On American Idol and DWTS, the public votes for who they
want to keep on the shows. Negative sentiment has very little correlation to who is going
home. So you must understand your objective before analyzing the data.
5. Do not look at data in a vacuum. Knowledge of events and circumstances is critical to understanding the data and extracting the “intelligence” in it. Similarly, manual review of data ensures data quality and consistency. A balance needs to be struck between automated sentiment analysis and manual review. In the case of Reality Buzz, it was helpful to watch the performance shows for added context. This process helps companies raise other hypotheses and investigate further after they’ve seen the initial data output. When using an automated sentiment analysis tool, companies should also weight keywords differently. Text analysis tools can’t yet distinguish sentiment as functional, emotional or behavioral. For example, when you monitor social media data, there’s a huge difference between “I like my new Canon camera” and “I just told my friend to buy the new Canon camera.” While both are positive sentiment, the latter should be weighted much more heavily.
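The weighting idea in lessons 4 and 5 can be sketched as follows; the tracked phrases, categories and weights are illustrative assumptions, not Kapow’s actual rules:

```python
# Weights reflecting lesson 5: behavioral signals ("I voted", "told my
# friend to buy") count more than emotional ones ("like", "love").
# Phrases, categories and values are illustrative assumptions.
SIGNALS = {
    "voted": ("behavioral", 3.0),
    "told my friend to buy": ("behavioral", 3.0),
    "love": ("emotional", 1.0),
    "like": ("emotional", 1.0),
}

def weighted_score(posts):
    """Sum weighted positive sentiment across posts; posts that match
    no tracked signal are treated as noise and contribute nothing."""
    score = 0.0
    for text in posts:
        lowered = text.lower()
        for phrase, (_category, weight) in SIGNALS.items():
            if phrase in lowered:
                score += weight
    return score

posts = ["I just voted 100 times for Casey",
         "I like my new Canon camera",
         "I just told my friend to buy the new Canon camera",
         "what's for dinner tonight?"]  # noise: matches nothing
print(weighted_score(posts))
```

On this scheme a single behavioral signal outweighs several casual “likes,” matching the lesson that not all sentiment is created equal.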
Kapow Solutions Are Designed to Bring Everything Together Intelligently
Kapow solutions help organizations of every shape and size access, enrich and serve Web data
regardless of its origin. The information could be live, unstructured data from Web sites. Or, it
could reside in existing, siloed data warehouses within the enterprise.
With Kapow, social media sites are easily accessible, as well. There used to be some doubt
about social media data accessibility. BI insiders complained that you would need to access the
site’s API in order to collect data. The issue is off the table with Kapow.
There’s no need for API access and no need for coding. There’s really only one requirement –
you need to be able to see the data in a browser. If you can navigate to and see it in a normal
Web browser like FireFox, IE or Safari, Kapow can easily capture it. This includes data behind
secure, password protected sites, and data on complex Web sites powered by AJAX and Flash.
With Kapow, you bypass data ownership issues, security obstacles and even the issue of
nonexistent APIs.
And, since data transformation and delivery are built into Kapow software, you can easily get
the real-time, noise-free, structured data into your BI environment.
The bottom line? Kapow customers successfully implement Kapow solutions in one-tenth of the time and cost of traditional IT methods, thereby enjoying significant competitive advantages because they access and process key information much more easily and quickly.

About Kapow Software

Kapow provides industry-leading technology for accessing, enriching, and serving real-time,
noise-free web data — no coding required.
The Kapow Katalyst Platform powers solutions in web and business intelligence, portal
generation, SOA/WOA enablement, and content migration. Kapow’s patented visual
programming and integrated development environment (IDE) technology enables business and
technical decision-makers to create innovative business applications. With Kapow, new
applications can be completed and deployed in a fraction of the time and cost associated with
traditional software development methods.

The leader in web data services, Kapow Software eliminates the barriers to accessing, enriching
and serving timely enterprise and public web data. The company’s Kapow Katalyst Platform is a
patented visual development and web data automation platform that enables Fortune 1000
companies to create high-value business applications with no coding required. Kapow currently
has 500 customers, including AT&T, Wells Fargo, Intel, Vodafone and Audi. The company is
headquartered in Palo Alto, California with additional offices in Denmark, Germany and the U.K.
For more information, log on to www.kapowsoftware.com or call toll-free at 1-800-805-0828.
