Business Intelligence In Blogs Understanding Consumer Interactions And Communities

SPECIAL ISSUE: BUSINESS INTELLIGENCE RESEARCH
BUSINESS INTELLIGENCE IN BLOGS: UNDERSTANDING
CONSUMER INTERACTIONS AND COMMUNITIES¹
Michael Chau
School of Business, The University of Hong Kong, Pokfulam, HONG KONG
Jennifer Xu
Computer Information Systems, Bentley University, Waltham, MA 02452 U.S.A.
The increasing popularity of Web 2.0 has led to exponential growth of user-generated content in both volume
and significance. One important type of user-generated content is the blog. Blogs encompass useful informa-
tion (e.g., insightful product reviews and information-rich consumer communities) that could potentially be a
gold mine for business intelligence, bringing great opportunities for both academic research and business
applications. However, performing business intelligence on blogs is quite challenging because of the vast
amount of information and the lack of commonly adopted methodology for effectively collecting and analyzing
such information. In this paper, we propose a framework for gathering business intelligence from blogs by
automatically collecting and analyzing blog contents and bloggers’ interaction networks. Through a system
developed using the framework, we conducted two case studies with one case focusing on a consumer product
and the other on a company. Our case studies demonstrate how to use the framework and appropriate tech-
niques to effectively collect, extract, and analyze blogs related to the topics of interest, reveal novel patterns
in the blogger interactions and communities, and answer important business intelligence questions in the
domains. The framework is sufficiently generic and can be applied to any topics of interest, organizations, and
products. Future academic research and business applications related to the topics examined in the two cases
can also be built using the findings of this study.
Keywords: Business intelligence, Web mining, blog mining, social networks, design science
Introduction
There is an explosion of user-generated content on the Web,
attributable to the growth in popularity of Web 2.0 applica-
tions in recent years. The availability of a wide range of user-
friendly Web 2.0 applications allows users to post content on
the Web more easily than ever before. Blogs are one of the
earliest and most popular Web 2.0 applications. Bloggers can
write about almost anything: personal stories, ideas, reviews,
opinions, feelings, emotions, etc. They can also form social
links with other bloggers by joining groups, usually known as
blogrings, based on their shared interests or opinions and by
interacting with one another in different manners, such as by
subscribing to another blogger, commenting on a blog entry
(post), or citing the content of a blog entry. These activities
build the interaction relations between bloggers and their
readers (Lin and Kao 2010) and form a complex social net-
work, which is often called the blogosphere. Information,
ideas, propaganda, and opinions flow and spread in the blogo-
sphere through the interaction and communication between
bloggers (Adar and Adamic 2005; Ali-Hasan and Adamic
2007; Gruhl et al. 2004; Kumar et al. 2005; Nahon et al. 2011).
As a result, blogs have become an important type of online
media, potentially useful for various types of business intelligence
analysis. For instance, by analyzing the blog contents
of its stakeholders (e.g., customers or pressure groups), a
company can obtain first-hand knowledge of customers’
feedback about its products and services (Liang et al. 2009),
about its own or its competitors’ brand images (Chau et al.
2009; Pikas 2005), or about what is happening in the external
environment (Chung et al. 2005). In addition, by analyzing
characteristics and dynamics of blogger communities, it is
possible to study the formation, growth, and evolution of
online consumer networks and identify new ideas in the
blogosphere (Chau and Xu 2007). These insights enable companies
and organizations to make better decisions on critical
business matters such as investments (O’Leary 2011), marketing
(Kozinets et al. 2010), and planning (Lewis 2008).
¹ Hsinchun Chen, Roger Chiang, and Veda Storey were the accepting senior
editors for this paper. Ee-Peng Lim served as the associate editor.
MIS Quarterly Vol. 36 No. 4, pp. 1189-1216/December 2012
Chau & Xu/Business Intelligence in Blogs
Studying the linkage and social structure in the blogosphere
is an important topic for both researchers and business prac-
titioners. Academically, it is important to study the nature
and topology of the social networks of bloggers and compare
them with other online social networks. Such studies will
reveal their characteristics and help improve our under-
standing of information flow and dissemination in these
networks. In practice, companies can identify the clusters of
customers for their products and services and conduct target
marketing to these groups. For example, companies can find
the most influential people in their consumer networks and
devise more effective and efficient marketing strategies
accordingly (Yang and Counts 2010; Zhu and Tan 2007).
Although blogs provide considerable potential for business
intelligence, two unique characteristics of blogs present major
challenges for collecting blog data, evaluating blog content,
and analyzing the underlying social networks. First, blogs are
dynamic and are frequently updated. Contents and linkages
can be added or removed any time. Second, bloggers have
their own styles of linking to each other. These linkages,
which represent the interactions between bloggers, are dif-
ferent from traditional hyperlinks between Web documents.

Consequently, automated techniques are needed to collect and
analyze the sheer volume of blog data in order to understand
and make effective use of the underlying information
and structure. Most previous studies have focused on
traditional forms of online content such as Web pages or
forum posts (Abbasi and Chen 2008; Chen et al. 2001; Cooley
et al. 1997; Sack 2000; Viegas and Smith 2004). These
studies have shown that automated analysis and visualization
systems are very useful for obtaining a quick understanding
of the contents and social interactions in online communities.
This research intends to tackle the challenges of generating
business intelligence based on blogs. In this paper, we pre-
sent our design of a framework and a system for content and
social network analysis of blogs. Following the design
science methodology described in Hevner et al. (2004), we
incorporate automated data collection, content analysis, and
social network analysis of blogs in our design.
The rest of the paper is organized as follows. We first review
the characteristics of blogs and the blogosphere and their
potential value for business intelligence. We also discuss the
importance of conducting content and network analysis on
blogs and related techniques. The following section intro-
duces the proposed framework, and we discuss how the
framework was used to guide our design of the blog mining
system, which is then presented. To provide a proof-of-
concept evaluation for our framework, we present two case
studies in which we applied the framework for business intel-
ligence and report our findings. We conclude our research
and suggest some future research directions in the final
section.
Blogs and Blog Content Analysis
Blogs in the early days were primarily Web pages containing
links to other useful resources and were usually maintained
manually (Blood 2004). When free blog software and blog
hosting sites became widely available, the number of blogs
grew significantly. People use personal blogs to record their
daily lives and express their opinions and emotions (Gill et al.
2009; Nardi et al. 2004). Some corporations and organiza-
tions create and maintain corporate blogs to interact with their
customers, suppliers, and other stakeholders (Liang et al.
2009; Tsai et al. 2007). For example, Microsoft has created
blogs for MSDN to inform developers about the company’s
latest developments.
Early research on blogs has focused on studying the charac-
teristics of blogs and bloggers, such as the demographics of
bloggers (Adar and Adamic 2005; Ali-Hasan and Adamic
2007; Gruhl et al. 2004; Kumar et al. 2005), blogging
behavior (Nardi et al. 2004), or the blogging process (Blood
2004). To extract valuable knowledge from blogs, various
data and text mining techniques have been proposed to collect
and analyze blog contents. Different models have been
proposed for identifying blog topics (Agarwal et al. 2010;
Kumar et al. 2010; Tsai 2011) and opinions and sentiments
expressed in blogs written in English (Abbasi et al. 2008) and
non-English languages (Bautin et al. 2008; Feng et al. 2009).
The Text Retrieval Conference (TREC) has organized a blog
track and attracted researchers’ interest in blog content analy-
sis (Macdonald et al. 2010). TREC has created two large blog
corpuses, and various classification-based and lexicon-based
techniques have been proposed for finding opinions and senti-
ments relevant to a given topic using these corpuses (e.g., He
et al. 2008; Lee et al. 2008; Zhang et al. 2009).
Blogger Communities and Social
Network Analysis
Bloggers are connected in various ways such as subscriptions,
comments, and citations, forming networks of bloggers.
Many blogger communities, which can be categorized into
explicit communities and implicit communities, exist in the
blogosphere. Explicit communities are often called blogrings
or groups. Most blog hosting sites allow bloggers to create a
new group or join any existing groups. In contrast, implicit
communities are not explicitly defined as groups or blogrings,
but are formed organically by the interactions among
bloggers. For instance, a blogger may subscribe to another
blog, hoping to get notifications when the subscribed blog is
updated. A blogger can also post a link to or leave comments
on another blog. These connections signify the social inter-
actions among bloggers. Because such interactions are rather
different from simple hyperlinks between Web pages, these
blogger communities, which involve social interactions
between online users and are characterized by memberships,
sense of belonging, relationships, shared values and practices,
and self-regulation (Erickson 1997; Roberts 1998), are more
similar to virtual communities of users than to the traditional
cyber communities of Web documents (Kumar et al. 1999).
An online survey (Ali-Hasan and Adamic 2007) revealed that
different types of relationships between blogs have different
characteristics and play different roles in facilitating inter-
actions between bloggers. By analyzing these relationships,
the hidden social structure that may represent the real social
relationships between bloggers can be extracted (Tang et al.
2012; Tang et al. 2009).
Social network analysis (SNA) is a sociological methodology
(Wasserman and Faust 1994) that can be used to reveal
patterns of relationships and interactions and discover the
underlying social structure in the blogger communities. In the
following, we will review the three major types of analyses in
SNA, namely topological analysis, centrality analysis, and
community analysis.
Topological analysis is used to find the structural properties
of a network, which is often represented by a set of nodes
connected by links. Some widely used statistics, such as the
average shortest path length, efficiency, clustering coefficient,
and degree distribution, can be used to characterize the net-
work (Albert and Barabási 2002; Crucitti et al. 2003). Three
models have been proposed to characterize the overall top-
ology of a network, namely, random graph model (Bollobás
1985), small-world model (Watts and Strogatz 1998; Xu and
Chau 2006), and scale-free model (Barabási and Albert 1999).
Different network topologies have different implications for
the functions of a network (Albert and Barabási 2002).
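As an illustration only (it is not part of the system described in this paper), the statistics named above can be sketched in Python for a small undirected network. The function names and the toy graph are assumptions made for this example, and the graph is stored as a plain dictionary mapping each node to the set of its neighbors.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_shortest_path_length(graph):
    """Mean geodesic length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

def clustering_coefficient(graph):
    """Average, over nodes, of the fraction of neighbor pairs that are linked."""
    coefficients = []
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)
            continue
        # Count each unordered neighbor pair once.
        links = sum(1 for a in neighbors for b in neighbors
                    if a < b and b in graph[a])
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)

def degree_distribution(graph):
    """Map each degree value to the fraction of nodes having that degree."""
    counts = {}
    for neighbors in graph.values():
        counts[len(neighbors)] = counts.get(len(neighbors), 0) + 1
    return {k: c / len(graph) for k, c in counts.items()}

# Hypothetical network: a triangle (a, b, c) with a pendant node d.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(average_shortest_path_length(toy))
print(clustering_coefficient(toy))
print(degree_distribution(toy))
```

Comparing such values with those expected under the random graph, small-world, and scale-free models is one way to judge which model a given blogger network most resembles.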
Centrality analysis aims to find the key nodes in a network.
Central nodes often play an important role by providing
leadership or bridging different communities. Traditional
centrality measures such as degree, betweenness, and close-
ness can be used (Freeman 1979). In the context of blogs,
degree centrality, which is defined as the number of direct
interactions a blogger has made, measures how active a
particular blogger is. “Popular” bloggers with high degree
scores are the leaders, experts, or hubs in a blogger network.
Betweenness centrality measures the extent to which a blog-
ger lies between other bloggers in a network. The between-
ness of a blogger is defined as the number of geodesics
(shortest paths between two nodes) passing through it.
Bloggers with high betweenness scores often serve as bridges
and brokers between different communities. They are impor-
tant communication channels through which information is
spread. Closeness centrality is based on the sum of the lengths of
geodesics between a particular blogger and all other bloggers in
a network; the larger this sum, the lower the blogger’s closeness.
A blogger with low closeness may find it very
difficult to communicate with other bloggers in the network.
Such nodes are thus more peripheral and can become outliers
in the network (Xu and Chen 2005).
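The three measures can be sketched as follows; this is a minimal example under stated assumptions, not the implementation used in this study. Betweenness is computed with Brandes' algorithm, a standard technique swapped in here, which reports the share of geodesics passing through a node; that share equals the count described above whenever every pair of nodes has a single geodesic, as in the hypothetical toy network below. Closeness is reported as the raw sum of geodesic lengths, so a larger sum indicates a more peripheral blogger.

```python
from collections import deque

def degree_centrality(graph):
    """Number of direct links of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def geodesic_sum(graph, source):
    """Sum of shortest-path lengths from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sum(dist.values())

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                             # nodes in BFS visiting order
        preds = {node: [] for node in graph}   # predecessors on geodesics
        sigma = {node: 0 for node in graph}    # number of geodesics from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
                if dist[neighbor] == dist[node] + 1:
                    sigma[neighbor] += sigma[node]
                    preds[neighbor].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in preds[node]:
                delta[pred] += sigma[pred] / sigma[node] * (1 + delta[node])
            if node != source:
                score[node] += delta[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical network: two triangles joined by the bridge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(degree_centrality(toy))
print(betweenness_centrality(toy))
print({node: geodesic_sum(toy, node) for node in toy})
```

In this toy network the bridge nodes c and d have the highest betweenness, which matches the role of brokers between communities described above.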
Community analysis is intended to identify implicit commu-
nities in social networks. A subset of nodes in a network is
considered a community or a social group if nodes in this
group have denser links with nodes within the group than with
nodes outside the group (Wasserman and Faust 1994). Com-
munity analysis finds implicit communities in a network by
maximizing within-group link density while minimizing
between-group link density. In the context of blogger net-
works, these implicit communities represent the real inter-
actions (e.g., subscription and comment) between the bloggers
and may reveal more important business intelligence informa-
tion than the explicit groups. Researchers have proposed
techniques to detect implicit communities of bloggers. Lin
et al. (2006) defined blog communities based on mutual
awareness and extracted them using a PageRank-based algo-
rithm. Bulters and de Rijke (2007) utilized both link and
content information of blogs to identify communities. As
community detection can be seen as a graph problem, graph-
based algorithms have also been developed to find the experts
and communities in blogs (e.g., Lakshmanan and Oberhofer
2010; Liu et al. 2011).
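The density criterion can be made concrete with a short sketch. The code below does not detect communities; it only scores a given partition by its within-group and between-group link densities, so that candidate partitions can be compared. The function name, the toy edge list, and both partitions are hypothetical.

```python
def link_densities(edges, membership):
    """Return (within-group density, between-group density) for a partition.

    Density is the number of observed links divided by the number of
    possible node pairs of that kind.
    """
    nodes = list(membership)
    within_links = between_links = 0
    for a, b in edges:
        if membership[a] == membership[b]:
            within_links += 1
        else:
            between_links += 1
    within_pairs = between_pairs = 0
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if membership[a] == membership[b]:
                within_pairs += 1
            else:
                between_pairs += 1
    within = within_links / within_pairs if within_pairs else 0.0
    between = between_links / between_pairs if between_pairs else 0.0
    return within, between

# Hypothetical network: two triangles joined by the single link c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lopsided = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1}

print(link_densities(edges, by_triangle))
print(link_densities(edges, lopsided))
```

The partition that follows the two triangles has the higher within-group density and the lower between-group density, so it is the better community assignment under this criterion.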
In addition, as relationships among blogs can enable and
facilitate various network functions and processes (e.g.,
information dissemination, innovation diffusion, and knowl-
edge sharing) in the blogosphere (Adar and Adamic 2005;
Gruhl et al. 2004), it is important to study how a blogger
network’s structural properties affect the outcomes of these
processes. Papagelis et al. (2009) proposed a model to study
the information dissemination in the blogosphere and the
effects of different factors on the diffusion process. Some
studies have attempted to identify the bloggers who are the
most important in the information dissemination process using
graph algorithms (Agarwal et al. 2008; Mathioudakis and
Koudas 2009).
A Design Science Approach
Our objective in this research is to design, implement, and
apply a framework for generating business intelligence based
on blog data. We adopt the design science methodology. In
this section, we present the design of our framework and
system (i.e., the artifacts that address the problem of business
intelligence analysis based on blogs). Hevner et al. (2004)
provided seven guidelines for conducting effective and high-
quality design science research in the field of information
systems. It is suggested that these guidelines be followed
closely to ensure that the research process and outcome are
scientific. In the following, we discuss how our current
research has followed and addressed these guidelines.
• Design as an Artifact: Both our framework and our
system are artifacts for addressing the problem, which is
to gather and generate business intelligence from con-
sumer blogs. The artifacts can be applied to different
domains, and future research and applications can be
built upon them.
• Problem Relevance: As discussed earlier, business
intelligence based on user-generated contents is highly
useful for decision making at various managerial levels
in organizations in the current fast-evolving business
environment. Analyzing business intelligence can reveal
valuable information and discover novel knowledge
critical to the success of a business (Abbasi et al. 2008;
Chau et al. 2007; McGonagle and Vella 1999).
• Design Evaluation: A design must be evaluated in order
to show its usefulness and quality. In this research, we
use an observational evaluation method to evaluate the
design (Hevner et al. 2004). In particular, we conduct
two case studies, which will be reported in later sections,
as a proof-of-concept to demonstrate the feasibility of our
approach and the value of the design (Albert et al. 2004).
• Research Contributions: The main contributions of this
research are twofold. First, we demonstrate the feasi-
bility and usefulness of applying our framework to blog
mining for business intelligence. We investigate how
content analysis and social network analysis can reveal
useful information for business. We have created two
artifacts, namely the business intelligence analysis frame-
work and the blog mining system. Second, by con-
ducting the two case studies, we improved our under-
standing of the characteristics and networks of the blogs
on a consumer product and a company, and reported
some interesting findings.
• Research Rigor: This research relies on rigorous ele-
ments from multiple academic fields, including business
intelligence, marketing, information retrieval, social
network analysis, Web mining, and system design. Both
the construction and evaluation of the artifact are based
on the knowledge base from these fields.
• Design as a Search Process: Our framework design and
the application of techniques form a scientific process in
which we searched for a potential solution to address the
problem of generating business intelligence by mining
blog contents and blogger community structure. In the
early stages of our research, we obtained initial feedback
from users and the design was revised a number of times.
We iteratively revised the design in order to search for
the best artifact that served our purpose.
• Communication of Research: We present the research in
this paper to both technology-oriented and management-
oriented audiences. Both the artifacts and the evaluation
study are presented in this paper in the following sec-
tions, such that both can be easily replicated by
researchers or practitioners.
The Framework for Analyzing Business
Intelligence in Blogs
In this section, we present our proposed framework for con-
ducting business intelligence collection and analysis of blogs
on a topic of interest, such as a consumer product or an
organization. Our framework, shown in Figure 1, consists of
the following steps (components):
1. Identify the explicit communities of a topic of interest.
2. Collect information about bloggers in the explicit communities.
3. Analyze the content posted by the bloggers.
4. Analyze the interaction networks and implicit communities formed by the bloggers.
Figure 1. The Framework for Collecting and Analyzing Business Intelligence in Blogs
In the remainder of this section, we describe each step in this
framework.
Identify the Explicit Communities
of a Topic of Interest
After deciding on the topic of interest, one can begin finding
the explicit communities on this topic in the blogosphere.
These explicit communities are represented by blogrings or
interest groups, which are searchable on some blog hosting
sites. These communities’ information and membership can
be retrieved either manually or using a software program such
as a Web spider or crawler, depending on the number of com-
munities. Each community can be manually examined for its
relevance, authenticity, validity, and suitability to ensure data
quality. This involves reading the description and sampled
contents of these explicit communities and selecting the ones
to be retrieved and analyzed. The communities can also be
manually classified according to their characteristics, such as
their attitudes toward the topic of interest.
Collect Information about Bloggers
in the Explicit Communities
After the set of explicit communities has been examined, one
can gather their member lists and collect a massive amount of
data about these members such as profiles, blog entries, and
interaction patterns. Unless the number of members is very
small, it is nearly impossible to complete this task manually.
A blog spider program can be employed to automate this task.
The blog spider starts by collecting the description page and
retrieving the list of members of the explicit communities.
The bloggers’ URLs are then extracted and stored into a
queue for fetching. The blog spider can be designed to follow
only links that are of interest, such as blogger profile pages,
blog entry pages, and comment pages, and to exclude pages
such as online advertisements. If the blog hosting site sup-
ports RSS (really simple syndication), it is possible for the
blog spider to easily retrieve blog information and contents in
the form of Web feeds. Similar to standard Web spiders,
multithreading or asynchronous I/O can also be used such that
multiple blog pages can be downloaded in parallel (Chau et
al. 2005). This can avoid bottlenecking the process if a
particular Web server is sending a malicious response or not
responding at all. After a page is downloaded, it can be
stored into a relational database or as a simple file. The
spider can terminate when a specific number of blog entries
have been retrieved or when the data of all bloggers of
interest have been collected.
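The crawling loop described above might be sketched as follows. This is a minimal illustration under stated assumptions rather than the spider used in this study: the URL patterns, the `crawl` function, and the in-memory `SITE` dictionary standing in for a blog host are all hypothetical. Page fetching is passed in as a callable so that a real downloader (for example, one that applies a timeout to unresponsive servers) can be substituted.

```python
import re
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical URL patterns for pages worth following; a real site needs its own.
WANTED = re.compile(r"/(profile|entry|comments)/")
HREF = re.compile(r'href="([^"]+)"')

def crawl(seed_urls, fetch, max_pages=100, workers=4):
    """Breadth-first crawl that follows only links matching WANTED.

    `fetch` is any callable mapping a URL to page text, or None on failure.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    pages = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while queue and len(pages) < max_pages:
            # Take a batch from the queue and download it in parallel.
            batch = [queue.popleft() for _ in range(min(workers, len(queue)))]
            for url, html in zip(batch, pool.map(fetch, batch)):
                if html is None or len(pages) >= max_pages:
                    continue
                pages[url] = html
                for link in HREF.findall(html):
                    if WANTED.search(link) and link not in seen:
                        seen.add(link)
                        queue.append(link)
    return pages

# Hypothetical in-memory "site" used instead of real HTTP requests.
SITE = {
    "/group/ipod": '<a href="/profile/ann">ann</a> '
                   '<a href="/profile/bob">bob</a> <a href="/ads/1">ad</a>',
    "/profile/ann": '<a href="/entry/ann-1">post</a>',
    "/profile/bob": '<a href="/profile/ann">friend</a>',
    "/entry/ann-1": "I want an ipod",
}
pages = crawl(["/group/ipod"], SITE.get)
print(sorted(pages))
```

The `max_pages` argument plays the role of the termination condition, and the advertisement link is never queued because it does not match the filter.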
Analyze the Content Posted by the Bloggers
A downloaded blog page has to be processed to extract useful
information. Blogs may be downloaded in HTML or XML
format, depending on the blog hosting site. As a blog page
may consist of more than one blog entry, the page is first
parsed into separate blog entries. This can be done by simple
string matching techniques. For example, some HTML for-
matting tags can be used to identify the beginning of a
particular blog entry or comment, depending on the format of
the blogs being analyzed. Useful information is then ex-
tracted, including the blogger’s age, gender, country of resi-
dence, and the blog creation date. As blogs, even those
hosted on the same site, may have different layouts, it is not
trivial to extract such information from blogs in HTML
format. For blogs in XML format, usually only partial infor-
mation is available. Fortunately, some standard data like
blogger name and blog entries are often put into specific
formats (e.g., as a sidebar or in a table) in the HTML files in
large blog hosting sites, and simple rules are often sufficient.
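A rule of the kind described above can be sketched with regular expressions. The markup assumed here (a `div` of class `entry` containing an optional `span` of class `date`) is hypothetical; each hosting site would need its own patterns, and nested `div` elements would require a real HTML parser instead.

```python
import re

# Hypothetical markup: each entry sits in <div class="entry"> ... </div>.
ENTRY = re.compile(r'<div class="entry">(.*?)</div>', re.DOTALL)
DATE = re.compile(r'<span class="date">(.*?)</span>')
TAG = re.compile(r"<[^>]+>")

def split_entries(page_html):
    """Split one blog page into entries with a date and tag-free text."""
    entries = []
    for block in ENTRY.findall(page_html):
        date = DATE.search(block)
        # Drop the date element, then replace remaining tags with spaces.
        text = TAG.sub(" ", DATE.sub("", block))
        entries.append({
            "date": date.group(1) if date else None,
            "text": " ".join(text.split()),
        })
    return entries

page = (
    '<div class="entry"><span class="date">2008-06-01</span>'
    '<p>I want an iPod.</p></div>'
    '<div class="entry"><p>No date here.</p></div>'
)
entries = split_entries(page)
print(entries)
```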
Text analysis and Web content mining algorithms, such as
linguistic analysis, text classification, or text clustering, can
then be applied on the blog entries. Simple analysis includes
term frequency analysis and extracting sentences that contain
particular keywords of interest. Other text mining techniques
such as topic and opinion analysis can also be applied on the
contents collected (Abbasi et al. 2008).
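The two simple analyses mentioned above, term frequency and keyword-based sentence extraction, can be sketched as follows. The tokenization and sentence-splitting rules are deliberately naive assumptions made for this example, and the sample text is invented.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Count lower-cased word tokens in a blog entry."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def sentences_with(text, keyword):
    """Return the sentences that mention keyword, ignoring case."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

entry = "I want an ipod. My iPod died today! School was boring."
print(term_frequencies(entry)["ipod"])
print(sentences_with(entry, "iPod"))
```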
Analyze the Interaction Networks and Implicit
Communities Formed by the Bloggers
The profile page and blog entry pages downloaded contain
traces of interactions between bloggers. These interactions
can be found in different sections of a blog page. For
example, subscription links are often located in the blogroll
on the left sidebar of a page; comment links are found in the
comment section of a blog entry. These links can be extracted
from the HTML blog page based on simple pattern matching.
Based on all of the interaction links, one can automatically
construct the networks formed by these links using software
programs. Business intelligence information can be revealed
by conducting social network analysis on these networks.
Topological, centrality, and community analysis, as discussed
earlier, can be applied to these networks to find useful, novel
patterns. SNA statistics can be automatically calculated and
visualization programs can be used to display a graphical
representation of the networks. All of the analysis results can then
be manually interpreted by business analysts. Depending on
the purpose of the analysis, this may involve studying the
profiles of selected bloggers of interest, investigating their
link structures, and reading their blog entries.
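Link extraction and network construction can be sketched as below. The page markup, the `/user/` URL scheme, and the decision to treat a comment as a directed link from the commenter to the page owner are assumptions made for this example, not details taken from the system described here.

```python
import re
from collections import defaultdict

# Hypothetical patterns: subscriptions live in a "blogroll" block,
# comments in "comment" blocks, and bloggers are linked as /user/<id>.
BLOGROLL = re.compile(r'<div class="blogroll">(.*?)</div>', re.DOTALL)
COMMENT = re.compile(r'<div class="comment">(.*?)</div>', re.DOTALL)
USER = re.compile(r'href="/user/([^"/]+)"')

def extract_interactions(blogger, page_html):
    """Return (source, target, type) triples found on one blogger's page."""
    links = []
    for block in BLOGROLL.findall(page_html):
        links += [(blogger, target, "subscription")
                  for target in USER.findall(block)]
    for block in COMMENT.findall(page_html):
        # Assumption: a comment links the commenter to the page owner.
        links += [(source, blogger, "comment")
                  for source in USER.findall(block)]
    return links

def build_networks(pages):
    """Group directed links by type: {type: {source: {targets}}}."""
    networks = defaultdict(lambda: defaultdict(set))
    for blogger, html in pages.items():
        for source, target, kind in extract_interactions(blogger, html):
            if source != target:
                networks[kind][source].add(target)
    return networks

pages = {
    "ann": '<div class="blogroll"><a href="/user/bob">bob</a></div>'
           '<div class="comment"><a href="/user/cat">cat</a> nice!</div>',
    "bob": '<div class="blogroll"><a href="/user/ann">ann</a>'
           '<a href="/user/cat">cat</a></div>',
}
networks = build_networks(pages)
print({kind: dict(links) for kind, links in networks.items()})
```

Keeping one network per interaction type makes it possible to compare, for example, the subscription network with the comment network.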
Case Studies
Two case studies are presented as a proof-of-concept in
applying our framework. In the case studies, we show how
our framework can be applied to collect and analyze the char-
acteristics and structural properties of consumer communities
in blogs and help generate business intelligence. Apple’s
iPod music player and Starbucks are chosen as the topics of
our two case studies, which are discussed in detail in the
following sections.
In the case studies we demonstrate how we applied the pro-
posed framework to study the following important questions
for business intelligence:
1. What are the characteristics of the contents of the
consumer blogs? How are they related to the product or
company of interest?
2. What are the characteristics of the interaction networks
of bloggers?
3. Who are the central bloggers in these networks? Are
these bloggers effective in disseminating information?
4. Do the implicit communities formed by different types of
interactions demonstrate different properties? Which
types of interactions are more important in shaping the
communities?
Case Study 1: iPod
The topic we selected for the first case study was Apple’s
iPod music player. We chose this product as the topic of
interest due to its popularity with young people, who are
major bloggers. Analyses of such consumer communities in
blogs may provide relevant companies and organizations
important insights into the characteristics of their current and
potential consumers (Baker and Green 2005; Chevalier and
Mayzlin 2006; Kozinets et al. 2010) and help them better
market their iPod-related products and services.
Data Set
We chose Xanga (www.xanga.com) as our source of blog
data. According to Alexa,² Xanga is the second most popular
blog hosting site after the Google-owned Blogger
(www.blogger.com). It is also ranked 17th in traffic (visit
popularity) among all Web sites in English. Xanga was
chosen over Blogger because Xanga has more prominent
features to support subscriptions and groups, and these fea-
tures are useful for identifying consumer groups in the blogs
and the interactions between bloggers.
We used the onsite search engine to manually identify the
online iPod consumer communities in blogs. We first
searched for all the blogrings on Xanga that contained the
word “iPod” in their titles or descriptions and retrieved 315
blogrings (groups). We then manually examined the details
of these blogrings and those that were irrelevant to iPods or
invalid were discarded. Groups with only a single member,
generally formed by one blogger with no one else joining,
were also removed from our list. Our final data set consisted
of 204 valid groups. For each group, we read its group
description and classified it as having a positive, negative, or
neutral attitude toward iPods. The top 20 largest groups are
shown in Table 1.
There were 3,493 bloggers in total in this data set. Each
blogger maintained one blog, which may contain multiple
blog entries. In total there were 75,445 blog entries. Our
system automatically fetched and extracted the bloggers’
basic information, their blog entries, and their relationships.
Because all blog pages were from the same blog host, we used
a program based on some simple pattern matching rules (e.g.,
based on occurrences of some particular HTML tags or
headings) to extract the required information from these
pages. The basic information about a blogger included the
user ID, name, date of birth, city, state, country, and date of
registration. Among the 3,493 bloggers, 2,603 indicated their
gender. Although these self-reported data may not be very
reliable, they provide a rough picture of the sample of
bloggers in the blogging consumer groups. The attitude of
each blogger toward iPods was also determined based on the
attitude of the groups to which the blogger belonged. In our
data we found 2,377 bloggers with a positive attitude toward
iPods, 225 with a negative attitude, and 891 with a neutral attitude.
Content Analysis
We examined the contents of the blogs to ascertain whether
and to what degree they were relevant to iPod, our topic of
interest. As a preliminary analysis, we measured the rele-
vance by looking at the number of times (word frequency)
that the word iPod was mentioned in each collected blog. We
found that the blog with the highest frequency mentioned the
word iPod 345 times. However, after careful examination of
this blog, we determined that this blog was a splog, which is
a type of spam blog used to trick search engines and
artificially boost the traffic to other Web sites.
After filtering out similar splogs, we found that the highest
word frequency in the legitimate blogs was 115 while the
lowest was 0. We plot the percentage of bloggers (i.e., blogs)
on a logarithmic scale against the word frequency in Figure 2. In
the chart, we can see that a large percentage of bloggers
mentioned the word iPod sparingly, if at all, in their blogs,
while only a small percentage of bloggers used the product’s
name frequently.
We found that 1,573 bloggers mentioned the word iPod at
least once in their blogs. The word iPod appears 10,572 times
in total in the postings of these bloggers. On the other hand,
1,920 bloggers never mentioned the word iPod in their blogs,
although they joined at least one of the iPod-related blogrings
identified in our study. This represents more than half
(55.0%) of all the bloggers in our data set. This finding is
intriguing because it implies that it is not possible to reach
these bloggers through standard keyword-based searches
(e.g., searching the word iPod in a blog search engine like
Technorati or Google Blog Search). These bloggers can only
be identified by their group memberships or other approaches.
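The figures reported above can be checked with a few lines of arithmetic. The variable names are ours, and the average number of mentions per mentioning blogger is a derived quantity, not one reported in the text.

```python
total_bloggers = 3493
mentioning = 1573      # bloggers who used the word iPod at least once
silent = 1920          # bloggers who never used it
total_mentions = 10572

# The two groups partition the data set.
assert mentioning + silent == total_bloggers

silent_share = silent / total_bloggers
mentions_per_mentioning_blogger = total_mentions / mentioning

print(f"{silent_share:.1%}")                      # 55.0%
print(f"{mentions_per_mentioning_blogger:.2f}")   # 6.72
```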
We further examined the content of the blogs by extracting all
the sentences that contain the word iPod. We sampled a
random set of 300 of these sentences. Out of this set, 296
sentences talk about the blogger’s interaction with an iPod
and are neutral toward iPod (e.g., “I want an ipod,” “I spent
² “Top English Language Sites” (http://www.alexa.com/site/ds/top_sites?ts_mode=lang&lang=en; accessed June 14, 2008).
Table 1. The Top 20 Largest Groups for iPod
Group Title | Number of Members | Group Description | Category
I Can't Live Without My iPod! | 677 | You can't live without your iPod? You bring your iPod to wherever you go? Do you listen to your iPod when you take a sh*t? You feel weird and naked whenever your iPod is not with you? Do you feel like you can't function normally without your iPod? Are you crazy about your iPod? You're not alone. Join! | Positive
i
