Modeling Toothpaste Brand Choice: An Empirical Comparison of Artificial Neural Networks and
Multinomial Probit Model

Tolga Kaya*

Management Engineering Department, Istanbul Technical University.
Macka, Besiktas, Istanbul, 34367, Turkey.

Emel Aktaş
Industrial Engineering Department, Istanbul Technical University.
Macka, Besiktas, Istanbul, 34367, Turkey.

İlker Topçu
Industrial Engineering Department, Istanbul Technical University.
Macka, Besiktas, Istanbul, 34367, Turkey.

Burç Ülengin
Management Engineering Department, Istanbul Technical University.
Macka, Besiktas, Istanbul, 34367, Turkey.

Abstract
The purpose of this study is to compare the performances of the Artificial Neural Network (ANN) and Multinomial Probit (MNP) approaches in modeling the choice decision within the fast moving consumer goods sector. To do this, choice models are built based on 2,597 toothpaste purchases of a panel sample of 404 households, and their performances are compared on the 861 purchases of a test sample of 135 households. Results show that ANN’s predictions are better, while MNP is useful in providing marketing insight.
Keywords: Brand choice modeling, artificial neural networks, multinomial probit, toothpaste, household panel

* Corresponding author: [email protected], Tel: +90 212 2931300 ext. 2789.
1. Introduction
Owing to a strong trend towards behavior-based knowledge of consumers, scanner panels, which provide transactional data, and consumer profile databases have recently gained importance. Researchers who used to focus on the impact of subjective aspects such as cultural values, attitudes, and psychological factors on choice behavior have turned their attention to measurable parameters such as prices, purchase frequency, and average purchase size. Consequently, efforts to use behavioral data in developing decision tools for planning marketing activities have resulted in numerous modeling applications based on both statistical and non-statistical approaches.
It is critical for businesses to estimate the choices of their potential customers successfully. Market share forecasts are vital not only for producers but also for media planners and retailers. Modeling studies can be quite useful because brand choice decisions are usually associated with multiple variables at the same time. These variables range from relative prices, intensity of advertisements, levels of customer loyalty, and consumer characteristics to the usage and intensity of promotion activities (e.g. price cuts, couponing, display, etc.) offered by the producers or retailers. There are compensatory linear models for determining consumer preferences, attitudes, judgments, and decision making processes, namely regression models, analysis of variance, discriminant analysis, and structural equation modeling. The main issue with these models is that the preference structure of consumers is not linear and their judgments are not based on compensatory rules.[1-2]
T. Kaya et al. forthcoming in International Journal of Computational Intelligence Systems

The multinomial logit model (MNL) is a nonlinear model that has been found to be a robust tool for forecasting brand shares by modeling consumers’ choice probabilities. However, as the number of brands analyzed increases, MNL may have classification problems.[3] More importantly, the MNL model requires the independence of irrelevant alternatives (IIA) principle to hold. According to the IIA assumption, the entrance of a new alternative into the choice set should affect the probabilities of choosing each of the existing alternatives equally.[4] In practice, within the fast moving consumer goods (FMCG) industry, this principle rarely holds since new brand launches in a specific FMCG category seldom have the same effects on the existing brands. This drawback limits the usage of MNL in many real-life cases.
An alternative to MNL is the multinomial probit model (MNP), which assumes that the errors follow a multivariate normal distribution with mean 0 and a covariance matrix, and therefore does not require IIA to hold. Despite this advantage over MNL, MNP has its own disadvantage: computational difficulty. Until the last decade, researchers had to deal with the multiple integrals of MNP in order to make estimations, and as the number of alternatives increased, the calculations became practically impossible to handle. In recent years, statistical software packages such as STATA and LIMDEP have started providing MNP estimation. Although this has increased the use of MNP modeling in practice, it should be noted that estimating an MNP model with econometric software may still take thousands of times longer than estimating an MNL model.
In order to overcome the limitations of MNL and MNP, more general, non- or semi-parametric, nonlinear regression models that can represent nonlinear utility functions without a priori knowledge of the relationships can be used. ANN is one such model that can be used to predict consumer brand choice behavior. Despite its relatively short history in consumer behavior research, there are many studies on brand choice modeling that use ANN as an alternative analysis tool.[2,5] The advantage of ANN is that it does not have specification bias and can be used to model highly complex relationships. However, because its results are difficult to interpret and it does not explain how it arrives at its outcomes, it is regarded as a black box. When studying consumer behavior, interpretability may often be as important as prediction performance.
The aim of this study is to compare the performances of the ANN and MNP approaches in modeling the brand choice decision within the Turkish fast moving consumer goods sector. To do this, ANN and MNP models of brand choice are first built based on 2,597 real toothpaste purchases of a model sample of 404 households. In these models, the variables that were found to be significant in explaining brand choice in the Turkish toothpaste market, namely relative prices, socio-economical status, brand loyalty, and household size, are used as inputs. After the models are built and estimated, their performances are compared in terms of hit rates (successful predictions of the actual choices) and market share predictions on the 861 purchases of a randomly selected test sample of 135 households. The transactional data was obtained from a diary based consumer panel company which has tracked shopping behavior in more than 100 FMCG product categories in Turkey since 1997.
Along with the theories of chaos, evidence, and fuzzy sets, neural networks and discrete choice probabilistic computing are among the most widely used methodologies in establishing computational intelligence systems. This study makes use of two of these methodologies, ANN and MNP, in order to model the choice behavior of Turkish toothpaste consumers. As ANNs are able to handle the nonlinearities within data structures, they may, depending on the nature of the sector under consideration, provide better predictions than probabilistic modeling. This creates a need for sector-specific modeling applications conducted in a comparative manner. The study also suggests a solution to the problem of missing price data in diary based panels and, to the authors’ knowledge, is the first application of ANN modeling to diary based household panel data.
The rest of the paper is organized as follows: In section
two, a brief literature review on brand choice modeling
using multinomial models and ANN is given. In the
third section, theoretical backgrounds of MNP and ANN
methodologies are summarized. Section four contains a
comparative case study conducted in Turkish toothpaste
market based on consumer panel data. Finally, in the
fifth section concluding remarks are given.
Brand Choice Modeling

2. Literature Review
The object of consumer choice models is to model the purchase behavior of consumers and, more specifically, the procedure of the purchase decision. A question of continuing interest to researchers and practitioners has always been how marketing mix variables affect different consumers’ buying behavior. With the proliferation of scanner panel data in the mid-1980s, a considerable number of statistical brand choice models were developed to determine the effects of marketing tools such as pricing, promotions, and advertisements on brand sales, shares, and profits.[6-8]
One of the first attempts to build a multinomial logit model of brand choice based on household scanner panel data was the study of Guadagni and Little,[9] the success of which was attributed in part to the level of detail and completeness of the consumer panel data used, which had been gathered through the scanning of barcodes at retailers. Following Guadagni and Little, a number of researchers made important contributions to brand choice models based on scanner data by separating the purchase decision process into different levels. Aiming to decompose sales increases, Gupta[6] proposed a method in which brand sales were considered the result of consumer decisions about when, what, and how much to buy. Relying on the assumption that “a customer decides to purchase a product category first and, if so, buys a particular brand”, Guadagni and Little[10] built a nested logit model with the same ground coffee data they employed in their 1983 paper. Bucklin et al.[11] developed a joint approach to segment households on the basis of their response to price and promotion in brand choice, purchase incidence, and purchase quantity decisions.
Probably the largest portion of the statistical brand choice modeling literature is devoted to evaluating the effectiveness of price cuts and other promotional activities. In this line of work, Neslin et al.[12] were among the first researchers to address the question of “borrowing from future sales” via promotions. Mela et al.[13] examined the long term effects of promotion and advertising on consumers’ brand choice behavior. Another study extended the analysis by taking consumer stockpiling behavior into consideration.[14] In the model of Jedidi et al.,[15] the unit of analysis was profitability instead of brand sales or shares. Pauwels et al.[16] calculated the long term equivalent of Gupta’s breakdown of promotional effects and found a reversal of the importance of category incidence and brand choice. While Klapper et al.[17] focused on loss aversion in brand choice data, Silva-Risso and Bucklin[18] developed a logit modeling approach to assess the effects of coupon promotions on consumer brand choice. Relying on scanner data, van Heerde et al.[19] investigated the short-term and long-term effects of a price war between retailers. When studying the sensitivity of consumers to prices, some researchers took both the demand and the supply (manufacturers and retailers) sides into consideration.[20-21]
Although ANN has a relatively short history in modeling brand choice and consumer behavior, it has been widely used in consumer decision making to predict shopping behavior. Agrawal and Schorling[3] compared the forecasting ability of ANN with that of MNL in the context of frequently purchased grocery products. West et al.[22] explored the advantages and disadvantages of ANN relative to statistical modeling procedures in predicting consumer choice. Bentz and Merunka[5] developed a hybrid approach which combines ANN and MNL into a single framework in the brand choice modeling context. Hruschka et al.[23-24] specified deterministic utility by means of a certain type of neural net for discovering nonlinear effects on brands’ utilities and compared the performance of this model with different MNL models. Hu and Tsoukalas[25] used neural network models and the ensemble technique of stacked generalization to investigate the relative importance of situational and demographic factors on consumer choice. Fish et al.[26] introduced a new architectural approach to ANN choice modeling and used a feed-forward ANN trained with a genetic algorithm to model individual consumer choices and brand share in a retail coffee market. Vroomen et al.[27] proposed a two step ANN choice modeling framework, in the first step of which they took the consideration sets of the households into account. Hruschka[28] introduced an MNP model which combines heterogeneity across households with flexibility of the deterministic utility function, which is approximated by a multilayer perceptron neural net.
3. Methodology
In this research, diary based household panel data is used to build MNP and ANN models of consumer choice. First, an MNP model is established in order to determine the relevant and significant variables of consumer choice. Second, an ANN based on the same inputs (independent variables) is built to predict consumer choice. Third, the performances of these two models are compared in terms of hit rates and market share estimations. Finally, a sensitivity analysis is conducted to see the change in choice probabilities with respect to different price and socio-economical status levels. A framework of the methodology is given in Fig. 1.

Figure 1 Framework of the proposed methodology
In the following subsections, a brief theoretical background on MNP and ANN is provided.
3.1. Multinomial Probit Model
In modeling brand choice, researchers have to adopt appropriate models of consumer decisions among multiple product alternatives. In many cases, the multinomial logit (MNL) and multinomial probit (MNP) statistical models meet this requirement, as each may be derived from economic theories of utility maximization. In a multi-brand category, assume that household i’s utility for brand j, $U_{ij}$ (i = 1, …, n; j = 1, …, p), is a function of household attributes and a stochastic error. A typical representation is:[29]
$$U_{ij} = \beta_j' X_i + \varepsilon_{ij}, \qquad (1)$$
where $X_i$ is a vector of household characteristics, $\beta_j$ is the coefficient vector for brand j, and $\varepsilon_{ij}$ is the stochastic error. The probability that a particular consumer will choose a particular alternative is given by the probability that the utility of that alternative to that consumer is greater than the utility of all other alternatives to that consumer.[4] Under MNL, the probability that household i will choose brand j is given by:[29]
$$P(\text{choice} = j \mid X_i, \beta) = \frac{\exp(\beta_j' X_i)}{\sum_{k=1}^{p} \exp(\beta_k' X_i)} \qquad (2)$$
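The logit probabilities of Eq. (2) and the IIA property they imply can be illustrated with a minimal Python sketch. The utility values below are hypothetical numbers chosen for illustration, not estimates from this study, and `mnl_probabilities` is an illustrative helper name.

```python
import math

def mnl_probabilities(utilities):
    """Return logit choice probabilities for a list of deterministic utilities."""
    m = max(utilities)                       # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values of beta_j' X_i for three brands (illustrative only).
three_brands = mnl_probabilities([1.0, 0.2, 0.0])

# The same household after a hypothetical fourth brand enters the choice set.
four_brands = mnl_probabilities([1.0, 0.2, 0.0, 0.5])

# Under IIA the odds between two existing brands are unchanged by the entrant.
ratio_before = three_brands[0] / three_brands[1]
ratio_after = four_brands[0] / four_brands[1]
print(ratio_before, ratio_after)
```

Both ratios equal exp(1.0 - 0.2), which is the restriction that rarely holds when a new brand competes more closely with some incumbents than with others.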
A well known specification test for determining the validity of the IIA property is the Hausman test. The test statistic is asymptotically $\chi^2$ distributed, and the IIA assumption is rejected for large values of the Hausman statistic.[30-31] In case of rejection, alternative models such as MNP or nested logit are needed.[32] The MNP, on the other hand, assumes that the errors are multivariate normally distributed with mean 0 and covariance matrix $\Sigma$. The probabilities are written:
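$$P(\text{choice} = j \mid X_i, \beta, \Sigma) = \int_{-\infty}^{\beta_1^{*\prime} X_i} \cdots \int_{-\infty}^{\beta_{j-1}^{*\prime} X_i} f(\varepsilon_{i1}^{*}, \ldots, \varepsilon_{i,j-1}^{*}) \, d\varepsilon_{i1}^{*} \cdots d\varepsilon_{i,j-1}^{*} \qquad (3)$$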
where f(·) is the probability density function of the multivariate normal distribution.[29]
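Because the integral in Eq. (3) has no closed form, probit choice probabilities are approximated numerically. The sketch below uses a crude frequency simulator (draw correlated normal errors, count how often each brand has the highest utility); this is an illustrative stand-in, not the estimation routine of STATA or LIMDEP. The utilities, the covariance matrix, and the function names are all hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def simulate_mnp(utilities, cov, draws=20000, seed=0):
    """Approximate probit choice probabilities by simulation."""
    rng = random.Random(seed)
    chol = cholesky(cov)
    p = len(utilities)
    counts = [0] * p
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Correlated errors: eps = L z, where L L' = cov.
        eps = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        total = [utilities[j] + eps[j] for j in range(p)]
        counts[total.index(max(total))] += 1   # the brand with the highest utility is chosen
    return [c / draws for c in counts]

# Hypothetical deterministic utilities and error covariance (illustrative only).
# Brands 1 and 2 have correlated errors, which a logit model cannot represent.
hypothetical_cov = [[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.0],
                    [0.0, 0.0, 1.0]]
mnp_probs = simulate_mnp([1.0, 0.2, 0.0], hypothetical_cov)
print(mnp_probs)
```

The nonzero off-diagonal term is what frees MNP from IIA; the price is that every probability evaluation requires simulation or numerical integration.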

[Figure 1 flowchart: determination of the category and panel sample; generation of complete price data; random division of the sample into model and test samples; brand choice modeling using the model sample (multinomial probit model and artificial neural network); performance comparisons using the test sample (hit rates, overall and monthly market share predictions); sensitivity analysis based on price and SES levels.]

In choice models, accuracy can be measured either in terms of the fit between the calculated probabilities and observed frequencies or in terms of the model’s performance in forecasting observed responses.[33] One of the most widely used goodness of fit measures in brand choice models is the $\rho^2$ statistic suggested by McFadden. Given that the log-likelihoods of the restricted and unrestricted models are $LL_0$ and $LL_F$ respectively, the $\rho^2$ statistic can be written as:
$$\rho^2 = 1 - \frac{LL_F}{LL_0} \qquad (4)$$
As the $\rho^2$ statistic increases, the accuracy level of the model in question increases.[34]
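As a small worked illustration of Eq. (4), the following Python sketch computes the statistic from two log-likelihood values. The numbers are hypothetical and are not the log-likelihoods of the models estimated in this study.

```python
def mcfadden_rho_squared(ll_restricted, ll_full):
    """McFadden's rho-squared: 1 - LL_F / LL_0."""
    if ll_restricted >= 0:
        # Log-likelihoods of discrete choice models are negative for real data.
        raise ValueError("restricted log-likelihood must be negative")
    return 1.0 - ll_full / ll_restricted

# Hypothetical log-likelihoods (illustrative only).
rho2 = mcfadden_rho_squared(ll_restricted=-2000.0, ll_full=-1500.0)
print(rho2)  # 0.25
```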

In probabilistic choice models, it is also useful to look at the proportion of successful predictions of the choices made. A table of success can be prepared for a case of m alternatives. Using this table, given that $N_{ii}$ is the number of correct predictions for alternative i and $N_{..}$ is the total number of choices, a commonly used statistic can be calculated:
$$S_1 = \frac{1}{N_{..}} \sum_{i=1}^{m} N_{ii} \qquad (5)$$
This statistic is simply the total number of choices that were predicted correctly divided by the number of all choices.[33]
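The hit rate of Eq. (5) can be computed directly from a table of success, as in the sketch below. The 3 x 3 table is hypothetical (rows are actual choices, columns are predicted choices) and does not reproduce the results of this study.

```python
def hit_rate(success_table):
    """S_1: correctly predicted choices (the diagonal) divided by all choices."""
    total = sum(sum(row) for row in success_table)
    correct = sum(success_table[i][i] for i in range(len(success_table)))
    return correct / total

# Hypothetical table of success for three brands (illustrative only).
table = [
    [50, 5, 5],    # actual Brand 1
    [10, 20, 5],   # actual Brand 2
    [8, 2, 15],    # actual Brand 3
]
s1 = hit_rate(table)
print(round(s1, 3))
```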

Finally, keeping in mind that a choice model predicts a probability of purchase for each observation and any given brand, Guadagni and Little[9] (letting s denote the predicted share, n the number of observations, and $p_i$ the predicted purchase probability of the brand for observation i) suggest calculating the predicted share and its standard error as below:
$$s = \frac{1}{n} \sum_{i=1}^{n} p_i, \qquad SE(s) = \frac{1}{n} \left[ \sum_{i=1}^{n} p_i (1 - p_i) \right]^{1/2} \qquad (6)$$
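A minimal Python sketch of Eq. (6) follows. The list of predicted probabilities is hypothetical, and the function names are illustrative.

```python
import math

def predicted_share(probs):
    """Predicted share s: the mean of the predicted purchase probabilities."""
    return sum(probs) / len(probs)

def share_standard_error(probs):
    """SE(s): square root of the summed Bernoulli variances, divided by n."""
    return math.sqrt(sum(p * (1.0 - p) for p in probs)) / len(probs)

# Hypothetical predicted probabilities of one brand on four purchase occasions.
probs = [0.5, 0.5, 0.5, 0.5]
share = predicted_share(probs)          # 0.5
share_se = share_standard_error(probs)  # sqrt(4 * 0.25) / 4 = 0.25
print(share, share_se)
```

The standard error shrinks as n grows, so share predictions over a whole test sample are much more precise than predictions for a single month.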
3.2. Artificial Neural Networks
A variety of problem areas are modeled using ANN[35-37] and, in many instances, ANN has provided superior results compared to conventional modeling techniques.[38] Several researchers have reported that ANN performs excellently on pattern recognition tasks, and its potential advantages have been addressed in the literature.[39-41] ANN performs better in the presence of extreme values and its estimation process can be automated, whereas regression and ARIMA models must be re-estimated periodically as new data is obtained. ANN outperforms traditional methods in problem domains with non-linear relationships;[42] in fact, it could be said that ANN is primarily used for complex non-linear mapping purposes.[43]
The basic model of ANN consists of computational units, which as a whole mimic the human brain. ANN is regarded as a black box in which each unit takes a weighted sum of all its inputs and computes an output value using a transformation or output function (Figure 2). The output value is then propagated to many other units via connections between units.

Figure 2 Conceptual operation of ANN models
In general, the output function is a linear function, a threshold function in which a unit becomes active only when its net input exceeds the threshold of the unit, or a sigmoid function, which is a non-decreasing and differentiable function of the input. Computational units in an ANN model are hierarchically structured in layers and, depending upon the layer in which a unit resides, the unit is called an input, a hidden, or an output unit. An input (output) unit is similar to an independent (dependent) variable in a statistical model. A hidden unit is used to augment the input data in order to support any required function from input to output. In the ANN literature, the process of computing appropriate weights is known as “learning” or “training”. The learning process of ANN can be thought of as a reward and punishment mechanism,[40] whereby when the system reacts appropriately to an input, the related weights are strengthened. As a result, it is possible to generate outputs which are similar to those corresponding to the previously encountered inputs. Conversely, when undesirable outputs are produced, the related weights are reduced. The model learns to give a different reaction when similar inputs occur, thus gearing the system towards producing desirable results, whilst the undesirable ones are “punished”.

[Figure 2 elements: Input → ANN including connections (weights) between neurons → Output; the Output is compared with the Target (Comparison) and the weights are adjusted (Adjust weights).]

In this study, a feedforward backpropagation network is used to model consumer choice. The training algorithm selected is trainscg, a supervised learning algorithm based on a class of optimization techniques known as conjugate gradient methods.[44] The trainscg algorithm may require more iterations to converge than the other conjugate gradient algorithms, but the number of computations in each iteration is significantly reduced because no line search is performed. The algorithm is too complex to explain in a few lines; see Ref. 44 for a detailed explanation.
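To make the idea of a feedforward network trained by backpropagation concrete, the following Python sketch trains a one-hidden-layer network on toy data. It deliberately swaps in plain stochastic gradient descent for trainscg, so it illustrates the weight adjustment loop only and is not the training algorithm used in this study. The architecture (4 tanh hidden units, softmax output), learning rate, and data are all assumptions made for illustration.

```python
import math
import random

def softmax(z):
    """Convert raw output scores into probabilities that sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, w1, b1, w2, b2):
    """One hidden layer of tanh units followed by a softmax output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return hidden, softmax(scores)

def train(samples, n_hidden=4, n_classes=3, epochs=300, lr=0.1, seed=0):
    """Backpropagation with plain gradient descent (not trainscg)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_classes)]
    b2 = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in samples:
            hidden, out = forward(x, w1, b1, w2, b2)
            # Output error for softmax with cross-entropy: prediction minus target.
            d_out = [out[k] - (1.0 if k == label else 0.0) for k in range(n_classes)]
            # Propagate the error back through the tanh hidden units.
            d_hid = [(1.0 - hidden[j] ** 2) *
                     sum(d_out[k] * w2[k][j] for k in range(n_classes))
                     for j in range(n_hidden)]
            for k in range(n_classes):
                for j in range(n_hidden):
                    w2[k][j] -= lr * d_out[k] * hidden[j]
                b2[k] -= lr * d_out[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
                b1[j] -= lr * d_hid[j]
    return w1, b1, w2, b2

def mean_cross_entropy(samples, params):
    """Average negative log-probability assigned to the observed choices."""
    w1, b1, w2, b2 = params
    return -sum(math.log(forward(x, w1, b1, w2, b2)[1][label])
                for x, label in samples) / len(samples)

# Hypothetical toy data: (input vector, index of the chosen brand).
toy = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([-1.0, -1.0], 2)]
loss_before = mean_cross_entropy(toy, train(toy, epochs=0))
trained = train(toy, epochs=300)
loss_after = mean_cross_entropy(toy, trained)
print(loss_before, loss_after)
```

In a brand choice setting the inputs would be the model variables (relative prices, loyalties, SES, household size) and the three outputs would be the predicted choice probabilities of the three brands.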
4. Case Study
4.1. Data
Consumer panel data for the toothpaste category is used in the MNP and ANN models. The raw data covers 7,681 toothpaste transactions; in approximately 90% of them (6,943), one of the three main brands was purchased, by a panel of 1,955 households. Of these, 3,458 toothpaste purchases of 539 frequent category buyers are used for the study. A frequent category buyer is defined as a household that purchased toothpaste 5 times or more during the analysis year (2004).
Table 1 Demographic characteristics of the households in the sample

Socio-economical status     %        Primary shopper age     %
AB                          30.1     ≤25                     7.3
C1                          33.9     26-35                   21.5
C2                          20.5     36-45                   48.8
DE                          15.4     46-55                   17.6
                                     56+                     4.8

Primary shopper education   %        Household size          %
Illiterate                  2.3      ≤2                      3.9
Literate                    1.9      3                       15.4
Primary school              41.2     4                       40.7
Middle school               14.3     5                       22.4
High school                 34.8     6                       7.7
University                  5.5      7+                      9.8

The data set contains complete purchase records for each household in the panel (e.g., household id, brand purchased, price, quantity, place, time, etc.). In addition, the data set includes household specific information such as socio-economical status, family size, age, education level, previous brands purchased, and total FMCG spending. The data does not have censored observations; in other words, panel members who either entered or left the panel during the study period are excluded from the data set. Table 1 gives a summary of the demographic profiles of the households used in the study.
According to the 2004 panel records, the three biggest brands represent more than 90% of the purchase occasions in the toothpaste category. Among these three brands, the market leader (Brand 1) has a share of 55.5% among all the purchases. Purchase shares of Brand 2 and Brand 3 are 22.2% and 27.3%, respectively. There are a number of small and private label brands competing in the Turkish toothpaste sector; however, these brands are not included in the analysis as they have limited distribution and are not supported by marketing activities similar to those of the three biggest brands. Another reason for the exclusion of the small brands is the difficulty of generating reliable price and loyalty information due to the limited statistical base.
Table 2 Number of households and purchase observations before/after data reduction

                                                     Number of     Number of purchase
                                                     households    observations
Toothpaste buyers                                    2030          7681
Buyers of the three main brands                      1955          6943
Frequent buyers (households employed in the study)   539           3458
Model (training) sample                              404           2597
Test (holdout) sample                                135           861

The households in the sample have been randomly
divided into two groups: Model and test samples (Table
2). MNP and ANN models of brand choice are built
based on 2,597 purchase occasions of the model sample
which includes 404 households. The performances of
these models are tested on the 861 purchases of a test
sample consisting of 135 households (25% of the total
frequent buyers’ sample).
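A household-level random split of this kind can be sketched as follows. The shuffling procedure, the seed, and the function name are assumptions made for illustration; the paper does not describe how the random division was implemented. Splitting by household rather than by purchase keeps all purchases of a household in the same sample.

```python
import random

def split_households(household_ids, test_fraction=0.25, seed=0):
    """Randomly assign whole households to the model or the test sample."""
    ids = sorted(set(household_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test_sample = set(ids[:n_test])
    model_sample = set(ids[n_test:])
    return model_sample, test_sample

# 539 frequent buyer households, identified here by hypothetical integer ids.
model_sample, test_sample = split_households(range(539))
print(len(model_sample), len(test_sample))  # 404 135
```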

4.2. Variables
Socio-economical status: Socio-economical status levels of the households are determined from the results of a questionnaire filled in and periodically updated by the households. The index takes the education level, occupation, ownership of certain items, and accommodation area of the household members into account. The data set contains 2 different levels of socio-economical status: High SES and Low SES. If the SES level of the household is high, then the variable (SES High) takes the value of 1, otherwise 0.
Household Size: Household size (HHSize) represents
the number of people living in the household according
to the panel records during the study period.
Loyalty: Operationally, loyalty to a brand is defined as the weighted average of the household’s last three purchases, where each purchase counts as 1 if it was of that brand and 0 otherwise. The weights 0.5, 0.3, and 0.2 are applied to the first, second, and third prior purchases, respectively. As the sum of loyalties across brands equals 1 for a household and there are 3 alternatives (Brand 1, 2, and 3), two variables (Loyalty1, Loyalty2) are employed in the model.
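The loyalty definition can be written out in a few lines of Python. This sketch assumes that the "first prior purchase" is the most recent one and that every household has at least three prior purchases; how shorter histories were handled is not stated in the text. The function name and the purchase history are hypothetical.

```python
LOYALTY_WEIGHTS = (0.5, 0.3, 0.2)  # first, second, and third prior purchase

def loyalty(prior_brands, brand):
    """Weighted share of `brand` among the last three purchases.

    prior_brands lists the household's previous purchases, most recent first.
    """
    return sum(w for w, b in zip(LOYALTY_WEIGHTS, prior_brands) if b == brand)

# Hypothetical history: Brand 1 most recently, Brand 3 before that, then Brand 1.
history = [1, 3, 1]
loyalty1 = loyalty(history, 1)   # 0.5 + 0.2
loyalty2 = loyalty(history, 2)   # never purchased
loyalty3 = loyalty(history, 3)   # 0.3
print(loyalty1, loyalty2, loyalty3)
```

Because the three loyalties always sum to 1, the third is redundant once the first two are known, which is why only Loyalty1 and Loyalty2 enter the model.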
Relative Prices: Price information for the brand purchased on a particular trip is simply generated by dividing the toothpaste spending in Turkish liras (TL) by the quantity bought. However, in diary based consumer panels, households do not record the prices of all the alternative brands displayed on the shelves of a store. Therefore, there is no direct price information available for the brands which are present in the store during the shopping trip but not purchased.
In order to generate unit price information for the alternative brands, a two stage procedure is implemented in this study (Figure 3). Initially, price information is generated according to Stage 1; unit prices of alternative brands are generated in this stage for 96% of the transactions (6,671 out of 6,943). When there is no transaction fulfilling the conditions of Stage 1, Stage 2 is implemented. In Stage 2, prices for the remaining 272 observations are estimated.
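The two stage procedure can be sketched in Python as below. Several details are assumptions made for illustration, because the text does not specify them: matching purchases are aggregated with the mean, months are coded as consecutive integers, and the field names (`brand`, `retailer_type`, `month`, `unit_price`) are hypothetical.

```python
from statistics import mean

def impute_unit_price(transactions, brand, retailer_type, month):
    """Return (unit price, stage used) for a brand that was not purchased."""
    def prices_in(months):
        return [t["unit_price"] for t in transactions
                if t["brand"] == brand
                and t["retailer_type"] == retailer_type
                and t["month"] in months]

    stage1 = prices_in({month})                   # same retailer type, same month
    if stage1:
        return mean(stage1), 1
    stage2 = prices_in({month - 1, month + 1})    # previous or next month
    if stage2:
        return mean(stage2), 2
    return None, None                             # no price information available

# Hypothetical purchase records (illustrative only).
records = [
    {"brand": 2, "retailer_type": "supermarket", "month": 3, "unit_price": 2.0},
    {"brand": 2, "retailer_type": "supermarket", "month": 3, "unit_price": 3.0},
    {"brand": 3, "retailer_type": "supermarket", "month": 4, "unit_price": 1.5},
]
price2, stage2_used = impute_unit_price(records, 2, "supermarket", 3)  # Stage 1
price3, stage3_used = impute_unit_price(records, 3, "supermarket", 3)  # Stage 2
print(price2, stage2_used, price3, stage3_used)
```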
After obtaining the purchase price and the prices of the alternatives that are not purchased, the relative prices are calculated. Finally, by computing the natural logarithms of these ratios, the price variables employed in the model (log(Price1/Price3) and log(Price2/Price3)) are obtained.

Figure 3 A two staged method of price data generation for the
brands that are not purchased
4.3. The MNP Model
As seen in Table 3, the estimation results show that the own-price coefficients (log(Price1/Price3) for Brand 1 and log(Price2/Price3) for Brand 2) are significant and have the expected signs. As the relative price of Brand 1 over Brand 3 increases, the probability of Brand 1 being chosen over Brand 3 decreases, which is in accordance with microeconomic theory. Similarly, as the relative price of Brand 2 over Brand 3 increases, the probability of Brand 2 being chosen over Brand 3 decreases. Loyalty coefficients are positive and highly significant. As expected, if the loyalty to Brand 1 (Loyalty1) is higher, then it is more probable that Brand 1 is chosen instead of Brand 3. Similar findings are valid for the other brands. Table 3 also shows that there is an association between the SES levels and the purchase decisions of the households between Brand 1 and Brand 3: as the SES level increases, the probability of Brand 1 being purchased instead of Brand 3 diminishes. Finally, as the household size increases, the choice probability of Brand 1 and Brand 2 over Brand 3 increases.
The Wald and $\rho^2$ statistics are computed as 1,030 and 0.227, respectively. Both statistics are highly significant at the 1‰ level. Using Eq. (5), the hit rates ($S_1$) of the model are calculated as 66% and 63% for the model and test samples, respectively.
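To show how the coefficients in Table 3 combine, the following Python sketch evaluates the deterministic part of the utility for one hypothetical household. The coefficients are taken from Table 3; the household's prices, loyalties, SES level, and size are invented for illustration. Each index is measured relative to Brand 3, the base alternative. Converting the indices into MNP choice probabilities would also require the estimated error covariance matrix, which is not reported in this section.

```python
import math

# Coefficients from Table 3 (Brand 3 is the base alternative).
COEFFICIENTS = {
    "Brand 1": {"const": -1.252, "log_p1_p3": -0.960, "log_p2_p3": 0.243,
                "loyalty1": 3.073, "loyalty2": 1.279,
                "ses_high": -0.199, "hh_size": 0.089},
    "Brand 2": {"const": -1.415, "log_p1_p3": -0.175, "log_p2_p3": -0.579,
                "loyalty1": 1.162, "loyalty2": 2.221,
                "ses_high": -0.073, "hh_size": 0.059},
}

def utility_index(brand, household):
    """Deterministic utility of `brand` relative to Brand 3."""
    c = COEFFICIENTS[brand]
    return c["const"] + sum(c[name] * value for name, value in household.items())

# Hypothetical household and shelf prices in TL (illustrative only).
household = {
    "log_p1_p3": math.log(2.2 / 2.0),   # Brand 1 is 10% more expensive than Brand 3
    "log_p2_p3": math.log(2.0 / 2.0),   # Brand 2 costs the same as Brand 3
    "loyalty1": 0.5,
    "loyalty2": 0.3,
    "ses_high": 1,
    "hh_size": 4,
}
v1 = utility_index("Brand 1", household)
v2 = utility_index("Brand 2", household)
print(round(v1, 3), round(v2, 3))
```

For this household the loyalty terms dominate: the index for Brand 1 is clearly positive despite its price disadvantage, while the index for Brand 2 is close to zero.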

[Figure 3 content]
Stage 1: Use the unit price information derived from the alternative brand purchases which were made
- at the same type of retailer,
- in the same month
as the brand purchased.
Stage 2: Use the unit price information derived from the alternative brand purchases which were made
- at the same type of retailer,
- in the previous or next month
relative to the brand purchased.

Table 3 Estimation results for the MNP model (standard errors in parentheses)

                       Brand 1             Brand 2
Constant               -1.252** (.170)     -1.415** (.184)
log(Price1/Price3)     -.960** (.161)      -.175 (.157)
log(Price2/Price3)     .243 (.162)         -.579** (.154)
Loyalty 1              3.073** (.133)      1.162** (.143)
Loyalty 2              1.279** (.165)      2.221** (.160)
SES High               -.199** (.093)      -.073 (.098)
HHSize                 .089** (.029)       .059* (.033)
**p
 
