Assessing analytical robustness
in cross-cultural comparisons
Sara Dolnicar
School of Management & Marketing and Marketing Research Innovation
Centre (MRIC), University of Wollongong, Wollongong, Australia, and
Bettina Grün
Department of Statistics and Probability Theory,
Vienna University of Technology, Vienna, Austria
Abstract
Purpose – The purpose of this paper is to critically review past recommendations to correct for cultural biases in empirical survey data sets, and to propose a framework that enables the researcher to assess the robustness of empirical findings to culture-specific response styles (CSRS).
Design/methodology/approach – The paper proposes to analyze a set of derived data sets, including the original data as well as data corrected for response styles using theoretically plausible correction methods for the empirical data at hand. The level of agreement of results across correction methods indicates the robustness of findings to possible contamination of data by cross-cultural response styles.
Findings – The proposed method can be used to inform researchers and data analysts about the
extent to which the validity of their conclusions is threatened by data contamination and provides
guidance regarding the results that can safely be reported.
Practical implications – Response styles can distort survey findings. CSRS are particularly problematic for researchers using multicultural samples because the resulting data contamination can lead to inaccurate conclusions about the research question under study.
Originality/value – The proposed approach avoids the disadvantages of ignoring the problem and
interpreting spurious results or choosing one single correction technique that potentially introduces
new kinds of data contamination.
Keywords Cross-cultural studies, Research, Standardization
Paper type Research paper
Introduction
The existence of response styles is a well-known and much-studied phenomenon in
various disciplines within the empirical social sciences (Baumgartner and Steenkamp,
2001; Bhalla and Lin, 1987; Hui and Triandis, 1989; Paunonen and Ashton, 1998;
Sekaran, 1983). Different respondents have different ways of using the answer formats
that researchers offer them, independent of the content. Paulhus (1991, p. 17) claims
that this behavior results in a response bias that has “a systematic tendency to respond
to a range of questionnaire items on some basis other than the specific item content (i.e.
what the items were designed to measure).” He also claims, “To the extent that an
individual displays the bias consistently across time and situations, the bias is said to
be a response style.”
The tendency to make more use of extreme answer options is one possible response
style. In the case of the much-used multi-category answer format with five answer
options, respondents with such a tendency tick the first and the fifth option more frequently than the others. For instance, when asked to rate satisfaction with the food in a hotel on a scale from “highly satisfied” to “highly dissatisfied,” a respondent displaying an extreme response style or ERS (Baumgartner and Steenkamp, 2001) would favor the two end points independently of the content.
Other respondents feel more comfortable avoiding extreme answers and make more use of the middle answer categories, such as “mildly satisfied” in the above example (Roster et al., 2003). Even if a respondent with a mild response style and a respondent with an ERS feel the same level of satisfaction, their answers on the multi-category answer format are likely to differ, wrongly leading researchers to conclude that their satisfaction levels differ (Kozak et al., 2003). In other words, interpretation of response scales is subjective.
Studies repeatedly show that the cultural background of respondents has a systematic effect on their response style. Respondents from different cultural backgrounds tend to use survey answer formats in different ways (see, for instance, Hui and Triandis, 1989; the section on culture-specific response styles (CSRS) below includes a review). This effect does not influence the results of empirical studies within one discrete cultural area. However, if the sample consists of individuals from varied cultural backgrounds with significantly different styles of using survey formats, the results derived from this data set could be distorted. Such data sets will henceforth be referred to as “multicultural data sets.”
This study provides a replication-based approach to assessing the danger of
misinterpretation due to CSRS for each of the items used to compare respondents from
different cultural backgrounds. This paper achieves this by:
• reviewing prior work studying CSRS;
• critically reviewing proposed techniques to correct for CSRS;
• proposing a robustness-based approach to assessing cross-cultural findings; and
• illustrating possible misinterpretations that arise if data are not corrected or are inappropriately corrected.
The approach here differs distinctly from prior propositions by building on robustness analysis of cross-cultural research findings across various conditions in correcting for bias. The main advantage of this approach (as opposed to prior recommendations) is to minimize the risk of either misinterpreting raw data contaminated by response styles, or choosing an incorrect transformation of original answers and drawing the wrong conclusions from the corrected data. Hence, the solution proposed in this study takes the perspective of robustness, and assesses the degree to which findings actually depend on the potential sources of contamination.
If the same substantial differences between cultures derive from analyses based on corrected and uncorrected data, the researcher can have more confidence when reporting such findings, because the result cannot be an artifact resulting from the expected response styles or data transformations undertaken to correct the data. However, if the analysis of uncorrected data leads to findings entirely different from the analyses of corrected data sets, the researcher should interpret such results with care.
Prior work
The central problem of cross-cultural studies is that if varying response styles are present in data sets, the researcher can no longer interpret differences in group mean values (Chun et al., 1974). This represents a major problem where, for instance, the central research aim is to test cultural differences using mean-based tests (such as t-tests and F-tests).
The literature review draws on sources from the fields of psychology (for instance, Arce-Ferrer and Ketterer, 2003), sociology (for instance, Watson, 1992), and marketing research (for instance, Greenleaf, 1992a). Prior work falls into four classifications:
(1) Articles that discuss the kinds of methodological problems that may distort cross-cultural research findings in general, and response styles specifically.
(2) Empirical studies that investigate the existence and nature of response styles in cross-cultural studies.
(3) Publications that propose techniques to detect whether or not data sets are contaminated by response styles.
(4) Work that proposes correction techniques.
Methodological problems in cross-cultural research
The literature on methodological and theoretical problems of cross-cultural research (research stream 1) shows a vast number of potential pitfalls awaiting cross-cultural research. A comprehensive review by Sekaran (1983) lists nine different areas capable of affecting the validity of findings, and argues that researchers must ensure equivalence at different levels, ranging from conceptual/functional equivalence through construct operationalization to item and scalar equivalence. Sekaran’s review illustrates the many potential pitfalls in empirical cross-cultural research, and thus provides a useful reference point for the present study, which is interested in one particular aspect among those discussed by Sekaran: measurement bias. Drasgow (1987, p. 19) claims that measurement bias occurs if individuals with equal standing on the trait measured by the test (but who are sampled from different subpopulations) fail to have equal expected observed test scores.
Bhalla and Lin’s (1987) article on methodological requirements of cross-cultural studies discusses measurement bias in a section called “scalar equivalence,” where they describe how “Cultures differ in their response set characteristics, such as social desirability, acquiescence, and evasiveness, which influence response scores” (Bhalla and Lin, 1987, p. 278). Smith and Reynolds’ (2002) review discusses the aspect of measurement bias in more detail, and differentiates between two sources of bias: response sets and response styles. Response sets imply that respondents wish to paint a certain picture of themselves, such as the Japanese not wanting to boast about their achievements. Response styles are systematic differences in responses that result from the format of the questions presented to respondents. Smith and Reynolds (2002, p. 450) conclude that:
Failure ... to detect differences in cross-national response bias will ... affect data comparability, may invalidate the research results and could therefore lead to incorrect inferences about attitudes and behaviors across national groups.
The present study focuses on response styles.
Empirical evidence for response styles in cross-cultural research
Smith and Reynolds (2002, p. 463) also point out that “There is already evidence to
suggest that, at a minimum, extreme and neutral response styles differ
cross-culturally.” Research stream 2 offers such evidence, which consists of studies
empirically investigating the existence and nature of response bias between cultures.
Early work in this area investigated whether answers from black and white
respondents differed systematically, concluding that black respondents tended to use
the extreme points of the scale (for instance, the “strongly agree” or “strongly disagree”
options on a Likert scale) more frequently than white respondents (Bachman and
O’Malley, 1984). Several other empirical studies detect systematic differences in
response styles between respondents from Asian and Western cultures. Chen et al. (1995) conclude that individuals from collectivist cultures tend to avoid extremes; while Shiomi and Loo (1999) and Si and Cullen (1998) find that respondents from Asian countries use middle categories more than do Western respondents; Das and Dutta (1969) identify a moderate response style among Indian respondents; and Chun et al. (1974) detect a stronger ERS (the tendency to use the extreme values as exemplified above) among American students compared to Korean students.
Not all findings, however, support this general tendency. Roster et al. (2003) find that US and Filipino respondents are more likely to respond to attitudinal scales with extreme answers compared to Chinese or Irish respondents. Another set of response style studies focuses on Hispanic respondents, and consistently finds that these respondents tend to an ERS (Hui and Triandis, 1989; Marin et al., 1992). Triandis and Triandis (1962) demonstrate the same effect for Greek students in comparison to American students. Very few reports show that no differences exist between cultures: Cheung and Rensvold (2000) and Yates et al. (1997) find no difference in response styles between American and Taiwanese students.
Such systematic differences are not random occurrences; cultural influences lead to differences in how people respond to a question. For instance, Hui and Triandis (1989, p. 298) explain the differences between Hispanic and non-Hispanic respondents thus:
In cultures around the Mediterranean, by contrast, an extreme response style is used because people consider such a response sincere. To use the middle of the scale would be considered trying to hide one’s feelings, which is normatively disapproved.
Hui and Triandis also suggest that modesty and caution drive Asian cultures to make less use of extreme points on an answer scale.
Detecting response styles
Research stream 3 consists of studies that propose detection methods for response styles. Most researchers derive measures for specific kinds of response styles to quantify the contamination of the data. For instance, Chun et al. (1974) use individual standard deviations to detect ERS. Greenleaf (1992b) and van Herk et al. (2004) propose further methods to detect and measure ERS, and use the proportion of extreme responses. Johnson et al. (2005) use the number of items with extreme responses. Hui and Triandis (1989) use the individual means to detect acquiescence response style (ARS, the tendency to agree independently of the content of the question), the individual standard deviation for response range (RR, a measure of how wide a range of answer options is used), and the number of times the respondent ticked the end points, to assess
the level of ERS. Baumgartner and Steenkamp (2001) give a detailed discussion of seven different response-style types, and of corresponding measures that can be used to assess them. Cheung and Rensvold (2000) take a different approach, and use multi-group confirmatory factor analysis to check for contamination of the data with ARS and ERS. They propose a stepwise procedure, in which they first check for form invariance between two cultures, testing whether the same items are associated with each construct. They test for ERS by checking factorial invariance, that is, for equality of the factor loadings. They then test for ARS by checking intercept invariance, that is, for the equality of the item values where the latent variable is equal to zero. If invariance is confirmed in each test, the differences in latent means indicate substantive differences between cultures. If invariance is not confirmed, ERS and ARS could be present in the data. Furthermore, checking the factorial invariance fails to detect uniform ERS differences (that is, where the same bias exists with respect to all items associated with a construct).
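To make these respondent-level measures concrete, the following minimal sketch (our illustration, not taken from any of the cited papers) computes three of the indices mentioned above for answers coded on a 1-5 scale; all function and variable names are ours:

    import numpy as np

    def response_style_measures(X, low=1, high=5):
        """Per-respondent response-style indices for an (n_respondents x n_items)
        answer matrix on a low..high rating scale."""
        X = np.asarray(X, dtype=float)
        ars = X.mean(axis=1)                           # acquiescence (ARS): individual mean
        rr = X.std(axis=1, ddof=1)                     # response range (RR): individual SD
        ers = ((X == low) | (X == high)).mean(axis=1)  # ERS: proportion of endpoint answers
        return ars, rr, ers

    # Respondent 0 stays mid-scale; respondent 1 uses only the endpoints.
    ars, rr, ers = response_style_measures([[3, 3, 2, 4, 3],
                                            [1, 5, 5, 1, 5]])
    print(ers)  # [0. 1.] -- respondent 1 displays a pronounced ERS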
Correcting for response styles
Work classified as research stream 4 contains techniques proposed to correct for response styles, once identified. Standardization is the most commonly used correction technique: if the researcher believes that CSRS are present in the data and that they might distort research findings, they will standardize the data. Standardization is based on the assumption that response styles are constant over time, and that the differences in aggregated answer patterns actually reflect differences in response styles and not content-related differences. This assumption might be justified by checking whether the differences are consistent over several unrelated constructs, which reduces the possibility that the differences are content-related (for any single construct, respondents might genuinely differ in their overall attitude towards the construct under study). While standardization leads to the removal of response bias, the danger associated with standardization is that it also eliminates content-related differences. Fischer (2004) reviews standardization methods commonly used to adjust for response styles in cross-cultural research and provides a classification of the different methods. He distinguishes between different forms and different units of adjustment; Table I gives an overview of these.
The different units in Table I are within subject, within group, within culture, and double. If the unit of adjustment is the subject, standardization occurs using all variables for each individual. This contrasts with within-group standardization, which uses all individuals for each variable. Within-culture standardization uses all variables and individuals in each culture. If a study includes double standardization, the researcher combines within-subject standardization with standardization within group for each culture.
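As a concrete illustration of these four units of adjustment, here is a minimal sketch, assuming a respondents-by-items matrix and a vector of culture labels; the helper names and the small epsilon guard are our own choices, not part of Fischer's (2004) classification:

    import numpy as np

    def zscore(a, axis, eps=1e-12):
        # Subtract the mean and divide by the standard deviation along `axis`.
        m = a.mean(axis=axis, keepdims=True)
        s = a.std(axis=axis, keepdims=True)
        return (a - m) / (s + eps)

    def standardize(X, unit, culture=None):
        """X: (respondents x items) matrix; `unit` selects the adjustment."""
        X = np.asarray(X, dtype=float)
        culture = None if culture is None else np.asarray(culture)
        if unit == "within-subject":      # across variables for each individual
            return zscore(X, axis=1)
        if unit == "within-group":        # across individuals for each variable
            return zscore(X, axis=0)
        if unit == "within-culture":      # all variables and individuals per culture
            Z = np.empty_like(X)
            for c in np.unique(culture):
                Z[culture == c] = zscore(X[culture == c], axis=None)
            return Z
        if unit == "double":              # within-subject, then within-group per culture
            Z = zscore(X, axis=1)
            out = np.empty_like(Z)
            for c in np.unique(culture):
                out[culture == c] = zscore(Z[culture == c], axis=0)
            return out
        raise ValueError(f"unknown unit: {unit}")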
The forms of adjustment in Table I differ with respect to the measure they use for correction: means, dispersion indices, both means and dispersion indices, or covariates. Means help correct for a possible ARS in the data, while dispersion indices (commonly standard deviations) can account for ERS. However, subtracting the mean leads to ipsatized scores known to reflect only intra-unit (relative) differences (Chan, 2003), and furthermore forces the row and column sums in the correlation matrix to zero. This affects all correlation-based analysis techniques, such as factor analysis. Hence, ipsatization is generally recommended only for scales with low inter-item
correlations (Bartram, 1996). Therefore, within-subject standardization using means is
based on the assumption that no content-related difference exists between respondents.
If any content-related differences exist, these would be lost in the standardization
process.
Within-group standardization means that the transformation is made across
variables, thus ensuring that each variable has the same overall mean and/or the same
variance. This step is important for multivariate analyses, and if aggregated scores are
determined. Within-group standardization assumes that the overall average score
and/or variance are comparable over variables. Researchers undertake within-culture
standardization if they have assumed that the response styles differ between cultures,
but are equal within culture. Under this assumption, within-culture standardization
might be preferable to within-subject standardization. The estimates for the different
forms of adjustments are more reliable because they take into account all respondents
in a cultural group. However, the assumption of homogeneity in response styles within
culture might be questionable, given that other socio-demographic characteristics are
associated with response styles.
Leung and Bond (1989) propose double standardization for individual analysis.
Within-subject standardization removes individual response style, and within-culture
standardization for each item removes differences between the average responses of
the individuals of each culture. These differences in average responses are also called
differences in positioning.
If a researcher assumes that CSRS might distort the results of the analysis and wants to correct the data to account for this contamination, they cannot choose a correction technique at random. The choice of correction technique should consider the assumptions each of the different standardization methods makes, in order to identify the one most appropriate for the research problem and the data conditions faced. If such identification were possible, the researcher could transform the data using the appropriate method, and the corrected data set would become the basis for all analyses.
Category           Alternatives                    Comments
Unit               Within-subject                  Across variables for each individual; assumes no content-related differences between respondents
                   Within-group                    Across individuals for each variable; assumes overall average scores and/or variance are comparable over groups
                   Within-culture                  Across variables and individuals for each culture; assumes equality of response styles within culture
                   Double                          Within-subject followed by within-group for each culture
Adjustment using   Means                           Removes ARS; leads to ipsatization
                   Dispersion indices              Removes ERS; needs a balanced design with negative and positive items
                   Means and dispersion indices    See the separate discussion of means and dispersion indices
                   Covariates                      Assumes that correlation between covariates and other items is due to response style

Table I. CSRS standardization-based correction techniques
Unfortunately, in most cases deciding which correction technique is most appropriate is not simple. Often, no criteria are available to assess whether the assumptions each of the possible transformations makes are appropriate for the data and the problem. The researcher essentially has two choices. They can either ignore the possibility of CSRS contamination, arguing that any transformation would lead to a different form of contamination of the data anyway (for instance, loss of content-related differences); or they can choose the correction technique that appears to be most suitable, and accept that the assumptions made by this correction technique may be inappropriate, and thus cause unwanted new distortion effects on the data.
A robustness-based approach
Neither of the presently available solutions for dealing with CSRS is entirely satisfactory for a researcher. This is due to the high level of uncertainty about how best to trade off the original data contamination against the data contamination potentially introduced by inappropriate correction. Essentially, the problem faced is one where the true values of the answers given by respondents cannot be retrieved any more, and are therefore unknown. Consequently, the challenge is to assess whether differences postulated between respondents from different cultures are true, or merely artifacts of response styles or response style corrections.
Motivation
The replication approach is successful in dealing with precisely this situation, where true values are unknown and an assessment of the reliability of conclusions is needed. For instance, in market segmentation studies, the true segment membership is unknown a priori and not directly observable. The researcher can segment a market in myriad ways, none of which can be proven to be the best or most appropriate. One solution proposed for this problem is replication: extensive repetition and comparison of results can extract the most reliable findings. Furthermore, replication allows distinguishing between stable segments that represent “natural” clusters and unstable segments that represent “artificial” clusters (Kruskal, 1977).
The most common problem of this nature is that true empirical values can, in general, only be estimated. This is the basis of inferential statistics: the significance level informs the researcher how likely it is that a particular result is real rather than random. While establishing the true values of respondents is not a trivial problem, powerful ways exist to assist the empirical researcher in assessing the dangers of misinterpretation and in interpreting only findings that have a fairly low probability of being wrong. For example, using a typical significance level of 5 percent, an empirical researcher takes a five percent risk of claiming a finding that is not true. Using a similar approach for the problem of CSRS offers a promising avenue for dealing with the dangers of potential misinterpretation based on cultural differences of respondents, and the fact that the true views of respondents are typically not known.
The prior discussion clarifies that each data set requires customized assessment, and that a general deterministic solution cannot offer the optimal way of dealing with potential CSRS-related misinterpretation of results. Even if determining correction factors for certain nationalities were possible, these would have to be different for different constructs under study. For instance, questions about satisfaction are more
likely to trigger different response style effects compared to questions about vacation activities undertaken, with respondents likely to perceive the latter as personal or even confidential in nature. Even if determining a set of bias values for a range of commonly studied constructs were achievable, CSRS are dynamic phenomena, and likely to change over time. Optimally, researchers should develop a technique that allows them to assess, for their data sets, the extent to which CSRS bias results.
The problem this study targets does not encompass all possible mistakes a researcher might make in the context of a cross-cultural study. Specifically, problems arising from badly operationalized constructs, or from a lack of structural equivalence of the construct across cultural backgrounds, cannot be solved with the proposed approach. Both these problems essentially make the responses unusable, because each respondent (or each culture) may have entirely different perceptions about what the question is about. Social science research typically uses many constructs that are ill-defined, as Kampen and Swyngedouw (2000) discuss in detail. Many other studies discuss the problem of structural equivalence of constructs across respondents from different cultural backgrounds, emphasizing the need for extensive exploratory work before questionnaire development (Kozak et al., 2003; Sekaran, 1983; Bhalla and Lin, 1987). Issues of operationalization and structural equivalence should be addressed at an earlier stage of the research project. The problem dealt with in the present study, CSRS, occurs during the quantitative phase, and occurs even if the construct under study has perfect structural equivalence.
Classification of variables with respect to robustness of findings
The robustness of results is useful for assessing the reliability of findings because, as discussed above, it is not known whether raw or corrected answers are closer to the truth of the matter investigated. Robustness in this context means independence of CSRS and of corrections for CSRS. Researchers can consider a result from an empirical study that includes respondents from different cultural backgrounds robust if the original answers, as well as various suitable corrections for response styles, lead to the same conclusion. In the worst case, the original values and each of the alternative ways of correcting for CSRS lead to different results, indicating a very low level of robustness of findings.
Figure 1 shows the alternative outcomes of this scenario. The term “corrected”
indicates that a wide variety of different corrections is possible. For simplicity of
illustration, Figure 1 only compares the results of one correction technique to the
results for the raw data. However, in general, several correction techniques might be
assumed suitable in addition to the raw data.
Figure 1 also distinguishes between the analysis of corrected (vertical dimension) answers and the analysis of raw (horizontal dimension) answers. In each case, the research question (whether differences exist between respondents from different cultural backgrounds) can be tested for each of the variables in the data set. The test result can be significant (indicating that a difference exists) or not significant (indicating that no difference exists). The combinations of test results based on corrected and raw data can be used to assess the danger of misinterpretation of multicultural data sets in general, as well as to classify each variable into a high- or low-risk category.
Figure 1 shows a case where only two results are compared. However, the proposed approach would typically include a set of corrected data sets derived by applying all theoretically suitable transformations. Different results can occur for different variables because each respondent's answer consists of a true value component and a response style component. While the response style component is assumed constant across all answers by an individual respondent (they may always tend to use extreme answer options), the true values are assumed to be different (they may be more interested in relaxing vacations than in action-packed or culture-oriented vacations). The relation between true values and response styles across a culture determines whether the raw and corrected data will lead to the same or different conclusions.
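In symbols (our notation, not the authors'), this decomposition of an observed answer can be sketched as

    x_{ij} = a_i \, \tau_{ij} + b_i + \varepsilon_{ij}

where \tau_{ij} is the true value of respondent i on item j, b_i is an additive respondent-specific shift (acquiescence), a_i is a respondent-specific scaling factor (extremity), and \varepsilon_{ij} is random noise; a_i and b_i are held constant across the items answered by respondent i, while \tau_{ij} varies with the content.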
The top right-hand corner in Figure 1 shows high-risk items (HRIs), because the two tests do not lead to the same conclusions. The test based on raw data leads to the conclusion that a difference exists (for instance, that French tourists are significantly more interested in city packages than German tourists, and therefore may be the better target group for such offers). The test based on corrected data indicates that no difference is evident (that French and German tourists are equally interested in city packages). The bottom left-hand quadrant illustrates the opposite situation: raw data leads to insignificant differences, whereas corrected data leads to significant differences.
Both these situations can lead to misinterpretations.
Figure 1. Classification of variables based on robustness of test results

                                      Results from analyses based on corrected data
Results from analyses based          Significant difference            No significant difference
on uncorrected data                  between cultures                  between cultures
------------------------------------------------------------------------------------------------
Significant difference               Low-risk items (LRI): items       High-risk items (HRI): items
between cultures                     that reliably discriminate        that could be misinterpreted
                                     between cultures                  on the basis of systematic
                                                                       data contamination
No significant difference            High-risk items (HRI): items      Very low-risk items (vLRI):
between cultures                     that could be misinterpreted      items that reliably do not
                                     on the basis of systematic        discriminate between cultures
                                     data contamination
If only the corrected or only the uncorrected data is interpreted, differences are claimed, or not claimed, that may well be artifacts of response styles or of response style corrections. Items of this nature are therefore referred to as high risk. Two possible ways of dealing with such HRIs exist: one is to omit reporting the findings on these items, a data-dumping exercise that may not be possible when clear answers are needed. However, this is quite a usual procedure, particularly in psychological studies. Items for which freedom from structural or scalar inequivalence cannot be established are often omitted to “purify” the scale (see, for instance, Cheung and Rensvold, 2000; Huang et al., 1997).
Two other kinds of items are not endangered by potential misinterpretations based on response styles. If significance tests based on the raw and corrected data lead to the same result, then either a difference between the two cultures or countries of origin exists, or it does not. These results can be reported with a reasonable amount of confidence, given that they are based on both uncorrected and corrected variable values.
The bottom right-hand corner of the figure shows the case where both significance tests indicate no difference between the cultures or countries of origin; variables of this nature can be referred to as very low-risk items (vLRIs). Alternatively, if both significance tests indicate differences in responses (the top left-hand quadrant of the framework), the only possible misinterpretation would be that one test states that respondents from one country have higher values, while the other test indicates that respondents from the other country have higher values, a rather unlikely outcome, but one that should be checked. These variables are therefore referred to as low-risk items (LRIs). This introduces an asymmetry in the evaluation of items for which all significance tests agree: in order to draw unambiguous conclusions, the researcher must additionally check that the differences for the LRIs have the same sign.
Outline of the procedure
A wide variety of recommendations exists as to how to assess and correct for CSRS. These include many viable ways of dealing with the problem of CSRS contamination in the empirical analysis of multicultural data, each based on different assumptions. However, choosing any one of the available approaches has at least one major drawback: the researcher assumes, without knowing the true nature of the contamination by CSRS, that the chosen transformation leads to values closer to the true views of respondents than the raw data. This may or may not be the case. A chosen transformation may well lead to values that are further away from the true views of consumers. Typically, no way exists to determine which of the possible scenarios is the case.
Any attempt to find the transformation that recovers the true values is, by its very nature, a process that cannot be firmly validated. Hence, the proposed procedure aims not at prescribing a way to transform the data, but rather determines the robustness of findings to CSRS. Assessing robustness uses the original values and a set of possible transformations, and the researcher undertakes the analyses required to answer the research questions for all data sets. Results that lead to the same conclusions under all data conditions are classified as robust. Therefore, CSRS robustness is an indicator that the rejection of a hypothesis was likely correct, despite the contamination with CSRS. This test functions as a guide for the researcher as to which results they can reliably report, and which they should interpret with care because the effects of CSRS could lead to misinterpretations.
The procedural proposal consists of four steps. These steps do not include other necessary steps of fieldwork design for cross-cultural studies, such as the assessment of structural equivalence of the construct under study. The starting point is any multicultural data set where the researcher must assume that CSRS may be present:
(1) Selection of a set of correction techniques appropriate for the problem and data at hand.
(2) Correction of raw values according to all chosen correction techniques.
(3) Testing of cross-cultural differences based on the raw data and all transformed data sets.
(4) Computing of CSRS robustness indices: the proportion of identical to deviating analysis results across all pairs of data sets. One value is derived for each analysis (see the sketch following this list).
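A minimal sketch of steps (3) and (4), assuming two cultural groups, item-wise Welch t-tests at the 5 percent level, and a dict mapping data-set names (raw plus corrections) to respondents-by-items matrices; the implementation details, including the exact form of the index as the share of agreeing test pairs, are our reading of the procedure:

    import numpy as np
    from itertools import combinations
    from scipy import stats

    def robustness_indices(datasets, culture, group_a, group_b, alpha=0.05):
        """Return, per item, the proportion of data-set pairs whose
        significance tests agree (1.0 = fully robust finding)."""
        culture = np.asarray(culture)
        significant = {}
        for name, X in datasets.items():
            a, b = X[culture == group_a], X[culture == group_b]
            # Step 3: item-wise Welch t-test for cross-cultural differences.
            p = stats.ttest_ind(a, b, axis=0, equal_var=False).pvalue
            significant[name] = p < alpha
        # Step 4: agreement of test outcomes across all pairs of data sets.
        pairs = list(combinations(significant, 2))
        agree = sum(significant[m] == significant[n] for m, n in pairs)
        return agree / len(pairs)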
Empirical illustration
For the illustration, this section uses a data set from the tourism area, where multicultural samples and comparisons naturally arise and are thus frequently encountered. The same data set is used for two purposes:
• to demonstrate the potential misinterpretations that can result either from ignoring the existence of response styles or from choosing inappropriate correction techniques, as discussed above; and
• to demonstrate how the proposed robustness-based approach can help researchers reduce the level of uncertainty in interpreting results from cross-cultural analyses.
Imagine that the National Tourism Organization (NTO) of Austria would like to allocate its advertising budget to one country of origin, because the budget is insufficient for campaigns in several countries. The NTO therefore undertakes a study to compare countries of origin with respect to their travel motivations, in order to determine which country's travellers are most interested in certain aspects, for instance, culture, health and beauty, or an unspoiled environment.
Description of the data set
The data set used resulted from the national guest survey conducted in the summer season of 1994 by the NTO, the so-called Österreich Werbung. While the original data set includes respondents from 14 different countries, only a sub-sample of 1,351 respondents from four areas of origin was selected, in order to keep the illustration simple (France: n = 312, Italy: n = 340, USA: n = 246, Vienna: n = 453).
This illustration also assumes that all equivalence criteria that need to be assured during the survey development and data collection phase have been evaluated and found satisfactory. However, as previously noted, these equivalences are not the focus of this paper. Also, assessing them ex post, using statistical techniques, might not be possible. For example, Cheung and Rensvold (2000) state: “one type of form noninvariance, known as construct bias cannot be detected statistically.”
The survey includes a set of 21 questions on vacation motivations, asking respondents to state to what extent each of the listed aspects was a motivating factor for them on the vacation. The four-point ordinal answer format used the labels “not at all,”
“a little bit,” “to some extent,” and “exactly.” The questions covered all relevant, but different, aspects of vacations which might influence the destination choice. One example is: “On holiday I want to exert myself physically and play sports.” (The full set of motivation statements is available from the authors.) Raw scores are determined by assigning equidistant values from 0 to 1 to the four levels of agreement.
Illustration of possible misinterpretations
The first step is to illustrate which misinterpretations could be made based on this data set if the possible existence of response styles is ignored. For this purpose, two independent sets of computations are undertaken, one based on the raw, uncorrected data (thus ignoring the possible contamination of the data by response styles) and one with corrected data. The data were corrected using within-subject standardization, achieved by subtracting the individual mean and dividing by the individual standard deviation. This particular correction technique was chosen because of its high popularity; it has been recommended in the past for removing both ARS and ERS. Using the corrected data as the basis of analysis assumes that ARS or ERS contaminate the data, and eliminates response styles of these kinds; however, doing so may also eliminate some of the actual content of respondents' answers.
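Concretely, with the four answer categories coded 0, 1/3, 2/3 and 1 as described above, the within-subject correction applied here amounts to the following (a sketch; the coding dictionary and names are ours). Note that a respondent who gives the same answer to all 21 items has zero individual standard deviation, so this correction is undefined for constant response patterns:

    import numpy as np

    # Equidistant coding of the four ordinal answer categories:
    coding = {"not at all": 0.0, "a little bit": 1/3, "to some extent": 2/3, "exactly": 1.0}

    def within_subject_standardize(X):
        """Subtract each respondent's mean answer and divide by their standard
        deviation across the motivation items (rows: respondents, columns: items)."""
        X = np.asarray(X, dtype=float)
        m = X.mean(axis=1, keepdims=True)
        s = X.std(axis=1, ddof=1, keepdims=True)
        return (X - m) / s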
Table II shows a comparison of the mean answers of the raw and standardized
scores for each region, together with the mean answers of the total population. For the
raw data, the mean values are between 0 and 1, where 0 indicates no motivation at all,
and 1 absolute agreement with the statement. The corrected data can take negative and
positive values, indicating if the motivation for an item is above or below the average
motivation. Child care, for example, is of very low importance for all areas of origin, as
indicated by having the smallest mean value.
From the raw data one may conclude that, on average, the respondents have a motivation between “not at all” and “a little bit,” because the observed values are comparable to the original scale. In contrast, for the corrected data, the only conclusion is that this item is of least importance for each area of origin, and deviates most from the average motivation.
The cross-cultural comparison of the results of the national guest survey data
shown in Table II reveals that for the French respondents, the level of agreement with
the motivation statements is higher for the raw than for the corrected data, for those
motivations that are more important for the French. For those motivations not so
important for the French respondents, the levels of agreement are lower when raw,
uncorrected data is used. The opposite is true for the Italian respondents. No obvious
differences between the raw and corrected data are observable for the American
respondents. For the Viennese respondents, the appreciation for “atmosphere” might
be underestimated when comparisons are computed on the basis of the raw scores. The
importance of “sports” emerges as much higher for the Viennese respondents than for
the average population in the corrected data; but this is not the case in the raw data.
The above interpretation is based only on a superficial assessment. When respondents' answers to all 21 travel motives are tested for differences between each of the countries and the overall mean value using a t-test, 60 percent of the differences (50 out of 84) are significant for the raw scores and 57 percent (48 out of 84) for the standardized scores, at a significance level of 0.05.
                             Raw                                          Corrected
                     France  Italy   USA     Vienna  Average     France  Italy   USA     Vienna  Average
Rest and relax        0.76*   0.66   0.51*    0.76*    0.69        0.56   0.54   -0.04*   0.75*    0.51
Comfort               0.54*   0.42*  0.48     0.50     0.48       -0.04  -0.16   -0.14    0.01*   -0.07
Sports                0.41    0.39   0.36     0.42     0.40       -0.40  -0.29   -0.47*  -0.17*   -0.31
Excitement            0.44*   0.31*  0.67*    0.29*    0.40       -0.28  -0.47*   0.43*  -0.52*   -0.28
Creativity            0.29    0.29   0.36*    0.26*    0.29       -0.70* -0.55   -0.48*  -0.61    -0.59
Culture               0.70*   0.64*  0.69*    0.40*    0.58        0.39*  0.47*   0.53*  -0.24*    0.22
Fun                   0.53    0.53   0.61*    0.44*    0.51       -0.05*  0.16*   0.26*  -0.09*    0.05
Good company          0.58    0.63*  0.62*    0.44*    0.55        0.08   0.42*   0.29*  -0.10*    0.14
Unspoiled nature      0.86*   0.77   0.76     0.75*    0.78        0.80   0.85*   0.69    0.70     0.76
Health and beauty     0.32    0.33   0.26*    0.45*    0.36       -0.60* -0.44   -0.76*  -0.13*   -0.43
Surroundings          0.84*   0.57*  0.76*    0.72     0.72        0.74*  0.27*   0.71*   0.61     0.57
Free and easygoing    0.76*   0.49*  0.69     0.73*    0.67        0.53*  0.03*   0.47    0.64*    0.43
Entertainment         0.44*   0.37   0.47*    0.28*    0.37       -0.31  -0.31   -0.13*  -0.55*   -0.36
Atmosphere            0.43    0.37   0.47*    0.37*    0.40       -0.34  -0.32   -0.11*  -0.34    -0.29
Locals                0.69*   0.58   0.69*    0.47*    0.59        0.32*  0.27    0.48*  -0.06*    0.21
Sun and water/snow    0.47    0.43   0.31*    0.48*    0.43       -0.23  -0.16   -0.61*  -0.05*   -0.22
Cozyness              0.48*   0.53   0.43*    0.65*    0.54       -0.20*  0.15   -0.27*   0.41*    0.08
Organized             0.29    0.26   0.34*    0.29     0.29       -0.71* -0.65   -0.54   -0.56    -0.61
Child care            0.17*   0.12   0.06*    0.12     0.12       -1.05  -1.02   -1.39*  -1.02*   -1.10
Maintain nature       0.83*   0.70   0.64*    0.70     0.72        0.71*  0.63    0.34*   0.55     0.57
Safety                0.84*   0.67*  0.79     0.77     0.76        0.75   0.55*   0.77    0.77     0.71

Note: *Significantly different from the overall average at the p = 0.05 level

Table II. Raw and standardized mean answers of respondents by region of origin and overall average
A cross-tabulation of the test results shows that in 24 of the possible differences between countries (27 percent), neither raw nor standardized data renders significant results. Thirty-eight differences (45 percent) are significant when tested on the basis of both raw and standardized data. In 10 tests (12 percent), the standardized data renders significant results while the raw data does not; the opposite holds for 12 comparisons (14 percent). This leads to an overall proportion of 74:26 with respect to the number of tests that returned the same results (low risk) as opposed to those that returned different results (high risk).
While these differences may seem academic, their practical relevance becomes very clear considering that national tourism organizations typically use these kinds of nation profiles to develop communication strategies to attract tourists from certain countries. If market structure analysis used the raw data as a basis, the test would indicate that French tourists are significantly more interested in unspoiled nature than other tourists. The national tourist organization might use this information to develop a large, expensive “unspoiled nature” advertising campaign. However, the standardized data shows no significance, indicating that the French are no more or less interested in unspoiled nature than are other tourists. This comparison illustrates that the contamination of empirical data by CSRS is a serious problem that can lead to misinterpretation of results, because the conclusions drawn could depend on the chosen correction technique, or possibly on the incorrect decision to ignore possible contamination by response styles and analyze the raw data only.
Application of the proposed robustness-based approach
Using the same data set, the next illustration demonstrates how the proposed robustness-based approach can help researchers to reduce the uncertainty revealed above, which is inherent in any data-analytic problem where the true values are unknown. The method follows the four steps outlined above. First (Step 1), the researcher chooses a set of correction techniques appropriate for the problem. Because the aim of this analysis is a cross-cultural comparison, in which the mean values on the attributes are compared for the different cultures, double standardization would remove the very differences that are of interest, making it inappropriate. Because only univariate comparisons are made, within-group standardization does not influence the results. The set of appropriate correction techniques therefore contains within-subject and within-culture standardization, where correction uses the means and/or the standard deviations.
Next (Step 2), the researcher corrects the raw values using these six techniques. Then (Step 3), the researcher tests cross-cultural differences using t-tests for the raw data and all transformed data sets.
Finally (Step 4), the researcher computes the CSRS robustness indices and classifies each variable (motivation item contained in the questionnaire) as LRI, vLRI or HRI. This classification provides a quick insight into how problematic CSRS are for the given data set, and allows decision making for the subset of items that are classified as LRI or vLRI. A further investigation of the HRIs will be necessary if they include items central to the strategy to be chosen, or if most items are classified as HRI. In this case, the researcher might subjectively decide also to include HRIs with a very high robustness index, that is, items for which most data sets indicate a significant difference.
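The classification in Step 4 might look as follows, given the item-wise test results for the seven data conditions (raw plus the six corrections); a sketch consistent with the S:I counts reported in Figure 2, with labels and names our own:

    def classify_item(significant):
        """significant: seven booleans (raw + six corrections), indicating whether
        the cross-cultural test was significant for this item on each data set."""
        s = sum(significant)              # number of significant results (S)
        i = len(significant) - s          # number of insignificant results (I)
        if i == 0:
            return "LRI"                  # every test finds a difference (check signs!)
        if s == 0:
            return "vLRI"                 # no test finds a difference
        return f"HRI ({s}:{i})"           # mixed evidence: report the S:I ratio

    print(classify_item([True, True, True, False, False, False, False]))  # HRI (3:4)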
Classification as an LRI indicates that the cross-cultural comparison of this particular item has rendered significant differences across the four regions for all data sets: the raw data and all six data sets containing different appropriate corrections for response styles. The researcher must additionally check that the differences for these items have the same sign for all data sets. If so, then these cross-cultural differences can be safely interpreted as findings from the study.
Classification as a vLRI indicates that none of the analyses based on the different underlying data sets led to the conclusion that the four regions differ with respect to this particular travel motivation. Again, the researcher can safely conclude the non-existence of a difference in this case.
Classification as an HRI indicates that the tests based on the raw data and/or the different corrections did not lead to the same results with respect to whether differences exist among the four regions. Cross-cultural differences with respect to these items should therefore be interpreted with care, because response styles, or the chosen corrections thereof, are likely to influence the results. Classification as an HRI allows the researcher to state in how many cases the tests for difference between cultures led to significant results, and in how many cases they did not. This information is given in parentheses in Figure 2. For instance, in the case of the motivational item “rest and relax,” the French respondents differed from the other three regions in three analyses, and did not differ in four. This is the worst possible ratio, because half the analyses claim differences and half do not. In the case of “creativity,” Viennese respondents differed from the remaining regions once, and did not differ six times, which indicates only one disagreeing test result.
Regardless, HRIs present a challenge, because the researcher cannot safely draw
any clear conclusions to report in an academic publication or to a client such as the
NTO. In such a situation, the researcher has the option to report the discrepancy
openly, or to undertake further qualitative research work for the regions under study
with respect to the relevant HRIs. The qualitative study would have to try to determine
the nature and extent of response styles with respect to the particular items in order to
be able to validate the conclusion externally. Such an approach is expensive, and
requires much time for further investigation. Therefore, such qualitative work is not
reasonable for all items and all regions a priori. However, for selected items that
emerge as endangered by misinterpretation, such a follow-up study may well be a
viable option.
Figure 2 shows all results for the Austrian NTO illustration. To enable a quick overview of the findings, each cell in the figure carries one of these classifications: LRI, vLRI or HRI. Optimally, this table would contain no HRI cells, indicating that conclusions with respect to cross-cultural differences can be safely drawn for all items. The higher the proportion of LRI cells in the table, the higher the extent of cross-cultural differences across all items and all regions.
In the empirical case shown in Figure 2, 36 motivation items are classified as LRIs (43 percent), and 19 as vLRIs (23 percent). The signs of the coefficients are checked for all LRIs and are consistent over all data sets, which means that all the tests that found differences found the same regions to perceive a particular motivational item as more important. Therefore, the LRIs can be safely interpreted and used as a basis for marketing activities. Twenty-nine motivational items (35 percent) are classified as HRIs, and should not be interpreted without explicitly reporting the potential impact of response styles and/or response style corrections on the conclusions, or without conducting a qualitative follow-up study to assess in detail the nature of the response style at work.
Figure 2. Robustness of cross-cultural findings

                      France      Italy       USA         Vienna
Rest and relax        HRI (3:4)   vLRI        LRI         LRI
Comfort               HRI (1:6)   HRI (3:4)   vLRI        HRI (2:5)
Sports                vLRI        vLRI        HRI (4:3)   HRI (4:3)
Excitement            HRI (1:6)   LRI         LRI         LRI
Creativity            HRI (4:3)   vLRI        HRI (6:1)   HRI (2:5)
Culture               LRI         LRI         LRI         LRI
Fun                   HRI (1:6)   HRI (4:3)   LRI         LRI
Good company          vLRI        LRI         LRI         LRI
Unspoiled nature      HRI (4:3)   HRI (1:6)   HRI (2:5)   HRI (3:4)
Health and beauty     HRI (5:2)   vLRI        LRI         LRI
Surroundings          LRI         LRI         HRI (5:2)   vLRI
Free and easygoing    LRI         LRI         HRI (2:5)   LRI
Entertainment         HRI (3:4)   vLRI        LRI         LRI
Atmosphere            vLRI        vLRI        LRI         HRI (3:4)
Locals                LRI         vLRI        LRI         LRI
Sun and water/snow    vLRI        vLRI        LRI         LRI
Cozyness              LRI         vLRI        LRI         LRI
Organized             HRI (4:3)   vLRI        HRI (3:4)   vLRI
Child care            HRI (3:4)   HRI (2:5)   LRI         HRI (1:6)
Maintain nature       LRI         vLRI        LRI         vLRI
Safety                HRI (3:4)   LRI         HRI (2:5)   HRI (2:5)

Note: For HRIs, the number of significant (S) to insignificant (I) differences is indicated in brackets as S:I. Seven data sets are used: raw, and within-subject and within-culture corrections using means and/or standard deviations
Reviewing the motivational items discussed in the context of the illustration of possible misinterpretations, Figure 2 offers helpful insights to the Austrian NTO managers: “health and beauty” for the French tourists and “surroundings” for the American tourists cannot be considered safe advertising messages, because both are classified as HRIs. For the American respondents, “child care” emerges as a motivation item that can safely be interpreted as being significantly different from those of other regions. Unfortunately, American tourists are significantly less interested in child care, making this item useless from a marketing point of view. This is not the case for “free and easygoing” for the Viennese tourists. This aspect is significantly more important to the Viennese than to other tourists, and would therefore be suitable for an advertising campaign targeting the Viennese.
Conclusions
Response style effects are a serious concern in empirical research throughout the
social sciences, and they are systematically associated with respondents’ country of
origin or cultural background. Consequently, with data sets that consist of
respondents from different countries of origin, researchers are in danger of
misinterpreting differences between countries of origin as substantial differences
with respect to the construct under study, when they are in fact the result of a CSRS.
Spanish respondents may, for example, appear much more satisfied with a hotel and
report a much higher intention to return than Chinese respondents. However, this
pattern likely reflects a CSRS, given that respondents of Hispanic background often
prefer extreme answers, whereas Chinese respondents show the opposite tendency.
Being aware of the existence and potentially distorting effect of CSRS on empirical
study results, knowing how to detect them and, if necessary, correcting respondents’
answers for CSRS is essential for any empirical marketing researcher engaging in
cross-cultural comparisons, or in any other empirical research based on multicultural
data.
Currently, two frequently used ways of dealing with CSRS exist: to ignore them and
analyze uncorrected data, or to choose one of many correction techniques, transform
the original data accordingly and analyze the transformed data. The problem with the
first approach is that CSRS will probably contaminate the raw data and thereby
distort the results of the analyses. The problem with the second approach is that any
transformation of the data rests on assumptions that may not actually hold, in which
case the transformation is likely either to introduce new systematic contamination, or
to eliminate content-related information that would have been needed for the
cross-cultural comparison.
This paper proposes a robustness-based approach requiring four steps:
(1) The selection of a set of correction techniques appropriate for the problem and
data at hand.
(2) The correction of raw values according to all chosen correction techniques.
(3) Testing of cross-cultural differences on the basis of the raw data and all
transformed data sets.
(4) Computing of CSRS robustness indices.
These steps should enable researchers to assess which cross-cultural conclusions can
be safely drawn and which are endangered by the unclear effects of CSRS. Endangered
findings should either not be reported as firm, or be further evaluated in a follow-up
qualitative study.
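
Steps (3) and (4) amount to running the same two-sample test on every derived data set
and recording the share of data sets in which a significant difference emerges. The
following is a minimal sketch, assuming the seven data sets produced by the function
above; the Mann-Whitney test is an illustrative choice, as the paper does not prescribe
a particular test statistic.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def robustness_index(data_sets, culture, region_a, region_b, item, alpha=0.05):
    """Share of derived data sets in which a two-sample test finds a
    significant difference between two regions on one item."""
    culture = np.asarray(culture)
    n_significant = 0
    for X in data_sets.values():
        a = X[culture == region_a, item]    # answers of region A
        b = X[culture == region_b, item]    # answers of region B
        _, p = mannwhitneyu(a, b)           # illustrative test choice
        n_significant += p < alpha
    return n_significant / len(data_sets)
```

An index of 1 means every data set signals a difference, an index of 0 means none
does, and intermediate values flag the mixed situations discussed below.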
Reliability is not a substitute for validity, and correcting for response styles does
not address the problem of validity. Validity needs to be ensured by asking the right
questions and testing the constructs under investigation for equivalence. The approach
does not compensate for bad survey questions; rather, the proposal aims to assess the
analytical robustness of findings, assuming that the questions asked were well
developed and sufficiently valid to measure what they were intended to measure.
CSRS robustness indices are useful for grouping items into four categories. Findings
concluding differences between countries may not be significant based on both the raw
and the corrected answers of respondents. This represents the lowest-risk situation,
and supports the assumption that the two countries’ respondents do not differ in, for
instance, travel motivations.
Tests could also render significant results for both the raw and the corrected values.
While this is still a low-risk situation, the remaining uncertainty is directional: one
test may indicate that, for instance, French respondents are more motivated by “rest
and relax,” while another test may indicate that French respondents are less motivated
by “rest and relax.” After ensuring that the direction of the difference is the same
across all data sets, the researcher can assume that a difference exists which is not
merely the result of the contamination of the data with CSRS. All remaining situations
are more problematic, because some test results indicate a difference while others do
not. If the proportion of questionnaire items identified as HRIs is low, the analysis
can proceed without drawing strong conclusions about HRIs, focusing instead on
insights based on vLRIs and LRIs. However, if the proportion of HRIs is high, the
researcher might also decide to include HRIs with a high CSRS robustness index, that
is, those for which a difference is indicated for nearly all data sets.
Regardless, researchers should describe the chosen option in detail in any report in
order to ensure that readers do not overestimate the value of conclusions based on
HRIs.
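
The mapping from test results to the categories described above can be expressed as a
small helper. In this hypothetical sketch, a result is a vLRI when no test is
significant, an LRI when every test is significant and all estimated differences point
the same way, and an HRI otherwise; the exact decision rules used in the paper may
differ.

```python
def classify(p_values, directions, alpha=0.05):
    """Classify one item-region result from the p-values and difference
    signs (+1/-1) obtained on all derived data sets."""
    significant = [p < alpha for p in p_values]
    n_sig = sum(significant)
    if n_sig == 0:
        return "vLRI"                        # no test finds a difference
    if n_sig == len(significant):
        # All tests agree that a difference exists; verify its direction.
        # Treating sign conflicts as high risk is an assumption of this sketch.
        return "LRI" if len(set(directions)) == 1 else "HRI (inconsistent sign)"
    return "HRI ({}:{})".format(n_sig, len(significant) - n_sig)
```

For three significant and four insignificant results, for instance, the function
returns “HRI (3:4)”, matching the S:I notation used in Figure 2.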
The empirical example, based on real data from a national guest survey, illustrates the
gravity of potential mistakes, while simultaneously demonstrating how, using the
proposed robustness-based approach, empirical researchers can assess which variables
allow reliable conclusions and which may be contaminated by CSRS.
Making more use of binary answer formats, which are not as susceptible to CSRS as
ordinal answer formats, is another option for reducing the contamination level of the
data. Cronbach (1950, p. 21) is the most prominent proponent of this option: “Since
response sets are a nuisance, test designers should avoid forms of items which
response sets infest.” The binary format is unsuitable for some constructs (such as the
evaluation of one’s own personality) where the rater is highly familiar with the rating
object and can reasonably be assumed to provide a very precise evaluation. Other
constructs, such as intentions to visit, are much more suitable for binary scales,
because the underlying construct is itself a binary choice. Advantages of binary scales
have been reported by numerous researchers (Dolnicar, 2003; Jacoby and Matell, 1971;
Komorita and Graham, 1965; Matell and Jacoby, 1971; Mazanec, 1984; Peabody, 1962),
and binary formats should not be discarded as an option just because ordinal answer
formats (typically, Likert scales) were more popular in the past, given the large
number of reported methodological shortcomings of ordinal scales (Kampen and
Swyngedouw, 2000).
References
Arce-Ferrer, A.J. and Ketterer, J.J. (2003), “The effect of scale tailoring for cross-cultural
application on scale reliability and construct validity”, Educational and Psychological
Measurement, Vol. 63 No. 3, pp. 484-501.
Bachman, J.G. and O’Malley, P.M. (1984), “Yea-saying, nay-saying and going to extremes:
black-white differences in response style”, Public Opinion Quarterly, Vol. 48, pp. 491-509.
Bartram, D. (1996), “The relationship between ipsatized and normative measures of personality”,
Journal of Occupational & Organizational Psychology, Vol. 69, pp. 25-39.
Baumgartner, H. and Steenkamp, J.B.E.M. (2001), “Response styles in marketing research:
a cross-national investigation”, Journal of Marketing Research, Vol. 38 No. 2, pp. 143-56.
Bhalla, G. and Lin, L.Y.S. (1987), “Cross-cultural marketing research: a discussion of equivalence
issues and measurement strategies”, Psychology & Marketing, Vol. 4 No. 4, pp. 275-85.
Chan, W. (2003), “Analyzing ipsative data in psychological research”, Behaviormetrika, Vol. 30
No. 1, pp. 99-121.
Chen, C., Lee, S. and Stevenson, H.W. (1995), “Response style and cross-cultural comparison of
rating scales among East Asian and North American students”, Psychological Science,
Vol. 6 No. 3, pp. 170-5.
Cheung, G.W. and Rensvold, R.B. (2000), “Assessing extreme and acquiescence response sets in
cross-cultural research using structural equation modeling”, Journal of Cross-Cultural
Psychology, Vol. 31 No. 2, pp. 187-212.
Chun, K.T., Campbell, J.B. and Yoo, J.H. (1974), “Extreme response style in cross-cultural
research”, Journal of Cross-Cultural Psychology, Vol. 5 No. 4, pp. 465-80.
Cronbach, L.J. (1950), “Further evidence on response sets and test design”, Educational and
Psychological Measurement, Vol. 10, pp. 3-31.
Das, J.P. and Dutta, T. (1969), “Some correlates of extreme response set”, Acta Psychologica,
Vol. 29 No. 1, pp. 85-92.
Dolnicar, S. (2003), “Simplifying three-way questionnaires – do the advantages of binary answer
categories compensate for the loss of information?”, ANZMAC CD Proceedings 2003.
Drasgow, F. (1987), “Study of the measurement bias of two standardized psychological tests”,
Journal of Applied Psychology, Vol. 72 No. 1, pp. 19-29.
Fischer, R. (2004), “Standardization to account for cross-cultural response bias – a classification
of score adjustment procedures and review of research”, Journal of Cross-Cultural
Psychology, Vol. 35 No. 3, pp. 263-82.
Greenleaf, E.A. (1992a), “Improving rating scale measures by detecting and correcting bias
components in some response styles”, Journal of Marketing Research, Vol. 29, pp. 176-88.
Greenleaf, E.A. (1992b), “Measuring extreme response style”, Public Opinion Quarterly, Vol. 56
No. 3, pp. 328-51.
Huang, C.D., Church, A.T. and Katigbak, M.S. (1997), “Identifying cultural differences in items
and traits: differential item functioning in the NEO personality inventory”, Journal of
Cross-Cultural Psychology, Vol. 28 No. 2, pp. 192-218.
Hui, C.H. and Triandis, H.C. (1989), “Effects of culture and response format on extreme response
style”, Journal of Cross-Cultural Psychology, Vol. 20 No. 3, pp. 296-309.
Jacoby, J. and Matell, M.S. (1971), “Three-point Likert scales are good enough”, Journal of
Marketing Research, Vol. 8, pp. 495-500.
Johnson, T., Cho, Y.I. and Shavitt, S. (2005), “The relation between culture and response styles –
evidence from 19 countries”, Journal of Cross-Cultural Psychology, Vol. 36 No. 2, pp. 264-77.
Kampen, J. and Swyngedouw, M. (2000), “The ordinal controversy revisited”, Quality & Quantity,
Vol. 34 No. 1, pp. 87-102.
Komorita, S.S. and Graham, W.K. (1965), “Number of scale points and the reliability of scales”,
Educational and Psychological Measurement, Vol. 25 No. 4, pp. 987-95.
Kozak, M., Bigne, E. and Andreu, L. (2003), “Limitations of cross-cultural customer satisfaction
research and recommending alternative methods”, Journal of Quality Assurance in
Hospitality and Tourism, Vol. 4 Nos 3/4, pp. 37-59.
Kruskal, J. (1977), “The relationship between multidimensional scaling and clustering”, in Ryzin, J.V.
(Ed.), Classification and Clustering, Academic Press Inc., New York, NY, pp. 17-44.
Leung, K. and Bond, M.H. (1989), “On the empirical identification of dimensions for cross-cultural
comparisons”, Journal of Cross-Cultural Psychology, Vol. 20 No. 2, pp. 133-51.
Marin, G., Gamba, R.J. and Marin, B.V. (1992), “Extreme response style and acquiescence among
Hispanics – the role of acculturation and education”, Journal of Cross-Cultural Psychology,
Vol. 23 No. 4, pp. 498-509.
Matell, M.S. and Jacoby, J. (1971), “Is there an optimal number of alternatives for Likert scale
items? Study 1: reliability and validity”, Educational and Psychological Measurement,
Vol. 31, pp. 657-74.
Mazanec, J.A. (1984), “How to detect travel market segments: a clustering approach”, Journal of
Travel Research, Vol. 23 No. 1, pp. 17-21.
Paulhus, D.L. (1991), “Measurement and control of response bias”, in Robinson, J.P., Shaver, P.R.
and Wrightsman, L.S. (Eds), Measures of Personality and Social Psychological Attitudes,
Academic Press, San Diego, CA, pp. 17-59.
Paunonen, S.V. and Ashton, M.C. (1998), “The structured assessment of personality across
cultures”, Journal of Cross-Cultural Psychology, Vol. 29 No. 1, pp. 150-70.
Peabody, D. (1962), “Two components in bipolar scales: direction and extremeness”,
Psychological Review, Vol. 69 No. 2, pp. 65-73.
Roster, C.A., Rogers, R. and Albaum, G. (2003), “A cross-cultural/national study of respondents’
use of extreme categories for rating scales”, Proceedings of the Ninth Annual Cultural
Research Conference 2003.
Sekaran, U. (1983), “Methodological and theoretical issues and advancements on cross-cultural
research”, Journal of International Business Studies, Vol. 14 No. 2, pp. 61-73.
Shiomi, K. and Loo, R. (1999), “Cross-cultural response styles and the Kirton
Adaptation-Innovation Inventory”, Social Behaviour and Personality, Vol. 27 No. 4,
pp. 413-20.
Si, S.X. and Cullen, J.B. (1998), “Response categories and potential cultural biases: effects of an
explicit middle point in cross-cultural surveys”, International Journal of Organizational
Analysis, Vol. 6 No. 3, pp. 218-30.
Smith, A.M. and Reynolds, N.L. (2002), “Measuring cross-cultural service quality: a framework
for assessment”, International Marketing Review, Vol. 19 Nos 4/5, pp. 450-81.
Triandis, H.C. and Triandis, L.M. (1962), A Cross-Cultural Study of Social Distance, American
Psychological Association, Washington DC.
van Herk, H., Poortinga, Y.H. and Verhallen, T.M.M. (2004), “Response styles in rating scales –
evidence of method bias in data from six EU countries”, Journal of Cross-Cultural
Psychology, Vol. 35 No. 3, pp. 346-60.
Watson, D. (1992), “Correcting for acquiescent response bias in the absence of a balanced scale:
an application to class consciousness”, Sociological Methods & Research, Vol. 21 No. 1,
pp. 52-88.
Yates, J.F., Lee, L.W. and Bush, J.G. (1997), “General knowledge overconfidence: cross-national
variations, response style and reality”, Organizational Behaviour and Human Decision
Processes, Vol. 70 No. 2, pp. 87-94.
Corresponding author
Sara Dolnicar can be contacted at: [email protected]