Description
We examine judgmental effects of the balanced scorecard’s organization. The balanced scorecard contains a large
number of performance measures divided into four categories. We examine whether the scorecard’s organization
results in managerial performance evaluation judgments consistent with a recognition of the potential relations (i.e.
nonindependence) of measures within a category. Supporting this idea, we find that performance evaluations are
affected by organizing the measures into the balanced scorecard categories when multiple below-target (or above-target)
measures are contained within a category but that evaluations are not affected when the above/below-target measures
are distributed across the scorecard’s four categories.
A note on the judgmental effects of the balanced scorecard’s information organization
Marlys Gascho Lipe a,*, Steven Salterio b
a University of Oklahoma, Price College of Business, Norman, OK 73019, USA
b University of Waterloo, School of Accountancy, Waterloo, Ontario, Canada N2L 3G1
Abstract
We examine judgmental effects of the balanced scorecard’s organization. The balanced scorecard contains a large
number of performance measures divided into four categories. We examine whether the scorecard’s organization
results in managerial performance evaluation judgments consistent with a recognition of the potential relations (i.e.
nonindependence) of measures within a category. Supporting this idea, we find that performance evaluations are
affected by organizing the measures into the balanced scorecard categories when multiple below-target (or above-target)
measures are contained within a category but that evaluations are not affected when the above/below-target measures
are distributed across the scorecard’s four categories. © 2002 Elsevier Science Ltd. All rights reserved.
1. Introduction
In the early 1990s, Robert Kaplan and David
Norton (1992) developed a management and
measurement tool called the Balanced Scorecard
(BSC). The BSC lists a diverse set of performance
measures grouped in four categories: financial
performance, customer relations, internal business
processes, and learning and growth activities
(Kaplan & Norton, 1992). Kaplan and Norton
(1996a) encourage the inclusion of 4–7 measures in
each category. Thus, firms adopting the BSC usually
increase the number of performance measures they
use and identify a much broader group of
measures than those they have traditionally used.
The stated purpose in developing a managerial
tool that includes a large number and broad group
of performance measures is to improve managerial
decision making. While determining whether the
BSC improves managers’ judgments and decisions
can be difficult, a reasonable starting point is to
determine whether and how the BSC affects these
judgments. Prior judgment and decision making
research provides evidence of human information
processing limitations and decision strategies. We
describe and test how these will affect use of the
BSC and resulting judgments.
Research in cognitive psychology shows that
people are generally unable to process more than
7–9 items of information simultaneously
(Baddeley, 1994; Miller, 1956). The BSC contains many
more measures than this limit, suggesting that
managers will find it difficult to utilize the information in
the scorecard. However, the four-category
organization of the BSC may assist managers’ use of this
large volume of measures by suggesting a way to
combine and use the data. Specifically, decision
makers may use a ‘divide and conquer’ strategy
(Shanteau, 1988) where measures within each
category are used to make an assessment of the
category and these four assessments are then
combined. In assessing each category, decision makers
are primed to see relations among the measures
within each group (Hopkins, 1996). When
performance on measures within a group is consistent
(e.g. consistently above-target), the decision maker
may perceive that the measures are related (i.e. not
independent) and, consequently, reduce the impact
of the individual measures on his or her judgment.
In contrast, when the same measures are presented
without the organizing BSC categories (or are
scattered across BSC categories), the perception of
relations among these measures and the resulting
reduction in decision weights are less likely.

0361-3682/01/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved.
PII: S0361-3682(01)00059-9
Accounting, Organizations and Society 27 (2002) 531–540
www.elsevier.com/locate/aos
* Corresponding author. Tel.: +1-405-325-2293; fax: +1-405-325-7348.
E-mail addresses: [email protected] (M.G. Lipe), sesalterio@uwaterloo.ca (S. Salterio).

Our results show that when multiple measures
within a BSC category show consistent
performance (e.g. above-target), managers’ evaluation
judgments are reliably different from evaluations
made using these same measures without the BSC
format. These judgment differences disappear
when the measures indicating strong performance
are distributed throughout the four BSC
categories instead of being found in a single BSC
category. Although it is difficult to state with
certainty that the BSC results in judgment
improvements, this study provides evidence that the BSC
has predictable and understandable effects on
judgment. While these grouping effects may occur
with other types of categorizations, other
groupings have not received the same kind of attention
as those in the BSC.
The remainder of the paper is organized as
follows. In the next section we will briefly describe
the BSC, review applicable judgment and decision
making research, and present a two-part research
prediction. Section three describes the
experimental work used to test our research predictions
and the final section summarizes the conclusions
that can be drawn from the study.
2. Background
2.1. The balanced scorecard
In a best-selling book Kaplan and Norton
(1996a) describe the methods and procedures
necessary for implementing a BSC. The BSC,
according to Kaplan and Norton, should contain
measures related to financial performance (e.g.
return on assets), customer relations (e.g.
customer satisfaction surveys), internal business
processes (e.g. process efficiency measures), and
learning and growth in the organization (e.g.
employee capability measures). Kaplan and
Norton (1993, 1996b) view the scorecard as a strategic
management tool that should explicate the drivers
of performance, as well as provide measures of
performance. This study focuses on the
scorecard’s use in evaluation and decision making.
2.2. Cognitive limitations and the divide and conquer decision strategy
The balanced scorecard with its large number of
performance measures presents a complex task to
a manager asked to use the scorecard to evaluate a
division’s performance. The manager could,
theoretically, weight and combine the many measures
into an overall evaluation of the business unit but
this is, cognitively, a very difficult thing to do.
Research in cognitive psychology has repeatedly
shown that humans are able to retain and use only
a small number of items in working memory
(Baddeley, 1994; Miller, 1956). With this limit on
working memory, holding 20 or more individual
measures in one’s head and mentally manipulating
them simultaneously is extremely difficult, if not
impossible. Thus, the volume of data in a balanced
scorecard suggests that it may overload human
decision makers with information.
The balanced scorecard’s four categories suggest
a way for managers to mentally organize the large
number of performance measures that may
mitigate this cognitive difficulty. Prior studies show
that information processing and judgments are
affected by information organization (Bettman &
Kakkar, 1977; Payne, Bettman, & Johnson, 1993)
and by the hierarchies or relations among
information items contained in a decision task
(Kleinmuntz & Schkade, 1993). For example, Hopkins
(1996) showed that placing an item (e.g. preferred
stock) in a particular category (e.g. liabilities)
caused experienced professionals to perceive that
the item was related to others in the category.
These studies suggest that when data items are
grouped in ways meaningful to the decision
maker, they may be combined prior to further use
(Chase & Simon, 1973). Shanteau (1988) describes
this method of using information as ‘divide and
conquer.’ The information is divided into groups,
an assessment can be made of each group, and
these assessments can then be combined. The
organization of the BSC lends itself quite naturally
to this kind of mental approach.
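The divide and conquer strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model proposed in the paper: each measure is coded +1 (above target), 0 (on target), or -1 (below target), and both the within-category and across-category combinations are simple unweighted averages. The function names and the equal weighting are hypothetical choices made for the sketch.

```python
from statistics import mean


def assess_category(codes):
    """Step 1: collapse one category's measures into a single assessment.

    `codes` holds +1 / 0 / -1 for above / on / below target (an assumed coding).
    """
    return mean(codes)


def divide_and_conquer(scorecard):
    """Step 2: combine the category assessments into one overall judgment.

    Equal weighting of categories is an assumption of this sketch.
    """
    per_category = {name: assess_category(codes) for name, codes in scorecard.items()}
    overall = mean(list(per_category.values()))
    return per_category, overall


# A scorecard with four categories of five measures each (illustrative coding).
example = {
    "financial": [1, 1, -1, 0, 0],
    "customer": [1, 1, 1, 1, 0],
    "internal": [1, -1, 0, 0, 0],
    "learning": [1, -1, 0, 0, 0],
}
per_category, overall = divide_and_conquer(example)
```

The point of the sketch is that the evaluator only ever holds five items (one category) or four items (the category assessments) in mind at once, rather than all twenty measures.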
2.3. Perceived relations among measures
When using the BSC, the initial stage of the
divide and conquer decision strategy is to use
measures within a category to assess performance
in that area (e.g. financial performance). Since the
measures have been grouped together, the decision
maker will be expecting and seeking relations
between them (Hopkins, 1996; Maines &
McDaniel, 2000). If performance on these measures
confirms this expectation (e.g. by indicating
consistently good performance), the decision
maker may reasonably reduce the decision weight
placed on each individual measure due to
perceived correlations (nonindependence) of the
measures (Banker & Datar, 1989; Feltham & Xie,
1994). In contrast, if measures indicating good
performance are scattered across BSC categories
(or contained in uncategorized lists of measures),
the decision maker is less likely to expect and
perceive these measures to be correlated and to make
consequent reductions to their decision weights.
This is consistent with findings in psychology that
people find it difficult to recognize that
correlations exist unless they have theories suggesting
such relations (Jennings et al., 1982) and with
Maines’ (1990) findings in accounting that
judgmental discounting for information redundancy
(i.e. correlation) does not occur unless the judge is
alerted to the presence of such relations (see,
especially, her experiment three).
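One stylized way to see why perceived correlation would reduce the weight on each measure is the textbook result for equally correlated signals: the average of n measures with common pairwise correlation rho carries about as much information as n / (1 + (n - 1) * rho) independent measures. The sketch below is an assumption-laden illustration, not the model used by the authors or by the studies they cite; the function names and the equal-correlation setup are hypothetical.

```python
def effective_count(n, rho):
    """Number of independent measures that n equicorrelated measures are 'worth'.

    Assumes every pair of measures has the same correlation rho (0 <= rho <= 1).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this sketch")
    return n / (1 + (n - 1) * rho)


def relative_weight(n, rho):
    """Weight on each measure relative to the independent (rho = 0) case."""
    return effective_count(n, rho) / n


# Four consistent measures in one category, under different perceived correlations.
independent = effective_count(4, 0.0)   # no perceived relation among the measures
redundant = effective_count(4, 1.0)     # measures perceived as fully redundant
partial = effective_count(4, 0.5)       # moderate perceived correlation
```

Under these assumptions, an evaluator who perceives the four measures as related treats them as fewer than four separate pieces of evidence, which is the direction of the discounting the text describes.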
This suggests that judgments made with the
BSC will differ from those made with
uncategorized lists of measures in particular situations:
those cases where performance on measures within
a category is consistent (i.e. consistently
above-target or consistently below-target). Additionally,
the above discussion suggests that judgments made
with the BSC will not differ from those made with
uncategorized lists of measures in situations where
multiple above-target (or below-target) measures
are scattered across the BSC categories.
Although performance results on the twenty or
more performance measures in a BSC may take on
any number of patterns, we will test the impact of
only the two extreme patterns described above.
That is, we will consider a situation where multiple
above-target (or below-target) measures are
contained within one BSC category and then we will
contrast that with the situation where the
above-target (below-target) measures are distributed
across categories. For these two situations, we will
compare the judgments for decision makers with
the BSC to those of decision makers using the
same measures without the BSC categories. Our
research predictions are:
Evaluations using the balanced scorecard will
differ from evaluations based on the same
measures without the scorecard organization,
depending on the pattern of performance across
categories. Specifically:
1. judgments are likely to be moderated when
multiple above-target (or below-target)
measures are contained in a single BSC
category, but
2. judgments are unlikely to be affected when
multiple above-target (or below-target)
measures are distributed throughout the
BSC categories.
The next section describes the experiments and the
test results.
3. Method and results
3.1. Overview of experiments
Participants are presented with a case where
they are asked to take the role of a senior executive
of WCS Incorporated, a firm specializing in
retailing women’s apparel. WCS has multiple divisions,
the two largest of which are the focus of the case
materials. The case introduces the managers of the
two business units and the strategies of the units
are described. Multiple performance measures are
presented in patterns and formats depending on
the experimental treatment as described below.
The participant is then asked to evaluate the
performance of each of the two unit managers on a
scale with seven descriptive labels and numerical
endpoints of “0” and “100” (see Table 1 for a
sample evaluation form).
After providing the manager evaluations, the
participants complete a questionnaire. This
questionnaire asks for demographic information,
provides manipulation checks (discussed further
in Sections 3.2.3 and 3.3.2), and gathers data
regarding task difficulty, realism, and
understandability.
In both experiments the two divisions described
are RadWear and PlusWear, retail divisions
specializing in clothing for the urban teenager and in
large-sized clothing, respectively. The participants
are informed that management believes the
performance measures for each division are
appropriate for retailers and capture the two different
strategies.
3.2. Experiment one
In experiment one we focus on whether the BSC
format makes a difference in divisional manager
performance evaluation when particularly good or
bad performance is contained in one BSC
category. In this situation, we predict that the BSC
categorization primes the evaluator to perceive
consistent performance as evidence of correlation
among measures, which may reduce the impact of
the individual good or bad measures. This
perceived correlation will moderate judgments
relative to those made without the BSC organization.
3.2.1. Subjects, design, and procedures
Seventy-eight MBA students served as
experimental participants. The students had, on average,
4 years of work experience and 62% were male.
All participants received a diverse set of
performance measures, a description of how the
measures were calculated, and the comparison of
each measure to its expectation or target for each
of the two divisions (see Table 2 for the BSC
version of the task).¹ Further, all participants were
told that the performance measures were
“carefully chosen to represent important aspects of a
business unit[’s performance]” and were “drivers
of the unit’s success and linked to its strategy and
mission.”

Table 1
Sample evaluation form employed in both experiments
WCS Inc.
Initial Evaluation Form
Year: 1996
Manager: Chris Peters
Division: RadWear
Evaluator:
1. Indicate your initial performance evaluation for this manager by placing an ‘X’ somewhere on the scale below. Note that some label
interpretations are provided below.
Excellent: far beyond expectations, manager excels
Very good: considerably above expectations
Good: somewhat above expectations
Average: meets expectations
Poor: somewhat below expectations, needs some improvement
Very Poor: considerably below expectations, needs considerable improvement
Reassign: sufficient improvement unlikely

The between-subjects (Ss) manipulation was the
organization of the performance measures. The
BSC group received the 20 measures divided into
the four BSC categories (financial measures,
customer satisfaction measures, operational
measures, and learning measures) while other
participants received the same set of 20 measures
without the BSC format (NOFORM group). For
the NOFORM group the measures were presented
in one of two orders, alphabetical or random.² In
addition to the format manipulation across
subject groups, the order of presentation of the two
divisions (i.e. RadWear and PlusWear) was
counterbalanced across subjects within each format
group.
¹ Participants received separate exhibits for RadWear and
PlusWear (and none of their measures were shown in bold).
Measures for both divisions are included in Table 2 for
efficiency of exposition.
² The order of measures for the latter was chosen by
random draw with the only proviso that adjacent measures should
not come from the same BSC category. Two orders were used
for the NOFORM group to increase the generalizability of
results.

For all participants, the financial measures
indicated that performance was somewhat above
expectations for both divisions (note in Table 2
that two financial measures were above-target,
one below-target, and two on-target). Further, for
all participants, one division was above
expectations in its customer-related measures and the
second division was below expectations in the
customer-related measures (note that Table 2
shows four RadWear customer measures better
than target and four PlusWear worse than target;
these items are shown in bold in the table). The
two remaining groups of measures (internal
business processes and learning and growth) were
approximately at expectations for all participants
(note that Table 2 shows one measure
above-target, one below-target, and three on-target).
Therefore, there was one within-subjects
manipulation: the division’s being above (positive
performance) or below (negative performance) the
customer-related performance measure targets.³

Table 2
RadWear balanced scorecard ᵃ (PlusWear items in parentheses)

Measure                                                            Target      Actual
Financial
  1. Return on sales                                               24% (22)    25% (23)
  2. Sales growth                                                  35% (30)    38% (33)
  3. New store sales (new lines sales)                             30% (25)    26% (22)
  4. Market share relative to retail space                         $80 (70)    $80 (70)
  5. Return on expenses                                            42% (36)    42% (36)
Customer-related
  1. Repeat sales *                                                30% (40)    33% (36)
  2. Customer satisfaction rating *                                95 (97)     96 (96)
  3. Mystery shopper program rating *                              96 (96)     98 (94)
  4. Returns by customers as % of sales *                          10% (7)     9% (8)
  5. Out of stock items                                            10% (14)    10% (14)
Internal business processes
  1. Average major brand names/store (average % of product range)  32 (88%)    34 (90%)
  2. Sales from new market leaders (sales from top brand names)    25% (28)    22% (25)
  3. Returns to suppliers                                          5% (3)      5% (3)
  4. Average markdowns                                             15% (12)    15% (12)
  5. Voided sales transactions                                     3 (2)       3 (2)
Learning and growth
  1. Hours of employee training/employee                           10 (8)      11 (9)
  2. Average tenure of sales personnel                             1.4 (2.1)   1.2 (1.9)
  3. Employee suggestions/employee                                 2 (2)       2 (2)
  4. Sales personnel taking manager test                           30% (36)    30% (36)
  5. Stores computerizing                                          85% (85)    85% (85)

ᵃ DIFFerent measures are indicated in bold in the original table; they are marked here with an asterisk (*).

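The performance pattern described in the text can be checked directly against the Table 2 figures. The sketch below classifies each measure as above, on, or below target for either division and counts the results per category. Two things are assumptions of this sketch rather than statements from the paper: the tuple layout and function names are hypothetical, and "Returns by customers as % of sales" is treated as a lower-is-better measure, which is inferred from the text's statement that four RadWear customer measures are better than target. Here "above" means better than target.

```python
from collections import Counter

# (category, measure, RadWear target, RadWear actual, PlusWear target, PlusWear actual)
TABLE_2 = [
    ("Financial", "Return on sales", 24, 25, 22, 23),
    ("Financial", "Sales growth", 35, 38, 30, 33),
    ("Financial", "New store sales", 30, 26, 25, 22),
    ("Financial", "Market share relative to retail space", 80, 80, 70, 70),
    ("Financial", "Return on expenses", 42, 42, 36, 36),
    ("Customer", "Repeat sales", 30, 33, 40, 36),
    ("Customer", "Customer satisfaction rating", 95, 96, 97, 96),
    ("Customer", "Mystery shopper program rating", 96, 98, 96, 94),
    ("Customer", "Returns by customers as % of sales", 10, 9, 7, 8),
    ("Customer", "Out of stock items", 10, 10, 14, 14),
    ("Internal", "Average major brand names/store", 32, 34, 88, 90),
    ("Internal", "Sales from new market leaders", 25, 22, 28, 25),
    ("Internal", "Returns to suppliers", 5, 5, 3, 3),
    ("Internal", "Average markdowns", 15, 15, 12, 12),
    ("Internal", "Voided sales transactions", 3, 3, 2, 2),
    ("Learning", "Hours of employee training/employee", 10, 11, 8, 9),
    ("Learning", "Average tenure of sales personnel", 1.4, 1.2, 2.1, 1.9),
    ("Learning", "Employee suggestions/employee", 2, 2, 2, 2),
    ("Learning", "Sales personnel taking manager test", 30, 30, 36, 36),
    ("Learning", "Stores computerizing", 85, 85, 85, 85),
]

# Assumption: a lower value is better for this measure (inferred from the text).
LOWER_IS_BETTER = {"Returns by customers as % of sales"}


def status(measure, target, actual):
    """Return 'above', 'on', or 'below', where 'above' means better than target."""
    if actual == target:
        return "on"
    better = actual < target if measure in LOWER_IS_BETTER else actual > target
    return "above" if better else "below"


def summarize(division):
    """Count above/on/below-target measures per category for one division."""
    offset = 2 if division == "RadWear" else 4
    counts = {}
    for row in TABLE_2:
        category, measure = row[0], row[1]
        target, actual = row[offset], row[offset + 1]
        counts.setdefault(category, Counter())[status(measure, target, actual)] += 1
    return counts


radwear = summarize("RadWear")
pluswear = summarize("PlusWear")
```

With these inputs the two divisions differ only in the customer-related category, which is the within-subjects manipulation the text describes.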
As noted above, performance relative to target
was similar across the two divisions for all
performance measures except for four customer-related
measures (shown in bold in Table 2). We will refer
to these as the DIFFerent measures. In the BSC
format, these four measures were grouped
together in the second category. Thus, in the BSC
format the two divisions were performing equally on
three of the four dimensions, with RadWear
superior on the other. In contrast, for the
NOFORM group, the 20 measures were not
grouped into categories. Instead the measures
were listed in an alphabetical or random order,⁴
neither of which suggests that particular measures
are correlated. These two NOFORM orders
resulted in the DIFFerent measures being in
positions 4, 8, 11, and 12 (out of 20) for the
alphabetical listing and in positions 2, 4, 12, and 19 for the
random listing.
3.2.2. Dependent measure
All subjects evaluated each manager using the
evaluation form and scale shown in Table 1. We
expect that there will be a main effect for division,
showing that differential divisional performance
on the customer-related measures affects their
managers’ evaluations. Additionally, we expect an
interaction of organization and division, showing
that the BSC organization moderates the evaluations
of the two divisional managers relative to
evaluations without the BSC organization, given that
multiple below-target (for PlusWear) or
above-target (for RadWear) measures are contained in one
BSC category (i.e. customer-related measures).⁵
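The predicted interaction can be expressed as a difference of differences in mean evaluations. The helper below is a hypothetical sketch of that contrast, not the authors' analysis (which is an ANOVA); the function name, the argument names, and the sign convention are assumptions made for illustration.

```python
def moderation_contrast(bsc_radwear, bsc_pluswear, noform_radwear, noform_pluswear):
    """Difference of differences in mean manager evaluations (0 to 100 scale).

    Each argument is a cell mean. The result is the RadWear-minus-PlusWear gap
    without the BSC format minus the same gap with the BSC format, so a positive
    value means the gap is smaller (moderated) under the BSC organization.
    """
    gap_noform = noform_radwear - noform_pluswear
    gap_bsc = bsc_radwear - bsc_pluswear
    return gap_noform - gap_bsc
```

A main effect for division corresponds to both gaps being positive; the interaction corresponds to this contrast being reliably different from zero.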
³ While academic research has produced mixed results
regarding the impact of customer satisfaction on profitability
(e.g. Foster & Gupta, 2000; Ittner & Larcker, 1998), managers
generally believe that customer satisfaction is a key
performance driver, especially in the retail sector (Rucci, Kirn, &
Quinn, 1998). In our experiment, participants were told that all
measures chosen for the BSC were drivers of the unit’s success.
⁴ It should be noted, however, that in either case, after each
five measures, a blank line was inserted in the list so that
readability and eye fatigue would not differ for the NOFORM and
BSC formats.
⁵ Since judgments are strongly affected by comparison cases
(Hsee, 1996, 1998), we expect that information organization
will most likely affect the comparative or relative judgments
regarding the two managers.
3.2.3. Results
Checks on the effectiveness of the manipulations
revealed that participants receiving the BSC
format felt that the performance measures were more
logically organized and usefully categorized than
those receiving the NOFORM performance
measures (both P-values0.10). Within the NOFORM group,
no differences were found for subjects with the
alphabetic versus the random order for any of these
questions (all P-values>0.10) or for the
managerial evaluations. Also, the order of the presentation
of divisions had no effects on responses to the
manipulation check questions (all P-values>0.10).
Although division order was not related to the
hypotheses, it did interact with division in
affecting performance evaluations (F=10.57, P

Table 3
ANOVA results for experiment one manager evaluations

Variable                   df          SS          MS       F      P
Between Ss
  Organization              1       41.25       41.25    0.14   0.71
  Order                     1        4.10        4.10    0.01   0.91
  Organ. × Order            1        0.52        0.52    0.00   0.97
  Error                    74   22,567.86      304.97
Within Ss
  Division                  1   13,917.31   13,917.31   97.14   0.00
  Div. × Organization       1      817.27      817.27    5.70   0.02
  Div. × Order              1     1513.64     1513.64   10.57   0.00
  Div. × Organ. × Order     1      344.02      344.02    2.40   0.13
  Error                    74   10,601.94      143.27
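The F statistics in Table 3 follow the usual ANOVA construction: each effect's mean square (SS divided by df) is divided by the mean square of the error term in its own stratum (between Ss or within Ss). The sketch below recomputes them from the reported sums of squares as a consistency check. The dictionary layout and function names are hypothetical; the numbers are copied from Table 3. The recomputed ratios agree with the reported values to within rounding of the published figures.

```python
# Error sums of squares and degrees of freedom, by stratum (from Table 3).
ERROR = {"between": (22567.86, 74), "within": (10601.94, 74)}

# effect name -> (stratum, SS, df, F as reported in Table 3)
EFFECTS = {
    "Organization": ("between", 41.25, 1, 0.14),
    "Order": ("between", 4.10, 1, 0.01),
    "Organ. x Order": ("between", 0.52, 1, 0.00),
    "Division": ("within", 13917.31, 1, 97.14),
    "Div. x Organization": ("within", 817.27, 1, 5.70),
    "Div. x Order": ("within", 1513.64, 1, 10.57),
    "Div. x Organ. x Order": ("within", 344.02, 1, 2.40),
}


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    return (ss_effect / df_effect) / (ss_error / df_error)


def recompute():
    """Recompute every F ratio against the error term of its own stratum."""
    results = {}
    for name, (stratum, ss, df, _reported) in EFFECTS.items():
        ss_error, df_error = ERROR[stratum]
        results[name] = f_ratio(ss, df, ss_error, df_error)
    return results


recomputed = recompute()
for name, value in recomputed.items():
    print(f"{name:24s} recomputed F = {value:7.3f}   reported F = {EFFECTS[name][3]:.2f}")
```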
We examine judgmental effects of the balanced scorecard’s organization. The balanced scorecard contains a large
number of performance measures divided into four categories. We examine whether the scorecard’s organization
results in managerial performance evaluation judgments consistent with a recognition of the potential relations (i.e.
nonindependence) of measures within a category. Supporting this idea, we find that performance evaluations are
affected by organizing the measures into the balanced scorecard categories when multiple below-target (or above-target)
measures are contained within a category but that evaluations are not affected when the above/below-target measures
are distributed across the scorecard’s four categories.
A note on the judgmental e?ects of the balanced scorecard’s
information organization
Marlys Gascho Lipe
a,
*, Steven Salterio
b
a
University of Oklahoma, Price College of Business, Norman, OK 73019, USA
b
University of Waterloo, School of Accountancy, Waterloo, Ontario, Canada N2L 3G1
Abstract
We examine judgmental e?ects of the balanced scorecard’s organization. The balanced scorecard contains a large
number of performance measures divided into four categories. We examine whether the scorecard’s organization
results in managerial performance evaluation judgments consistent with a recognition of the potential relations (i.e.
nonindependence) of measures within a category. Supporting this idea, we ?nd that performance evaluations are
a?ected by organizing the measures into the balanced scorecard categories when multiple below-target (or above-tar-
get) measures are contained within a category but that evaluations are not a?ected when the above/below-target mea-
sures are distributed across the scorecard’s four categories. # 2002 Elsevier Science Ltd. All rights reserved.
1. Introduction
In the early 1990s, Robert Kaplan and David
Norton (1992) developed a management and
measurement tool called the Balanced Scorecard
(BSC). The BSC lists a diverse set of performance
measures grouped in four categories: ?nancial
performance, customer relations, internal business
processes, and learning and growth activities
(Kaplan & Norton, 1992). Kaplan and Norton
(1996a) encourage the inclusion of 4–7 measures in
each category. Thus, ?rms adopting the BSCusually
increase the number of performance measures they
use and identify a much broader group of mea-
sures than those they have traditionally used.
The stated purpose in developing a managerial
tool that includes a large number and broad group
of performance measures is to improve managerial
decision making. While determining whether the
BSC improves managers’ judgments and decisions
can be di?cult, a reasonable starting point is to
determine whether and how the BSC a?ects these
judgments. Prior judgment and decision making
research provides evidence of human information
processing limitations and decision strategies. We
describe and test how these will a?ect use of the
BSC and resulting judgments.
Research in cognitive psychology shows that
people are generally unable to process more than
7–9 items of information simultaneously (Badde-
ley, 1994; Miller, 1956). The BSC contains many
more measures than this limit, suggesting that man-
agers will ?nd it di?cult to utilize the information in
the scorecard. However, the four category organi-
zation of the BSC may assist managers’ use of this
large volume of measures by suggesting a way to
combine and use the data. Speci?cally, decision
makers may use a ‘divide and conquer’ strategy
(Shanteau, 1988) where measures within each
0361-3682/01/$ - see front matter # 2002 Elsevier Science Ltd. All rights reserved.
PI I : S0361- 3682( 01) 00059- 9
Accounting, Organizations and Society 27 (2002) 531–540
www.elsevier.com/locate/aos
* Corresponding author. Tel.: +1-405-325-2293; fax: +1-
405-325-7348.
E-mail addresses: [email protected] (M.G. Lipe), sesalterio@
uwaterloo.ca (S. Salterio).
category are used to make an assessment of the
category and these four assessments are then com-
bined. In assessing each category, decision makers
are primed to see relations among the measures
within each group (Hopkins, 1996). When perfor-
mance on measures within a group is consistent
(e.g. consistently above-target), the decision maker
may perceive that the measures are related (i.e. not
independent) and consequently, reduce the impact
of the individual measures on his or her judgment.
In contrast, when the same measures are presented
without the organizing BSC categories (or are
scattered across BSC categories), the perception of
relations among these measures and the resulting
reduction in decision weights are less likely.
Our results show that when multiple measures
within a BSC category show consistent perfor-
mance (e.g. above-target), managers’ evaluation
judgments are reliably di?erent from evaluations
made using these same measures without the BSC
format. These judgment di?erences disappear
when the measures indicating strong performance
are distributed throughout the four BSC cate-
gories instead of being found in a single BSC
category. Although it is di?cult to state with cer-
tainty that the BSC results in judgment improve-
ments, this study provides evidence that the BSC
has predictable and understandable e?ects on
judgment. While these grouping e?ects may occur
with other types of categorizations, other group-
ings have not received the same kind of attention
as those in the BSC.
The remainder of the paper is organized as fol-
lows. In the next section we will brie?y describe
the BSC, review applicable judgment and decision
making research, and present a two-part research
prediction. Section three describes the experi-
mental work used to test our research predictions
and the ?nal section summarizes the conclusions
that can be drawn from the study.
2. Background
2.1. The balanced scorecard
In a best-selling book Kaplan and Norton
(1996a) describe the methods and procedures
necessary for implementing a BSC. The BSC,
according to Kaplan and Norton, should contain
measures related to ?nancial performance (e.g.
return on assets), customer relations (e.g. custo-
mer satisfaction surveys), internal business pro-
cesses (e.g. process e?ciency measures), and
learning and growth in the organization (e.g.
employee capability measures). Kaplan and Nor-
ton (1993, 1996b) view the scorecard as a strategic
management tool that should explicate the drivers
of performance, as well as provide measures of
performance. This study focuses on the score-
card’s use in evaluation and decision making.
2.2. Cognitive limitations and the divide and
conquer decision strategy
The balanced scorecard with its large number of
performance measures presents a complex task to
a manager asked to use the scorecard to evaluate a
division’s performance. The manager could, theo-
retically, weight and combine the many measures
into an overall evaluation of the business unit but
this is, cognitively, a very di?cult thing to do.
Research in cognitive psychology has repeatedly
shown that humans are able to retain and use only
a small number of items in working memory
(Baddeley, 1994; Miller, 1956). With this limit on
working memory, holding 20 or more individual
measures in one’s head and mentally manipulating
them simultaneously is extremely di?cult, if not
impossible. Thus, the volume of data in a balanced
scorecard suggests that it may overload human
decision makers with information.
The balanced scorecard’s four categories suggest
a way for managers to mentally organize the large
number of performance measures that may miti-
gate this cognitive di?culty. Prior studies show
that information processing and judgments are
a?ected by information organization (Bettman &
Kakkar, 1977; Payne, Bettman, & Johnson, 1993)
and by the hierarchies or relations among infor-
mation items contained in a decision task (Klein-
muntz & Schkade, 1993). For example, Hopkins
(1996) showed that placing an item (e.g. preferred
stock) in a particular category (e.g. liabilities)
caused experienced professionals to perceive that
the item was related to others in the category.
532 M.G. Lipe, S. Salterio / Accounting, Organizations and Society 27 (2002) 531–540
These studies suggest that when data items are
grouped in ways meaningful to the decision
maker, they may be combined prior to further use
(Chase & Simon, 1973). Shanteau (1988) describes
this method of using information as ‘divide and
conquer.’ The information is divided into groups,
an assessment can be made of each group, and
these assessments can then be combined. The
organization of the BSC lends itself quite naturally
to this kind of mental approach.
2.3. Perceived relations among measures
When using the BSC, the initial stage of the
divide and conquer decision strategy is to use
measures within a category to assess performance
in that area (e.g. financial performance). Since the
measures have been grouped together, the decision
maker will be expecting and seeking relations
between them (Hopkins, 1996; Maines & McDaniel,
2000). If performance on these measures
confirms this expectation (e.g. by indicating
consistently good performance), the decision
maker may reasonably reduce the decision weight
placed on each individual measure due to perceived
correlations (nonindependence) of the
measures (Banker & Datar, 1989; Feltham & Xie,
1994). In contrast, if measures indicating good
performance are scattered across BSC categories
(or contained in uncategorized lists of measures),
the decision maker is less likely to expect and perceive
these measures to be correlated and to make
consequent reductions to their decision weights.
This is consistent with findings in psychology that
people find it difficult to recognize that correlations
exist unless they have theories suggesting
such relations (Jennings et al., 1982) and with
Maines’ (1990) findings in accounting that judgmental
discounting for information redundancy
(i.e. correlation) does not occur unless the judge is
alerted to the presence of such relations (see,
especially, her experiment three).
This suggests that judgments made with the
BSC will differ from those made with uncategorized
lists of measures in particular situations:
those cases where performance on measures within
a category is consistent (i.e. consistently above-target
or consistently below-target). Additionally,
the above discussion suggests that judgments made
with the BSC will not differ from those made with
uncategorized lists of measures in situations where
multiple above-target (or below-target) measures
are scattered across the BSC categories.
Although performance results on the twenty or
more performance measures in a BSC may take on
any number of patterns, we will test the impact of
only the two extreme patterns described above.
That is, we will consider a situation where multiple
above-target (or below-target) measures are contained
within one BSC category and then we will
contrast that with the situation where the above-target
(below-target) measures are distributed
across categories. For these two situations, we will
compare the judgments for decision makers with
the BSC to those of decision makers using the
same measures without the BSC categories. Our
research predictions are:
Evaluations using the balanced scorecard will
differ from evaluations based on the same measures
without the scorecard organization,
depending on the pattern of performance across
categories. Specifically:
1. judgments are likely to be moderated when
multiple above-target (or below-target)
measures are contained in a single BSC
category but,
2. judgments are unlikely to be affected when
multiple above-target (or below-target)
measures are distributed throughout the
BSC categories.
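The two predictions can be made concrete with a purely illustrative sketch. It is not the authors' model: the baseline of 50, the weight of 5 points per measure, and the 0.5 discount for consistent within-category performance are arbitrary assumptions chosen only to show the direction of the predicted effects. Measures are coded +1 (above-target), -1 (below-target), or 0 (on-target), and an uncategorized list is represented by treating every measure as its own group.

```python
def evaluate(groups, base=50.0, weight=5.0, discount=0.5):
    """Toy 'divide and conquer' evaluation on a 0-100 scale.

    groups: list of lists of outcomes (+1 above, -1 below, 0 on target).
    All numeric parameters are hypothetical, not estimated from the study.
    """
    score = base
    for group in groups:
        off_target = [d for d in group if d != 0]
        # Several off-target measures pointing the same way are treated as
        # correlated, so each one receives a reduced decision weight.
        consistent = len(off_target) > 1 and len(set(off_target)) == 1
        w = weight * discount if consistent else weight
        score += w * sum(group)
    return score


def uncategorized(groups):
    """Flatten categories so each measure stands alone (no grouping cue)."""
    return [[d] for group in groups for d in group]


# Four above-target measures concentrated in one category (prediction 1).
concentrated = [[1, 1, 1, 1, 0], [0] * 5, [0] * 5, [0] * 5]
# The same four above-target measures spread across categories (prediction 2).
distributed = [[1, 0, 0, 0, 0] for _ in range(4)]

bsc_concentrated = evaluate(concentrated)
noform_concentrated = evaluate(uncategorized(concentrated))
bsc_distributed = evaluate(distributed)
noform_distributed = evaluate(uncategorized(distributed))
```

Under these assumed parameters, the categorized evaluation lies closer to the scale midpoint than the uncategorized one only when the above-target measures share a category. When they are spread across categories, the two formats give the same score.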
The next section describes the experiments and the
test results.
3. Method and results
3.1. Overview of experiments
Participants are presented with a case where
they are asked to take the role of a senior executive
of WCS Incorporated, a firm specializing in retailing
women’s apparel. WCS has multiple divisions,
the two largest of which are the focus of the case
materials. The case introduces the managers of the
two business units and describes the strategies of
the units. Multiple performance measures are
presented in patterns and formats depending on
the experimental treatment as described below.
The participant is then asked to evaluate the performance
of each of the two unit managers on a
scale with seven descriptive labels and numerical
endpoints of “0” and “100” (see Table 1 for a
sample evaluation form).
After providing the manager evaluations, the
participants complete a questionnaire. This
questionnaire asks for demographic information,
provides manipulation checks (discussed further
in Sections 3.2.3 and 3.3.2), and gathers data
regarding task difficulty, realism, and understandability.
In both experiments the two divisions described
are RadWear and PlusWear, retail divisions specializing
in clothing for the urban teenager and in
large-sized clothing, respectively. The participants
are informed that management believes the performance
measures for each division are appropriate
for retailers and capture the two different
strategies.
3.2. Experiment one
In experiment one we focus on whether the BSC
format makes a difference in divisional manager
performance evaluation when particularly good or
bad performance is contained in one BSC category.
In this situation, we predict that the BSC
categorization primes the evaluator to perceive
consistent performance as evidence of correlation
among measures, which may reduce the impact of
the individual good or bad measures. This perceived
correlation will moderate judgments relative
to those made without the BSC organization.
3.2.1. Subjects, design, and procedures
Seventy-eight MBA students served as experimental
participants. The students had, on average,
4 years of work experience and 62% were male.
All participants received a diverse set of performance
measures, a description of how the
measures were calculated, and the comparison of
each measure to its expectation or target for each
of the two divisions (see Table 2 for the BSC version
of the task).¹ Further, all participants were
told that the performance measures were “carefully
chosen to represent important aspects of a
business unit[’s performance]” and were “drivers
of the unit’s success and linked to its strategy and
mission.”

Table 1
Sample evaluation form employed in both experiments
WCS Inc.
Initial Evaluation Form
Year: 1996
Manager: Chris Peters
Division: RadWear
Evaluator:
1. Indicate your initial performance evaluation for this manager by placing an ‘X’ somewhere on the scale below. Note that some label
interpretations are provided below.
Excellent: far beyond expectations, manager excels
Very good: considerably above expectations
Good: somewhat above expectations
Average: meets expectations
Poor: somewhat below expectations, needs some improvement
Very Poor: considerably below expectations, needs considerable improvement
Reassign: sufficient improvement unlikely

The between-subjects (Ss) manipulation was the
organization of the performance measures. The
BSC group received the 20 measures divided into
the four BSC categories (financial measures, customer
satisfaction measures, operational measures,
and learning measures) while other
participants received the same set of 20 measures
without the BSC format (NOFORM group). For
the NOFORM group the measures were presented
in one of two orders, alphabetical or random.² In
addition to the format manipulation across subject
groups, the order of presentation of the two
divisions (i.e. RadWear and PlusWear) was counterbalanced
across subjects within each format
group.
For all participants, the financial measures indicated
that performance was somewhat above
expectations for both divisions (note in Table 2
that two financial measures were above-target,
one below-target, and two on-target). Further, for
all participants, one division was above expectations
in its customer-related measures and the second
division was below expectations in the
customer-related measures (note that Table 2
shows four RadWear customer measures better
than target and four PlusWear worse than target;
these items are shown in bold in the table). The
two remaining groups of measures (internal business
processes and learning and growth) were
approximately at expectations for all participants
(note that Table 2 shows one measure above-target,
one below-target, and three on-target).
Therefore, there was one within-subjects manipulation:
the division’s being above (positive performance)
or below (negative performance) the
customer-related performance measure targets.³

Table 2
RadWear balanced scorecardᵃ (PlusWear items in parentheses)

Measure                                                               Target      Actual
Financial
  1. Return on sales                                                  24% (22)    25% (23)
  2. Sales growth                                                     35% (30)    38% (33)
  3. New store sales (new lines sales)                                30% (25)    26% (22)
  4. Market share relative to retail space                            $80 (70)    $80 (70)
  5. Return on expenses                                               42% (36)    42% (36)
Customer-related
  1. Repeat sales                                                     30% (40)    33% (36)
  2. Customer satisfaction rating                                     95 (97)     96 (96)
  3. Mystery shopper program rating                                   96 (96)     98 (94)
  4. Returns by customers as % of sales                               10% (7)     9% (8)
  5. Out of stock items                                               10% (14)    10% (14)
Internal business processes
  1. Average major brand names/store (average % of product range)     32 (88%)    34 (90%)
  2. Sales from new market leaders (sales from top brand names)       25% (28)    22% (25)
  3. Returns to suppliers                                             5% (3)      5% (3)
  4. Average markdowns                                                15% (12)    15% (12)
  5. Voided sales transactions                                        3 (2)       3 (2)
Learning and growth
  1. Hours of employee training/employee                              10 (8)      11 (9)
  2. Average tenure of sales personnel                                1.4 (2.1)   1.2 (1.9)
  3. Employee suggestions/employee                                    2 (2)       2 (2)
  4. Sales personnel taking manager test                              30% (36)    30% (36)
  5. Stores computerizing                                             85% (85)    85% (85)

ᵃ DIFFerent measures are indicated here in bold.

¹ Participants received separate exhibits for RadWear and
PlusWear (and none of their measures were shown in bold).
Measures for both divisions are included in Table 2 for efficiency
of exposition.
² The order of measures for the latter was chosen by random
draw with the only proviso that adjacent measures should
not come from the same BSC category. Two orders were used
for the NOFORM group to increase the generalizability of
results.

As noted above, performance relative to target
was similar across the two divisions for all performance
measures except for four customer-related
measures (shown in bold in Table 2). We will refer
to these as the DIFFerent measures. In the BSC
format, these four measures were grouped together
in the second category. Thus, in the BSC format
the two divisions were performing equally on
three of the four dimensions, with RadWear
superior on the other. In contrast, for the
NOFORM group, the 20 measures were not
grouped into categories. Instead the measures
were listed in an alphabetical or random order,⁴
neither of which suggests that particular measures
are correlated. These two NOFORM orders
resulted in the DIFFerent measures being in positions
4, 8, 11, and 12 (out of 20) for the alphabetical
listing and in positions 2, 4, 12, and 19 for the
random listing.
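The pattern of results described above can be checked directly against the Table 2 figures. The following is a minimal sketch, not part of the original study materials: the function and variable names are ours, and the `LOWER_IS_BETTER` set encodes our assumption about which measures improve as they fall (only "Returns by customers as % of sales" is actually off-target among these, so the other entries do not affect the counts).

```python
from collections import Counter

# Rows: (category, measure, RadWear target, RadWear actual,
#        PlusWear target, PlusWear actual), transcribed from Table 2.
TABLE_2 = [
    ("Financial", "Return on sales", 24, 25, 22, 23),
    ("Financial", "Sales growth", 35, 38, 30, 33),
    ("Financial", "New store sales", 30, 26, 25, 22),
    ("Financial", "Market share relative to retail space", 80, 80, 70, 70),
    ("Financial", "Return on expenses", 42, 42, 36, 36),
    ("Customer-related", "Repeat sales", 30, 33, 40, 36),
    ("Customer-related", "Customer satisfaction rating", 95, 96, 97, 96),
    ("Customer-related", "Mystery shopper program rating", 96, 98, 96, 94),
    ("Customer-related", "Returns by customers as % of sales", 10, 9, 7, 8),
    ("Customer-related", "Out of stock items", 10, 10, 14, 14),
    ("Internal business processes", "Average major brand names/store", 32, 34, 88, 90),
    ("Internal business processes", "Sales from new market leaders", 25, 22, 28, 25),
    ("Internal business processes", "Returns to suppliers", 5, 5, 3, 3),
    ("Internal business processes", "Average markdowns", 15, 15, 12, 12),
    ("Internal business processes", "Voided sales transactions", 3, 3, 2, 2),
    ("Learning and growth", "Hours of employee training/employee", 10, 11, 8, 9),
    ("Learning and growth", "Average tenure of sales personnel", 1.4, 1.2, 2.1, 1.9),
    ("Learning and growth", "Employee suggestions/employee", 2, 2, 2, 2),
    ("Learning and growth", "Sales personnel taking manager test", 30, 30, 36, 36),
    ("Learning and growth", "Stores computerizing", 85, 85, 85, 85),
]

# Assumption (ours): for these measures a lower actual value is better.
LOWER_IS_BETTER = {
    "Returns by customers as % of sales", "Out of stock items",
    "Returns to suppliers", "Average markdowns", "Voided sales transactions",
}


def status(measure, target, actual):
    """Classify one measure as 'better', 'worse', or 'on' target."""
    if actual == target:
        return "on"
    better = actual < target if measure in LOWER_IS_BETTER else actual > target
    return "better" if better else "worse"


def summarize(division):
    """Count better/worse/on-target measures per category for a division."""
    offset = 2 if division == "RadWear" else 4
    counts = {}
    for row in TABLE_2:
        result = status(row[1], row[offset], row[offset + 1])
        counts.setdefault(row[0], Counter())[result] += 1
    return {category: dict(c) for category, c in counts.items()}


def different_measures():
    """Measures on which the two divisions differ relative to target."""
    return [r[1] for r in TABLE_2
            if status(r[1], r[2], r[3]) != status(r[1], r[4], r[5])]
```

Calling `summarize("RadWear")` and `summarize("PlusWear")` gives the per-category counts, and `different_measures()` lists the measures on which the divisions diverge. Under the stated assumption, these are the four customer-related DIFFerent measures.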
3.2.2. Dependent measure
All subjects evaluated each manager using the
evaluation form and scale shown in Table 1. We
expect that there will be a main effect for division,
showing that differential divisional performance
on the customer-related measures affects their
managers’ evaluations. Additionally, we expect an
interaction of organization and division, showing
that the BSC organization moderates the evaluations
of the two divisional managers relative to evaluations
without the BSC organization, given that
multiple below-target (for PlusWear) or above-target
(for RadWear) measures are contained in one
BSC category (i.e. customer-related measures).⁵
3.2.3. Results
Checks on the effectiveness of the manipulations
revealed that participants receiving the BSC format
felt that the performance measures were more
logically organized and usefully categorized than
those receiving the NOFORM performance measures
(both P-values0.10). Within the NOFORM group,
no differences were found for subjects with the
Table 3
ANOVA results for experiment one manager evaluations

Variable                     df   SS           MS           F        P
Between Ss
  Organization                1   41.25        41.25        0.14     0.71
  Order                       1   4.10         4.10         0.01     0.91
  Organ. × Order              1   0.52         0.52         0.00     0.97
  Error                      74   22,567.86    304.97
Within Ss
  Division                    1   13,917.31    13,917.31    97.14    0.00
  Div. × Organization         1   817.27       817.27       5.70     0.02
  Div. × Order                1   1513.64      1513.64      10.57    0.00
  Div. × Organ. × Order       1   344.02       344.02       2.40     0.13
  Error                      74   10,601.94    143.27
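
The F ratios in Table 3 follow from the reported sums of squares and degrees of freedom: each mean square is MS = SS/df, and each F is the effect's MS divided by the error MS of its stratum (between or within subjects). The sketch below recomputes them. The dictionary and function names are ours, and small differences in the second decimal are to be expected because the published SS values are themselves rounded.

```python
# (SS, df) pairs as reported in Table 3.
BETWEEN_EFFECTS = {
    "Organization": (41.25, 1),
    "Order": (4.10, 1),
    "Organ. x Order": (0.52, 1),
}
BETWEEN_ERROR = (22567.86, 74)

WITHIN_EFFECTS = {
    "Division": (13917.31, 1),
    "Div. x Organization": (817.27, 1),
    "Div. x Order": (1513.64, 1),
    "Div. x Organ. x Order": (344.02, 1),
}
WITHIN_ERROR = (10601.94, 74)


def mean_square(ss, df):
    """Mean square: sum of squares divided by its degrees of freedom."""
    return ss / df


def f_ratios(effects, error):
    """F for each effect, using the error term of the same stratum."""
    ms_error = mean_square(*error)
    return {name: mean_square(ss, df) / ms_error
            for name, (ss, df) in effects.items()}


between_f = f_ratios(BETWEEN_EFFECTS, BETWEEN_ERROR)
within_f = f_ratios(WITHIN_EFFECTS, WITHIN_ERROR)

for name, f in {**between_f, **within_f}.items():
    print(f"{name:<24} F = {f:.2f}")
```

The two error mean squares recomputed this way agree with the reported 304.97 and 143.27, and every recomputed F lies within about 0.01 of the tabulated value.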
³ While academic research has produced mixed results
regarding the impact of customer satisfaction on profitability
(e.g. Foster & Gupta, 2000; Ittner & Larcker, 1998), managers
generally believe that customer satisfaction is a key performance
driver, especially in the retail sector (Rucci, Kirn, &
Quinn, 1998). In our experiment, participants were told that all
measures chosen for the BSC were drivers of the unit’s success.
⁴ It should be noted, however, that in either case, after each
five measures, a blank line was inserted in the list so that readability
and eye fatigue would not differ for the NOFORM and
BSC formats.
⁵ Since judgments are strongly affected by comparison cases
(Hsee, 1996, 1998), we expect that information organization
will most likely affect the comparative or relative judgments
regarding the two managers.
alphabetic versus the random order for any of these
questions (all P-values>0.10) or for the managerial
evaluations. Also, the order of the presentation
of divisions had no effects on responses to the
manipulation check questions (all P-values>0.10).
Although division order was not related to the
hypotheses, it did interact with division in affecting
performance evaluations (F=10.57, P