Accounting, Organizations and Society, Vol. 21, No. 1, pp. 3-22, 1996
Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved.
0361-3682/96 $15.00+0.00
0361-3682(95)00017-4
A COMPARISON OF COMPENSATORY AND NONCOMPENSATORY MODELS OF
JUDGMENT: EFFECTS OF TASK PREDICTABILITY AND DEGREES OF FREEDOM*
LINDA G. SCHNEIDER
University of Minnesota
and
THOMAS I. SELLING
American Graduate School of International Management
Abstract
We employ a bankruptcy prediction task to compare the validity of linear compensatory models of
judgment to a nonlinear, noncompensatory model of judgment (hierarchical partitioning) given two
levels each of task predictability and degrees of freedom. We compare these modeling methods on three
measures of validity (predictive, diagnostic, and structural). To aid model comparison, information
acquisition protocols are used to develop benchmark measures of cue importance. We find that
hierarchical partitioning methods perform significantly better than the linear method with respect to
predictive validity and structural validity when task predictability is higher, and particularly when
degrees of freedom are low. Measures of diagnostic validity are consistently higher for the hierarchical
model, but no pairwise comparison is found to be statistically significant at acceptable levels.
Decision-making aids can be used to increase
organizational efficiency with the objectives of
helping managers make better decisions more
quickly, and/or providing a vehicle for training
novice managers. In this vein, Dawes & Cor-
rigan (1974) identify two primary uses for
such models: normative and descriptive. The
former aids the decision maker “in reaching a
good decision” (p. 95), and is arguably more important to the management scientist who is interested in improving the practice of management. The latter focuses on representing the decision-maker’s behavior, and may be particularly important for training purposes. More-
over, correct modeling of decision makers is a
precondition for valid normative recommenda-
tions when an optimal decision-making algo-
rithm is not known. The process of
constructing an effective expert-based deci-
sion aid is driven by valid insights into the
judgment process of experts: both model structure and estimates of cue importance.
Several approaches are available for repre-
senting the judgment processes of experts.
For example, standard multivariate statistical
methods will provide a linear compensatory
representation of the judgment process. Man-
agement of a financial institution may apply
this technique to begin to understand how a
particularly expert financial analyst makes pre-
dictions of financial solvency. The resulting
model may be used for training new employ-
* The authors thank Imran S. Currim of the University of California, Irvine, Eric Johnson of the University of Pennsylvania, William Messier of the University of Florida, Scott Neslin of Dartmouth College, and two anonymous reviewers for comments on an earlier draft.
ees, or for developing a decision aid to be used
by all analysts. However, while the model may
result in satisfactory predictive performance over
historical cases, it could fail to capture comple-
tely the lending officer’s judgment process. For
example, the financial analyst may recognize
that beyond a certain level, return on total
assets will no longer compensate for poor
cash flow to total debt and/or cash to total
assets. Research suggests that linear compensa-
tory models tend to mask such configuralities in
the judgment process when judgment cues are
monotonically related to the outcome (Dawes
& Corrigan, 1974); this is the case with the cues
in our example and financial data in general. An
incorrect description of the process by which
an acknowledged expert predicts financial dis-
tress could result in poor training of novice
analysts, as well as inefficient decisions.
Alternatives to standard multivariate statisti-
cal methods, including expert systems, are avail-
able for developing a set of rules leading to a
particular judgment outcome. Expert systems
focus on the codification of the process used
by an expert in arriving at a particular judg-
ment. These models can be highly idiosyncratic
and configural, thereby avoiding the potential
misspecification inherent in statistically-based
compensatory models that can occur in some
situations. Expert systems are not without their
limitations, however, including potential errors
in the communication process between model
builder and expert (Barr & Feigenbaum, 1982;
Biggs et al., 1987; Meservy et al., 1986), and the
time and effort required to acquire knowledge
(Michalski & Chilausky, 1980).
An interesting compromise between stan-
dard multivariate statistical approaches to mod-
eling judgment and expert systems is provided
by modeling methods based on hierarchical
partitioning. The Concept Learning System
(Currim et al., 1988; Hunt et al., 1966) is an
example of a modeling system based on hier-
archical partitioning. In contrast to expert sys-
tems, in which an attempt is made to codify a
human judge’s rules explicitly and through
direct communication with the judge, the Con-
cept Learning System sets out to infer the jud-
ge’s rules from the past behavioral record. This
provides important benefits in terms of model
development, as the models are free from error
and/or bias due to failure in the communica-
tion process.
Which modeling approach an analyst should
recommend (compensatory versus noncom-
pensatory) in a given situation depends on
the underlying structure of the judgment pro-
cess. In practice, it may be difficult, if not
impossible, to determine a priori which form
of the underlying judgment process is the more
valid one. The difficulty arises because of the
lack of an appropriate benchmark of the under-
lying process. Indeed, if a satisfactory bench-
mark were available, then the model structure
choice problem would be solved!
A key goal in this research is to compare the
relative performance of compensatory and non-
compensatory models of judgment. We strive
in this research to add to the knowledge
obtained from simulation studies in several
ways, all of which serve to increase external
validity. While past research (Currim et al .,
1985) has employed a simulation methodology
to compare models, there is little assurance
that manipulations of factors affecting relative
model performance (e.g. judgment consis-
tency) are either (a) simulated realistically, or
(b) occur within realistic ranges.
In this research we address these two limita-
tions in the following ways. First, we develop
stimuli from real data (actual financial ratios of
existing or bankrupt firms). Second, past
research has suggested that judgment model-
ing may be affected by variations in task pre-
dictability when estimating model parameters.
Task predictability is one means of introducing
error into the relationship between the criter-
ion (or dependent variable) and the judgment
cues. Past research based on simulated data
introduced error into the stimuli in a some-
what restricted and artificial manner (often by
changing the value of the dependent variable).
We introduce error in a much more realistic
way. Specifically, in the low task predictability
scenario, the financial ratios used to predict
firm status are one year removed from the cur-
rent status. Third, we model the judgments
made by actual people. Fourth, past research
has also pointed to the importance of degrees
of freedom in obtaining valid models of judg-
ment (Currim et al., 1988). Currim et al. (1988), using a simulation methodology, found
that hierarchical partitioning models can, to
some extent, compensate for low degrees of
freedom. We replicate and extend previous
research by varying the number of observa-
tions used in developing judgment models.
The background section to follow provides
our motivation for choosing these two factors
(task predictability and degrees of freedom) for
investigation, and describes other design
choices: use of information acquisition proto-
cols as a benchmark for model comparison, the
linear and hierarchical modeling methods used
in the study, and our bases for comparing mod-
els. The remaining sections describe hypoth-
eses, method, results of hypothesis tests, and
conclusions.
BACKGROUND
Models of human judgment processes
The judgment model that has received the
most attention to date is the linear compensa-
tory model. Its popularity may be attributed to
the model’s intuitive appeal and the ease with
which it can be estimated. Standard regression
methods can be used for an intervally-scaled
dependent variable, and discriminant or logis-
tic regression modeling methods can be used
for nominally-scaled dependent variables.
Evidence for the configural, or interdepen-
dent, use of cues has been sought by adding
interaction terms to various regression-based
methods. However, if the independent vari-
ables are monotonically related to the depen-
dent variable, then nonlinear use of cues will
be difficult to detect (Dawes & Corrigan, 1974). Hoffman et al. (1968) find the contribution to total variance explained by interactions to be disappointingly low and that certain types of configuralities can be well approximated by main effects. They are nonetheless reluctant to reject the hypothesis that humans use cues configurally.
Compensatory models have come increas-
ingly into question as a model of choice or
judgment (Bettman, 1979; Newell & Simon,
1972). Use of nonlinear heuristics has been
observed by Biggs & Mock (1983), Brown & Solomon (1990), Larker & Lessig (1983), and Selling & Shank (1989), among others.
Further, the use of nonlinear heuristics may
increase with the complexity of a task (Billings & Marcus, 1983; Payne, 1976; Shields, 1980), time or other external pressures imposed on the decision maker (Biggs, 1978), and cue consistency (Brehmer, 1972; Slovic, 1966). Payne (1982) reports other conditions
which may lead to contingent decision beha-
vior such as the number of alternatives to eval-
uate, information display, format, and framing,
to name a few. Brown & Solomon (1990) sug-
gest one reason previous studies have been
unsuccessful in uncovering configural cue
usage is that the stimuli developed for the stu-
dies did not include examples in which cues
would be used configurally. These authors
included such examples in their stimulus set
and found strong evidence for the configural
use of cues.
Task predictability
Another factor that determines whether cues
will be used configurally and that affects a
researcher’s ability to detect the configural
use of cues is task predictability. Task predict-
ability refers to the degree to which judgment
cues are related to an outcome. In an analysis of
clinical judgment studies that manipulated task
predictability, Brehmer (1976) concluded that
an understanding of inference processes can-
not be attained without consideration of the
task system. Lewis et al. (1988) identified task
predictability as an important factor that affects
the usefulness of judgment models; however, its
effect on hierarchical judgment models per se
has received little attention to date.
Not surprisingly, response reliability (judg-
ment accuracy) is consistently found to be posi-
tively related to task predictability in
bankruptcy prediction tasks (e.g. Casey & Selling, 1986), the context of the current study, as
well as in other task domains (Brehmer, 1978;
Simnett & Trotman, 1989). However, the rela-
tionship between task predictability and other
dimensions of model validity (predictive, diag-
nostic, and structural) may be more subtle.
For example, as the relationship between
cues and outcomes weakens, judgment func-
tions become more linear (Dawes & Corrigan,
1974), but task predictability affects judgment
processes in other ways as well that may differ-
entially affect the process of deriving compen-
satory and/or noncompensatory models. Task
predictability has been shown to affect the
way in which judgment rules are developed
over the course of a task requiring multiple
judgments. That is, as task predictability
declines, judgment rules become inconsistent
and predictable only to a limited degree (Breh-
mer, 1972, 1973a, b; Naylor & Clark, 1968; Uhl,
1966).
Uncovering the nonlinear use of
judgment cues
Many of the previously cited works have
uncovered the configural use of cues via proto-
col analysis. There are three popular forms of
protocol data: concurrent verbal, retrospective
verbal, and process tracing. The collection of
concurrent verbal protocols requires the subject to “think aloud” as a judgment is being
made. Later, a judgment model can be derived
from these protocols.
Although concurrent verbal protocols can
provide a rich source of data and valuable
insights into how a judgment was formed,
this approach is not without limitations. Verbal
protocols have been criticized on the bases of
(1) the effect of verbalization, i.e. the act of
verbalization changes cognitive processes, (2)
incompleteness, i.e. not all cognitive processes
are verbalized, and (3) epiphenomenality, i.e.
the reported process is independent of the cog-
nitive process (Klersey & Mock, 1989).
Retrospective verbal (or written) protocols
have also been used in understanding judg-
ment. Retrospective protocols are generally
considered less reliable than concurrent verbal
protocols with subjects potentially construct-
ing answers to probes about previous cogni-
tive processes (Klersey & Mock, 1989).
A common characteristic of process tracing
protocols is the tracking of information that is
used in arriving at a judgment. This can be
accomplished by a researcher’s observing
which pieces of information are obtained,
how long certain information is considered,
or by tracking eye movements. Often, such
process tracing procedures are referred to as
information acquisition protocols. Information
acquisition protocols have been found to pro-
vide a good measure of the human judgment
process (Russo, 1978). The order in which
information is acquired provides information
on cue importance, and the consistency in
the acquisition order provides insight into the
structure (linear versus configural) of a subject’s model.
The three protocol methodologies described
above provide valuable insight into a subject’s
judgment process. One limitation of these
methods is the difficulty in summarizing the
insights gained from their use and ultimately
developing a more general model of a particu-
lar judgment problem that could then be used
as a training tool. This limitation arises, in part,
because of the way in which data are collected.
Concurrent verbal protocols must be collected
one subject at a time. The actual judgment
model used must then be inferred from the
protocol record. Process tracing methods
require somewhat specialized computer pro-
grams designed to trace the type of informa-
tion sought by a subject for somewhat
artificially presented and described stimuli.
For example, in the present study, subjects
select financial ratios of interest from a menu
of available ratios, similar to the information
display board format used by Payne (1976).
Once the information acquisition trace has
been collected, assumptions must be made
regarding how the information trace should
be translated into an actual judgment model.
Statistically-based approaches to modeling
nonlinear judgment processes
Statistically-based models, at first blush, offer
benefits over the protocol methods described
above because they may be applied to histori-
cal data, and can handle a large number of
examples. In contrast, the use of concurrent
verbal protocols often limits a study to very few
subjects whose judgments may be limited to a
single example. The information acquisition
protocol studies on which the present study
is based require judgments on thirty firms.
Thus, nonlinear, statistically-based models
would seem to hold great promise for the
development of training tools or expert sys-
tems. Several noncompensatory modeling alter-
natives have been proposed in the business and
social science literatures. Two important exam-
ples are McFadden’s nested logit model (1978)
and the Elimination-by-Aspects model (Tversky
& Sattath, 1979). Widespread adoption has
eluded these models, however, because of soft-
ware limitations, restrictive assumptions, or
complexities in the modeling process. Use of
the nested logit model, for example, requires
an a priori specification of the judgment model.
The concept learning system
In contrast to the models mentioned above,
the Concept Learning System (CLS) (Currim et al., 1988; Hunt et al., 1966) is an example of an
easily implemented, exploratory modeling
methodology based on an induction or hier-
archical partitioning algorithm. The methodol-
ogy has been applied in an accounting context
(Messier & Hansen, 1988) as well as in other
areas (Currim & Schneider, 1991; Schneider &
Currim, 1991). In the next few paragraphs we
will provide an overview of CLS, including a
discussion of data requirements and diagnos-
tics available from the system.
CLS starts with a binary dependent variable
representing group membership, choice, or
judgment. For example, a lending institution might be interested in modeling the likelihood that a firm will remain solvent. One approach
might be to obtain financial and other data on
various firms in existence on a particular date,
and then determine the status of the firm. The
dependent variable might be coded as ” 1” if
the firm is solvent, and coded as “0” if the
firm has been dissolved or is in bankruptcy.
The goal of CLS is to uncover a set of condi-
tions, or rules, that would correctly classify the
firm as solvent versus bankrupt.
The conditions uncovered by CLS are based on binary independent variables, x_ij, where the index i points to the specific variable, and the index j indicates the J levels of the variable (J = 2 for binary variables). In the case of solvency/bankruptcy prediction, such variables might include financial, as well as environmental and industry, information. Continuous independent variables such as the current ratio of a firm do not pose a problem for CLS in that such variables can be divided into a series of binary variable “cut-offs” in which the new variable is coded as “1” if the original independent variable is less than or equal to a certain cut-off, and “0” otherwise. For example, the researcher might specify a current ratio of “2” as one cut-off. Specifically, when j = 1, the current ratio is less than or equal to two, whereas when j = 2 the current ratio is greater than two.¹ Cut-offs can be determined by theory, if available, or by a frequency analysis (Currim & Schneider, 1991). In panels 1 and 2 of Appendix A we present a partial record of data for subject #6, and its preparation for input into CLS.

¹ In the remainder of this work we will refer to a variable, binary variable and binary cut-off of a continuous variable interchangeably.
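To make the cut-off coding concrete, the following is a minimal Python sketch of the expansion of one continuous cue into binary cut-off variables. The cue value and cut-offs are illustrative, not those used in the study.

```python
def binarize(value, cutoffs):
    """Expand one continuous cue into binary cut-off variables: each new
    variable is coded 1 if the cue is less than or equal to the cut-off,
    and 0 otherwise, as described above."""
    return {f"<= {c}": int(value <= c) for c in cutoffs}

# Example: a firm with a current ratio of 1.6, using illustrative
# cut-offs of 1.0 and 2.0.
print(binarize(1.6, cutoffs=[1.0, 2.0]))   # {'<= 1.0': 0, '<= 2.0': 1}
```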
Using these input data, CLS uncovers condi-
tions useful in predicting whether a firm will
survive or not via the four-step algorithm
described below and illustrated in panel 3 of
Appendix A.
Step 1: Given N independent variables, x_ij, where i (i = 1, 2, . . . , N) indexes the independent variables and j (j = 1, 2, . . . , J) indexes the levels of variable x_ij, calculate the criterion value, C(x_ij), for each level of each independent variable, as follows:

C(x_ij) = -n_ij [p(solvent | j, x_ij) log2 p(solvent | j, x_ij) + p(bankrupt | j, x_ij) log2 p(bankrupt | j, x_ij)]

where p(solvent | j, x_ij) is the probability (estimated from the sample proportion) that a given firm will be classified as solvent, given level j of variable x_ij. Similarly, p(bankrupt | j, x_ij) is the probability (estimated from the sample proportion) that a given firm will be classified as bankrupt, given level j of variable x_ij; n_ij is the number of observations that exhibit level j of variable x_ij. See panel 3 of Appendix A for sample calculations of C(x_ij).
Step 2: Identify the independent variable, x_ij, that has the smallest criterion value, and select this variable to partition the data set into two groups, according to the level, j, of x_ij. If two or more independent variables have the same (smallest) criterion value, then select one at random, and use it for partitioning the data. For example, if the independent variable with the smallest criterion value is “current ratio ≤ 2,” then the original sample would be divided into two groups: group 1 would contain firms with a current ratio less than or equal to two; and group 2 would contain firms with a current ratio greater than two.
Step 3: Within each subsample, recalculate the criterion value for all remaining cut-offs of the independent variables, and repeat step two.
Step 4: Continue this process until all obser-
vations in the data set are accurately classi-
fied, or until an appropriate stopping
criterion is reached (e.g. there is no longer
a significant increase in the proportion of
explained judgments).
The outcome of the above process is a model
which appears as a branching tree. The
“branches” of the tree identify conditions
that lead to a positive outcome (a prediction
of solvency in this example). The actual CLS
model for subject 6 is presented in panel 4
of Appendix A.
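The following Python sketch illustrates the partitioning logic. It assumes the information-theoretic form of the criterion given in step 1; the stopping rule, tie handling, and cue names are illustrative, and panel 3 of Appendix A documents the exact calculations used by CLS.

```python
import math

def criterion(n_solvent, n_bankrupt):
    """Criterion value for one level of one binary variable, assuming the
    entropy form sketched in step 1; lower values indicate a level that
    separates solvent from bankrupt firms more cleanly."""
    n = n_solvent + n_bankrupt
    if n == 0:
        return 0.0
    c = 0.0
    for count in (n_solvent, n_bankrupt):
        p = count / n
        if p > 0:
            c -= n * p * math.log2(p)
    return c

def cls_partition(cases, variables, depth=0, max_depth=5):
    """Recursively partition cases (lists of (cue_dict, outcome) pairs,
    outcome 1 = solvent, 0 = bankrupt) on the variable whose levels have
    the smallest summed criterion value, printing the resulting tree."""
    outcomes = [y for _, y in cases]
    if len(set(outcomes)) <= 1 or depth >= max_depth or not variables:
        print("  " * depth + f"-> predict {round(sum(outcomes) / len(outcomes))}")
        return
    def score(v):  # criterion summed over both levels of variable v
        return sum(criterion(sum(y for x, y in cases if x[v] == lvl),
                             sum(1 - y for x, y in cases if x[v] == lvl))
                   for lvl in (0, 1))
    best = min(variables, key=score)   # step 2: smallest criterion value
    for lvl in (0, 1):                 # steps 3-4: recurse on each subsample
        subset = [(x, y) for x, y in cases if x[best] == lvl]
        print("  " * depth + f"{best} = {lvl}:")
        if subset:
            cls_partition(subset, [v for v in variables if v != best],
                          depth + 1, max_depth)

# Illustrative data: two binary cut-off variables, four firms.
cases = [({"current ratio <= 2": 1, "cash/TA <= 0.05": 1}, 0),
         ({"current ratio <= 2": 0, "cash/TA <= 0.05": 0}, 1),
         ({"current ratio <= 2": 0, "cash/TA <= 0.05": 1}, 1),
         ({"current ratio <= 2": 1, "cash/TA <= 0.05": 0}, 0)]
cls_partition(cases, ["current ratio <= 2", "cash/TA <= 0.05"])
```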
There are several useful diagnostics and tools
available in the CLS system, such as the ability to perform simulations, determine variable importance and model structure, and perform a split-half validation. For purposes of the
present research we are most interested in
split-half validation and determining variable
importance. These measures are useful in asses-
sing the bases of model evaluation suggested by
Currim et al . (1988) and will be discussed in
later sections.
Predictive accuracy. The CLS system is an iterative technique that will continue to include
variables in the model until all observations
have been correctly classified or until some
other user-specified stopping criterion is satis-
fied. As a result, some variables may be included
simply by chance, especially later in the parti-
tioning process. In order to develop an unbiased
measure of predictive accuracy, the CLS system
performs a split sample validation in which the
model is estimated on half of the sample (the
estimation sample). Then the model is used to
predict the classification of observations in the
other half of the sample (the validation sample).
Low levels of predictive accuracy for the valida-
tion sample would indicate some variables
entered the model by chance.
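A minimal sketch of the split-sample check follows. The random halving and the `fit`/`predict` stand-ins are assumptions; any of the modeling methods discussed here could be plugged in.

```python
import random

def split_half_accuracy(cases, fit, predict):
    """Estimate a model on a random half of the cases (the estimation
    sample) and return the proportion of correct classifications on the
    held-out half (the validation sample). `fit(train) -> model` and
    `predict(model, cues) -> 0 or 1` are stand-ins for the estimation
    and classification steps of CLS (or of a linear method)."""
    shuffled = random.sample(cases, len(cases))
    half = len(shuffled) // 2
    train, holdout = shuffled[:half], shuffled[half:]
    model = fit(train)
    return sum(predict(model, x) == y for x, y in holdout) / len(holdout)

# Demonstration with trivial stand-ins that always predict the training
# majority class (illustrative only).
demo = [({"cue": i % 2}, i % 2) for i in range(30)]
fit = lambda train: round(sum(y for _, y in train) / len(train))
predict = lambda model, x: model
print(split_half_accuracy(demo, fit, predict))   # about 0.5 here
```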
Variable importance. Identifying the most important variable is less straightforward in
the CLS modeling system than in a more tradi-
tional logit or discriminant analysis approach.
CLS does not calculate a set of coefficients (as
in standard statistical approaches) that provide
information on the magnitude of the relation-
ship between the independent variables and
the dependent variable. However, the number
of branches in which an independent variable
appears captures an important indicator of vari-
able importance, namely, its position. Variables
positioned near the top of the tree are, by defi-
nition, the most useful in classifying observa-
tions.
However, using a simple count of the num-
ber of branches in which a variable appears as
a measure of importance is not without limita-
tions. Because different models can have differ-
ent patterns of branching (and different
numbers of branches), this simple measure
can be difficult to interpret. The number of
branches a variable can appear in will be
greater for models with a large number of
branches as compared to models with a small
number of branches. The CLS system solves the
difficulty by normalizing the measure of impor-
tance so that the sum of all variable impor-
tances for a given model equals unity.
Formally, the index of relative importance (RIMP_i) is

RIMP_i = f_i / Σ_i f_i

where f_i is the frequency with which x_ij appears in a CLS model (i = 1, 2, . . . , m; m ≤ N), and j indexes the variable levels (j = 1, 2, . . . , J). Calculation of the RIMP_i measure is illustrated in panel 5 of Appendix A. The RIMP_i measure is akin to a measure of variable importance derived from protocol data recommended by Bettman (1974).
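Computationally, the normalization is straightforward; the sketch below takes raw branch-appearance counts as input (the cue names and counts are illustrative).

```python
def rimp(branch_frequencies):
    """Normalize raw branch-appearance counts so that the variable
    importances for a given model sum to unity (the RIMP_i measure).
    `branch_frequencies` maps variable name -> number of branches of the
    CLS tree in which the variable appears."""
    total = sum(branch_frequencies.values())
    return {v: f / total for v, f in branch_frequencies.items()}

# Example: "current ratio <= 2" appears in 3 branches, another cut-off in 1.
print(rimp({"current ratio <= 2": 3, "cash flow/total debt <= 0.1": 1}))
# {'current ratio <= 2': 0.75, 'cash flow/total debt <= 0.1': 0.25}
```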
Currim et al. (1985) investigated the proper-
ties of the CLS system in extensive simulations.
Using data generated from specific compensa-
tory and noncompensatory processes, they var-
ied the number of observations as well as the
amount of error present in the data generating
process. With simulated data, the researchers
found that in situations with low degrees of
freedom (few data points), the CLS system out-
performs standard logit analysis. However, as
degrees of freedom become more liberal, CLS
and logit converge. The researchers also found
that, regardless of degrees of freedom, models
appeared increasingly compensatory as the
amount of error in the data generating process
increased.
Assessing validity-a useful benchmark
model
Our goal in this research is to test the validity
of the compensatory and noncompensatory
models of judgment. Previous research has
attempted to uncover nonlinearities in subjects’ judgment models by searching for inter-
action terms in the judgment models. The
underlying judgment model has been assumed
to be linear when interaction terms were not
significant. The validity of this approach is lim-
ited, however, by the lack of a representation
of the “true” model. However, process tracing
methodologies provide evidence for the nature
of the judgment process (Russo, 1978). In this
research, we use information acquisition proto-
cols to develop a benchmark representation of
the judgment process used by a judge. This
benchmark will be used for evaluating compen-
satory and noncompensatory models of judg-
ment.
Bases for model evaluation
In assessing the validity of compensatory and
noncompensatory models of judgment, we use
the three criteria proposed by Currim et al.
(1988). A model that has high predictive valid-
ity is one that classifies judgments of being
bankrupt or not with a high degree of accu-
racy. A model with high diagnostic validity is
one that provides accurate information on
which information cues are most useful in
arriving at an accurate prediction of solvency.
Finally, a model with high structural validity is
one that accurately indicates whether judg-
ments underlying the derived models are con-
figural or not.
All three types of validity are important and
desirable in developing a model. However, the
use to which the model is put determines the
relative importance of these three types of
validity. For example, if the purpose of the
model is to predict bankruptcy, then cer-
tainly, predictive validity is of primary impor-
tance. However, if the model is to be used as a
training device, then diagnostic and structural
validity may take precedence. There is no a
priori reason, however, to expect a perfect
correlation among the three measures of valid-
ity. For example, the predictive validity of lin-
ear models may be high in spite of a nonlinear,
underlying structural model (implying low
structural validity) in situations where the
underlying judgment cues are monotonically
related to the criterion variable (Dawes & Cor-
rigan, 1974). Similarly, if there is high collinear-
ity in the judgment cues, predictive validity
may not be undermined, although diagnostic
validity is likely to be low due to instability
and difficulty in separating effects in the judg-
ment cues. In the next section, we present
hypotheses which detail our expectations
regarding the relative performance of compen-
satory and noncompensatory models of judg-
ment under two levels of task predictability.
HYPOTHESES
Low levels of task predictability suggest the
relationship between judgment cues and out-
comes is weak. As a result, judges may have
difficulty interpreting information cues which
can lead to inconsistent judgments (Brehmer
1972, 1973a, b; Naylor & Clark, 1968; Uhl,
1966). In the Currim et al. (1985) simulation study, introducing such an inconsistency resulted in better performance of the linear model and reduced performance of the hierarchical model. A central theme of our
three hypotheses is that the relative perfor-
mance of linear and hierarchical models is
altered by task predictability and degrees of
freedom.
Predictive validity
H1: Predictive validity declines when task predictability
declines and when a hierarchical partitioning method is
used to model judgments. Predictive validity of the lin-
ear model relative to the hierarchical model improves
when task predictability declines.
Since accuracy in clinical judgments is affected by ambiguity in judgment cues, we might at first expect the overall predictive validity of both modeling approaches to decline when task predictability declines. Part of the reason for a decline in judgment accuracy when there is low predictability in the judgment cues is that judges use cues inconsistently. However, if, as is suggested in the literature, judges appear to be more linear in their processing under conditions of low task predictability, we might expect predictive validity actually to increase when task predictability declines for the linear modeling methods, since there is a consistency between the model structure and the structure of the underlying decision processes.

Past research has shown through simulation that a hierarchical partitioning approach to judgment modeling is successful in uncovering the underlying structure of a judgment model even when there is a relatively large amount of error (Currim et al., 1985). A large amount of error, with its tendency to drive judgment models toward a compensatory structure through inconsistent use of cues and lower response reliability, will tend to mask the benefits of using a hierarchical partitioning approach to judgment modeling.
Diagnostic validity
H2: Diagnostic validity declines when task predictabil-
ity declines. The decline is more pronounced for the
hierarchical partitioning modeling method.
We expect a decrement in diagnostic validity
for both modeling approaches when task pre-
dictability declines. Inconsistent use of judg-
ment cues under conditions of low task
predictability will undermine the validity of
model diagnostics for both modeling meth-
ods. Moreover, as discussed above, since low
task predictability tends to result in judges
using cues in a more linear fashion, we expect
less decrement in diagnostic validity for the
linear modeling approaches.
Structural validity
H3: The correlation between benchmark cue importances and model-inferred cue importances will
increase with an increasingly nonlinear benchmark
model when a hierarchical partitioning method is
employed. This effect will be most pronounced under
conditions of high task predictability.
METHOD
The experiment
The database for this study was assembled by
selecting a total of forty-four subject records
from two previous experiments (Casey & Sell-
ing, 1986; Selling, 1993). Subjects were MBA
students at a small, New England college.
There were no significant differences between
subjects participating in the two studies in
terms of demographic variables, accounting
courses taken (study 1: mean = 2.65, σ = 0.80; study 2: mean = 2.65, σ = 0.77), years of relevant accounting experience (study 1: mean = 1.03, σ = 1.01; study 2: mean = 1.09, σ = 1.15), or average number of cues used (study 1: mean = 3.82, σ = 3.68; study 2: mean = 3.68, σ = 1.44). Below we describe
the experimental conditions relevant for sub-
jects included in this analysis.
Experimental stimuli were developed from
records of actual companies. The same thirty
companies were included in both studies and
were described by seven financial ratios.² Fif-
teen of the included companies were healthy,
and fifteen companies were in bankruptcy.
Both studies were computer-based. The
actual instructions provided to study partici-
pants are reproduced in Appendix B. Subjects
included in this study were also told 50% of the
thirty companies included in the study were
bankrupt. Each (unnamed) company profile
was presented on a separate screen. Subjects
were invited to select as many of the seven
available financial ratios as desired before mak-
ing a judgment as to whether the company had
gone bankrupt or not. Withholding the identity
of the companies forced subjects to make their
judgments based on the experimental stimuli,
rather than knowledge held in memory from
information available outside the experimental
setting (Arch et al., 1978).
Task predictability was manipulated by varying the age of the accounting data presented. In the high task predictability condition, current accounting data were used; in the low task predictability condition, the accounting data were one year older. Eleven subjects from study 1 and twelve from study 2 fit the high task predictability condition (N = 23); eleven subjects from study 1 and ten from study 2 fit the low task predictability condition (N = 21).
Subjects were informed that awards would
be given to the best performing subjects based
on a point score that was a linear function of
their predictive accuracy (increasing) and num-
ber of cues chosen (decreasing). For each sub-
ject, the computer program recorded cues used
in the order they were attended to, and judg-
ments for each trial. The scorekeeping function
was intended to discourage subjects from
observing what they considered to be unimpor-
tant cues, but to attend to a sufficient number of
cues to achieve a high degree of judgment accuracy. Given the large penalty for an incorrect bankruptcy prediction (an incorrect prediction using only one cue resulted in a lower score than a correct prediction using all seven cues), the scoring function would not artificially bias subjects towards minimal use of cues and a nonlinear judgment model.
Judgment modeling
The Concept Learning System (Currim et al.,
1988), or CLS as described above, was chosen
as the hierarchical partitioning method to com-
pare to the linear model primarily because of
its ease of operation in the context of this
study. Specifically, the software package is par-
ticularly well-suited to analyzing data one sub-
ject at a time. Since numerous versions of the
linear model appear frequently in the literature
when the dependent variable is nominally
scaled, we chose to use both discriminant
analysis and logistic regression.
² The seven financial ratios were (1) cash to total assets, (2) current assets to total assets, (3) current ratio, (4) current asset turnover, (5) return on total assets, (6) cash flow to total debt, and (7) total debt to equity.
Measuring predictive accuracy
We measured predictive accuracy using two
methods. First, following Currim et al. (1988) and Messier & Hansen (1988), we compared
the predictive accuracy of the two modeling
approaches on a holdout sample. To do this,
we split each subject’s judgment data into an
estimation sample and a validation sample,
each sample containing half of the subject’s
judgments. We used the estimation sample to
infer a separate judgment model for each subject. We then used this model to predict judg-
ments in the validation sample. This first
approach resulted in estimating models with
relatively few degrees of freedom (i.e. a max-
imum of thirteen if only one cue is selected).
Based on the results of Currim et al. (1985), we
might anticipate CLS to outperform logit or
discriminant models in this case.
We created a second environment with more
liberal degrees of freedom by using a jackknif-
ing approach (Efron, 1979). Jackknifing pro-
vides a means of increasing the number of
observations available to test predictive accu-
racy by estimating a separate model on all but
one observation and then using the model to
predict the omitted observation. This proce-
dure is repeated until all observations have
been predicted. Since our data were based on
firms that were either successful or bankrupt,
we held out two observations (one successful
firm, one bankrupt firm) in order to maintain
the 50% classification prior probability.
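A sketch of this leave-two-out procedure appears below. Because the exact pairing of held-out solvent and bankrupt firms is not spelled out above, pairing by index is an assumption, and `fit`/`predict` again stand in for any of the three modeling methods.

```python
def jackknife_accuracy(solvent, bankrupt, fit, predict):
    """Hold out one solvent and one bankrupt case at a time (preserving
    the 50% prior), fit on the remaining cases, and predict the held-out
    pair; return the overall proportion of correct predictions. Cases
    are (cue_dict, outcome) pairs with outcome 1 = solvent."""
    hits, total = 0, 0
    for i in range(min(len(solvent), len(bankrupt))):
        train = solvent[:i] + solvent[i + 1:] + bankrupt[:i] + bankrupt[i + 1:]
        model = fit(train)
        hits += predict(model, solvent[i][0]) == 1   # held-out solvent firm
        hits += predict(model, bankrupt[i][0]) == 0  # held-out bankrupt firm
        total += 2
    return hits / total
```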
Measuring variable or cue importance
In traditional linear modeling approaches,
the estimated beta weights and associated t-
statistics provide information on the signifi-
cance of the independent variables. If the
underlying scales of the variables differ,
however, the raw beta weights are not compar-
able. Standardized betas correct for differences in variable scaling. Nonsignificant betas
were set to zero in order to avoid increasing
the variation in cue importance ranking
inappropriately.
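The two-step adjustment can be sketched as follows for a linear regression model. The 0.05 threshold is an assumption, since the significance level used for zeroing is not stated, and standardization conventions differ somewhat for logistic and discriminant models.

```python
import numpy as np

def standardized_importances(betas, std_x, std_y, p_values, alpha=0.05):
    """Convert raw betas to standardized betas (correcting for differences
    in cue scaling) and zero out the nonsignificant ones."""
    betas, std_x, p_values = map(np.asarray, (betas, std_x, p_values))
    std_betas = betas * std_x / std_y       # correct for variable scaling
    std_betas[p_values > alpha] = 0.0       # drop nonsignificant cues
    return std_betas

# Illustrative numbers: three cues, the third nonsignificant.
print(standardized_importances(betas=[0.8, -1.2, 0.1],
                               std_x=[1.5, 0.2, 3.0],
                               std_y=2.0,
                               p_values=[0.01, 0.03, 0.40]))
# [ 0.6  -0.12  0.  ]
```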
As previously described, modeling via the CLS
approach provides a measure of variable impor-
tance, which ranges between 0 and 1, called RIMP. RIMP measures the frequency with which a variable appears in the decision tree across all branches in the model and is akin to the measure of predictive importance described by Bettman (1974). The relative importance of an independent variable is simply based on the relative size of the RIMP measure.
The information acquisition protocol data
provide a third method for inferring cue impor-
tance. Bettman (1974) suggests the order in
which a variable is observed reflects its importance. Based on this guidance, we create
a protocol-based measure of cue importance by
computing the average order of selection. Spe-
cifically, if a cue was attended to first, it was
assigned a value of 7 (corresponding to the
seven financial ratios available). If a cue was
attended to second, it was assigned a value of
6, etc. If the variable was not attended to, it
was assigned a value of 0. Cue importance for
each subject was determined by the rank of the
average of these assigned values. A related mea-
sure might be the frequency with which a cue
is observed.
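The scoring scheme can be summarized in the short sketch below (hypothetical cue names). Unattended cues simply contribute nothing to a cue's total, which is equivalent to assigning them a value of 0.

```python
def protocol_importance(acquisition_orders, n_cues=7):
    """Score each trial as described above: the first cue attended to
    receives n_cues points, the second n_cues - 1, and so on; cue
    importance is the rank of each cue's average score across trials.
    `acquisition_orders` is a list of per-trial lists of cue names in
    the order they were selected."""
    totals = {}
    for trial in acquisition_orders:
        for position, cue in enumerate(trial):
            totals[cue] = totals.get(cue, 0) + (n_cues - position)
    averages = {cue: t / len(acquisition_orders) for cue, t in totals.items()}
    ranking = sorted(averages, key=averages.get, reverse=True)
    return {cue: rank + 1 for rank, cue in enumerate(ranking)}

# Two illustrative trials: "current ratio" is always examined first.
print(protocol_importance([["current ratio", "cash/total assets"],
                           ["current ratio"]]))
# {'current ratio': 1, 'cash/total assets': 2}
```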
Measuring the extent of noncompensatory processing
Payne (1976) reasons that differences in the number of cues used across judgment stimuli are evidence of a noncompensatory judgment process. Evidence for compensatory processing in the current study would be found if all cues used were accessed an equal number of times. We therefore measured the deviations from compensatory processing for a subject as the variance in cue usage frequency across the thirty cases (VAR).³ Subjects attended to an average of 2.93 cues per trial. A compensatory model is indicated if VAR is close to zero; larger values of this measure indicate an increasingly noncompensatory model.
³ It is possible that subjects initially observe most of the cues in the beginning of the study, but then settle on a smaller
subset of cues as they become more accustomed to the experimental stimuli. This may cause a downward bias in VAR.
However, as our goal is to contrast the relative performance of linear and nonlinear modeling approaches as models
become increasingly noncompensatory, the potential bias due to learning may not undermine the accuracy of our results.
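On this reading, VAR reduces to the following sketch; treating the variance as taken over the per-case counts of cues accessed is our interpretation of the definition above.

```python
from statistics import pvariance

def var_measure(cues_per_case):
    """Deviation from compensatory processing: the variance, over the
    thirty cases, of the number of cues a subject accessed on each case.
    A fully compensatory judge examines the same cues on every case, so
    VAR is near zero; larger values indicate noncompensatory processing."""
    return pvariance(cues_per_case)

# Example: a subject who examines 3 cues on most cases but 7 on one.
print(var_measure([3, 3, 7, 2, 3]))   # 3.04
```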
TABLE 1. Model predictive accuracy (standard deviations are in parentheses)

Subgroup analysis                    CLS              Logistic regression    Discriminant analysis

Split-half validation:
Pooled (N = 44)                      0.712* (0.160)   0.549 (0.213)          0.511 (0.188)
High task predictability (N = 23)    0.753† (0.154)   0.497 (0.198)          0.425 (0.163)
Low task predictability (N = 21)     0.666‡ (0.158)   0.605 (0.218)          0.605§ (0.170)

Jackknifing:
Pooled (N = 44)                      0.744‖ (0.136)   0.660 (0.208)          0.714 (0.109)
High task predictability (N = 23)    0.768 (0.108)    0.650 (0.270)          0.736 (0.078)
Low task predictability (N = 21)     0.717 (0.161)    0.672 (0.111)          0.690 (0.132)

* Significantly different from 0.549 (p = 0.0001) and from 0.511 (p = 0.0001).
† Significantly different from 0.497 (p = 0.0001) and from 0.425 (p = 0.0001).
‡ Not significantly different from 0.605; significantly different from 0.753 (p = 0.07).
§ Significantly different from 0.425 (p = 0.0009).
‖ Significantly different from 0.660 (p = 0.029); not significantly different from 0.714.
RESULTS
Overview
To introduce this section, we provide a com-
parison of the three modeling methodologies
based on the aggregated data from forty-four
subjects. In terms of predictive validity, CLS
dominates both logistic regression and discri-
minant analysis using the split-half validation
criterion. The validation fit for CLS is 71.2%
versus 54.9% and 51.1% for logistic regression
and discriminant analysis, respectively (p <
0.0001 in both cases).
Since past research (Currim et al., 1988) has
found the linear modeling approaches and CLS
to converge in terms of predictive accuracy
when degrees of freedom are liberal, we
would expect predictive accuracies of the
three modeling methods to become more com-
parable with the more liberal degrees of free-
dom available when using the jackknifing
procedure. Our expectations were realized.
Predictive accuracies for the three approaches
were 74.4% (CLS), 66% (logistic regression),
and 71.4% (discriminant analysis) (Table 1).
CLS only outperformed logistic regression (p
< 0.029).
We used rank order correlation, rather than Pearson correlation, in order to compare model-derived cue importances with those derived from the information acquisition protocols, because rank order correlation adjusts for differences in ranges of variable importance measures. While CLS (ρ = 0.395) performed
somewhat better than logistic regression (ρ = 0.236) and discriminant analysis (ρ = 0.283), the differences did not achieve an acceptable level of statistical significance (Table 2).

TABLE 2. Correlations of CLS and linear modeling approaches with protocol-inferred cue importances

Correlation of protocol cue importance with:

Subgroup analysis                    CLS      Logistic regression    Discriminant analysis
Pooled (N = 44)                      0.395    0.236                  0.283
High task predictability (N = 23)    0.463    0.226                  0.322
Low task predictability (N = 21)     0.323    0.246                  0.240
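The correlations in Table 2 amount to computing, for each subject, a Spearman rank correlation between the model-derived and protocol-derived importance vectors. A sketch (using scipy; the dict-based interface is illustrative):

```python
from scipy.stats import spearmanr

def diagnostic_validity(model_importance, protocol_importance):
    """Rank order (Spearman) correlation between model-derived and
    protocol-derived cue importances; inputs map cue name -> importance
    score, with 0 assumed for cues a measure omits."""
    cues = sorted(set(model_importance) | set(protocol_importance))
    model = [model_importance.get(c, 0) for c in cues]
    protocol = [protocol_importance.get(c, 0) for c in cues]
    rho, _ = spearmanr(model, protocol)
    return rho
```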
Finally, we compared the ability of the three
models to uncover the cue importances
inferred from the information acquisition pro-
tocols (our benchmark for cue importance)
when the degree of nonlinearity of the model
is taken into account (Table 3). To study this
question, we created a regression model in
which the correlation of the model-inferred
cue importance with the benchmark cue
importance was the dependent variable and
the independent variable was VAR. We found
the CLS model outperforms both logistic
regression and discriminant analysis, especially
as the underlying model becomes more non-
compensatory (coefficient on the VAR term is
positive and statistically significant at the p < 0.05 level).
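In outline, that test is an ordinary least squares regression of the per-subject benchmark correlations on the per-subject VAR values; a sketch follows.

```python
import numpy as np

def regress_on_var(correlations, var_values):
    """Regress each subject's benchmark correlation on the subject's VAR,
    returning (intercept, slope, R^2). A positive, significant slope means
    a modeling method recovers benchmark cue importances better as the
    subject's processing becomes more noncompensatory."""
    x = np.asarray(var_values, dtype=float)
    y = np.asarray(correlations, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return coef[0], coef[1], r2
```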
To summarize our aggregate results, we find
that CLS outperforms both logistic regression
and discriminant analysis in terms of predictive
validity. Results are a little less clear in terms of
diagnostic and structural validity. However, it
appears CLS outperforms the linear modeling
approaches in terms of structural validity
when the underlying model is noncompensa-
tory.
The impact of task predictability on predictive, diagnostic, and structural validity
Hypothesis 1. In our first hypothesis we rea-
son that predictive validity declines when task
predictability is low and when a hierarchical
partitioning modeling approach is employed.
Since judges may appear to combine informa-
tion in a more linear fashion when task predict-
ability is low, we would expect linear
estimation methods to be able to capitalize on
this apparent linearity and improve on the pre-
dictive accuracy dimension. As reported in
Table 1, CLS outperforms the two linear mod-
eling approaches on the basis of predictive
validity. This result is consistent with pre-
vious research (Currim et al ., 1985; Messier
& Hansen, 1988) that reported benefits in pre-
dictive validity when CLS is used under condi-
tions of low degrees of freedom. We also
found that performance of the linear model-
ing approaches improves when task predict-
ability is low (and the underlying model is
more likely to appear compensatory), while
performance of CLS improves when task pre-
dictability is high (and the underlying model is
more likely to appear noncompensatory).
Thus we find support for our first hypothesis.
Hypothesi s 2. Our second hypothesis states
that diagnostic validity is lower when task pre-
dictability is low, and the decline is more pro-
nounced for the CLS algorithm. Differences in
performance between CLS and the linear mod-
eling approaches did not achieve acceptable
levels of statistical significance (see Table 2).
However, CLS appears to perform better in
conditions of high task predictability, as com-
pared to low task predictability (consistent
with the hypothesis). In addition, CLS always
appears to outperform the two linear modeling
approaches both in aggregate and across task predictability conditions.

TABLE 3. Correlation of model and protocol inferred cue importances as a function of the nonlinearity of the underlying choice model*

Correlation of protocol with:   Task predictability condition   Intercept        VAR              R²
CLS                             Aggregated (N = 44)             0.097 (0.150)    2.335† (1.101)   0.097
                                High (N = 23)                   0.150 (0.189)    2.403† (1.382)   0.126
                                Low (N = 21)                    0.037 (0.245)    2.283 (1.797)    0.078
Logistic regression             Aggregated (N = 44)             0.364† (0.150)   -0.929 (1.101)   0.017
                                High (N = 23)                   0.352‡ (0.203)   -0.903 (1.479)   0.017
                                Low (N = 21)                    0.379 (0.239)    -0.978 (1.758)   0.016
Discriminant analysis           Aggregated (N = 44)             0.144 (0.145)    1.116 (1.059)    0.026
                                High (N = 23)                   0.120 (0.182)    1.589 (1.323)    0.064
                                Low (N = 21)                    0.196 (0.241)    0.402 (1.773)    0.003

* Standard errors in parentheses.
† p ≤ 0.05.
‡ p ≤ 0.10.
Hypothesis 3. Our final hypothesis states that
the correlation between protocol-inferred and
model-inferred cue importances will increase
with an increasingly nonlinear underlying
model, when a hierarchical partitioning meth-
odology is employed, in a high task predictabil-
ity condition. We find strong support for this
hypothesis, as reported in Table 3. Note that
the relationship between model structure and
the correlation of protocol cue importances
and those derived from a hierarchical partition-
ing approach is driven by the high task predict-
ability condition. This effect is indicated by the
increase in R² to 0.126. This finding is consistent
with our expectation that in a high task predict-
ability condition model nonlinearities are more
apparent.⁴
SUMMARY AND CONCLUSIONS
This research has investigated the relative
strengths of linear and hierarchical estimation
techniques for inferring judgments, and cogni-
⁴ We performed several sensitivity analyses to check the stability of our results. For example, we tried several different
definitions of model structure. We also investigated an alternative measure of cue importance based only on the frequency
with which the cue was observed, independent of order of observation. Both sets of analyses provide similar results.
tive processes underlying those judgments
under two conditions of task predictability.
We did this by estimating both linear and non-
linear models for subjects involved in a sol-
vency/bankruptcy judgment task. The data
used were actual judgments by subjects, and
measures of cue importance and noncompensa-
tory processing derived from concurrent infor-
mation acquisition protocols. This approach
provides a first step in adding a measure of
external validity to past research which is
based on simulated data.
The results of our analysis suggest that ben-
efits may be gained from using a nonlinear
approach in modeling judgments without a
high risk of losses in terms of measures of pre-
dictive, diagnostic, and structural validity. Our
risk assessment is based on the consistently
high performance of CLS when degrees of free-
dom and task predictability are varied.
We next turned to an investigation of three
measures of model validity: predictive, diagnostic, and structural. The results are consistent
with past research indicating that a nonlinear
approach provides benefits in terms of predic-
tive validity in cases where degrees of freedom
are low, as in our tests using a holdout sample.
The two modeling approaches appear to
become more comparable in their predictive
accuracies when degrees of freedom are more
liberal, as was found in the jackknifing results.
We might expect a similar result if a jackknifing
procedure were applied to the data in Messier
& Hansen (1988).
Noncompensatory modeling appears to be
superior to linear modeling in terms of the
validity of the cue importance measure (diag-
nostic validity). At both levels of task predict-
ability, the correlation with benchmark cue importances (as inferred from the information acquisition protocols) is directionally superior to that of the linear model. In terms of struc-
tural validity, the correlations between the non-
linear model cue importance measures and the
benchmark measures improve as the underly-
ing model structure deviates from linearity.
This effect is mostly attributable to the findings
for the high task predictability condition. We
are unable to observe, however, any change in
the correlation between linear model measures
of cue importance and information acquisition
protocol-based measures as the underlying
model deviates from linearity in the low task
predictability condition. Linear models cannot
be relied upon to capture structural relation-
ships, but only statistical relationships.
It is interesting to compare certain aspects of
the current study to studies employing a simu-
lation methodology. In the latter, judgment
processes and the role of individual cues could
have been precisely specified. However, the
problem of quantifying cue importance mea-
sures when cues are used configurally is a com-
mon limitation. For the current study, this
limitation might arise in assigning cue impor-
tances when using the information acquisition
protocols as data. One benefit of using the CLS
modeling methodology is that the RIMP mea-
sure infers cue importance based on the posi-
tion and frequency of appearance of the cue in
the judgment model.
Information acquisition protocols provide a
rich source of behavioral data. In our study,
subjects were second-year graduate MBA stu-
dents, who in the strictest sense would be
regarded as novices, although they might
more appropriately be thought of as non-
experts. Experts (in contrast to novices)
may have well-established and/or idiosyn-
cratic procedures for culling information
from a set of financial measures, and may in
some way be affected by the constraints
imposed by a computer-based experiment.
As a result, if this research were replicated
on expert subjects, some modification in the
information acquisition protocol procedure
may be required.
It is also possible that experts may exhibit a
different degree of compensatory processing
than our student subjects. Expert judgment
processes may fit a hierarchical model better
than a compensatory model because experts
tend to recognize and deal with complex inter-
actions more competently than novices, or use
MODELS OF JUDGMENT
17
information cues differently. For example,
Ettenson et al. (1987) contrasted experts and
novices on information use and estimated judg-
ment models. They found no difference
between experts and novices in terms of the
number of cues that were significant predictors
of judgment, but found the experts tended to
focus on one primary cue and used other cues
to “make minor adjustments” (p. 236), suggest-
ing a more hierarchical processing strategy. No
single cue seemed to dominate students’ judg-
ments, suggesting a more compensatory judg-
ment strategy. The Ettenson et al. (1987)
findings suggest our results are conservative,
and that it may be appropriate to employ non-
compensatory modeling approaches in the
development of expert judgment models for
use as training and/or predictive tools.
One limitation of this methodology is that
the researcher cannot know how information
cues were actually combined or retained in
memory during processing (Payne & Rags-
dale, 1978; Russo, 1978). Measures of impor-
tance based on the sequence of cue
observance may therefore be a measure of
cue importance in the sense of cue salience
rather than cue importance in a predictive or
diagnostic sense (Bettman, 1974).⁵ This sug-
gests that benefits may be available through
use of a modeling methodology such as CLS
or a regression-based approach as a comple-
ment to information acquisition protocols.
BIBLIOGRAPHY
Arch, D. C., Bettman, J. R. & Kakkar, P., Subjects Information Processing in Information Display Board
Studies, in Hunt, H. K. (ed.), Advances in Consumer Reseurcb Vol. V (Chicago: Association for
Consumer Research, 1978).
Barr, A. & Feigenbaum, E. A., Z%e Handbook of Artificial Intelligence, Vol. II (London: Pittman Books,
1982).
Bettman, J. R., An Information Processing ‘IBeoy of Consumer Choice (New York: Addison-Wesley,
1979).
Bettman, J. R., Toward a Statistics for Consumer Decision Net Models, Journal of Consumer Research
(1974) pp. 71-80.
Biggs, S. F., An Empirical Investigation of the Information Processes Underlying Four Models of Choice
Behavior, Accounting Symposium, Ohlo State University (1978).
Blggs, S. F., Messier, Jr., W. F. & Hansen, J. V., A Descriptive AnaIysis of Computer Audit Specialists’
Decision-making Behavior ln Advanced Computer Environments, Audfting: A Journal of Practfce and
Tbeoy (1987) pp. 1-21.
Biggs, S. F. h Mock, T. J., An Investigation of Auditor Decision Processes in the Evaluation of Internal
Controls and Audit Scope Decisions, Journal of AccountJng Reseurcb (1983) pp. 234-255.
Bii, R. S. & Marcus, S. A., Measures of Compensatory and Noncompensatory Models of Decision
Behavior: Process Tracing versus Policy Capturing, OrganfzaNonal Bebavfor and Human Perfor-
mance (1983) pp. 331-352.
Brehmer, B., Response Consistency in Probabilistic Inference Tasks, Orgcznfzutionul Bebuuior and
Human Performance (1978) pp. 103-115.
Brehmer, B., Note on Cllnlcal Judgement and the Formal Characteristics of Cllnlcal Tasks, Psycbologicul
Bulletin (1976) pp. 778-782.
Brehmer, B., Effects of Task Pnxlictabllity and Cue Valldity on Interpersonal Learning of Inference Task
Involving Both Linear and Nonlinear Relations, Organizatfonal Bebavfor and Human Performance
(1973a) pp. 24-26.
s We thank Eric Johnson of the University of PennsyIvania for this insight
18 L. G. SCHNEIDER and T. I. SELLING
Brehmer, B., Policy Conflict and Policy Change as a Function of Task Characteristics. IX. The Effect of Task
Predictability, Scandinavian Journal of Psychology (1973b) pp. 220-280.
Brehmer, B., Cue Utilization and Cue Consistency in Multiple-cue Probability Learning, Organizational
Behavior and Human Performance (1972) pp. 286-296.
Brown, C. E. & Solomon, I., Auditor Configural Information Processing in Control Risk Assessment,
Auditing: A Journal of Practice and Theory (Fall 1990) pp. 17-38.
Casey, C. J. & Selling, T. I., The Effect of Task Predictability and Prior Probability Disclosure on Judgment
Quality and Confidence, The Accounting Review (1986) pp. 301-317.
Currim, I. S., Meyer, R. J. & Le, N. T., Disaggregate Tree-structured Modeling of Consumer Choice Data,
Journal of Marketing Research (1988) pp. 253-266.
Currim, I. S., Meyer, R. J. & Le, N. T., An Inductive Learning Algorithm for Inferring Production-system
Models of Consumer Choice, U.C.L.A. Working Paper (1985).
Currim, I. S. & Schneider, L. G., A Taxonomy of Purchase Strategies in a Promotion Intensive Environ-
ment, Marketing Science (1991) pp. 91-110.
Dawes, R. M. & Corrigan, B., Linear Models in Decision Making, Psychological Bulletin (1974) pp. 95-
106.
Efron, B., Bootstrap Methods: Another Look at the Jackknife, Annals of Statistics (1979) pp. 1-26.
Ettenson, R., Shanteau, J. & Krogstad, J., Expert Judgment: Is More Information Better?, Psychological
Reports (1987) pp. 227-238.
Hunt, E. B., Marin, J. & Stone, P. J., Experiments in Induction (New York: Academic Press, 1966).
Klersey, G. F. & Mock, T. J., Verbal Protocol Research in Auditing, Accounting, Organizations and
Society (1989) pp. 133-151.
Larcker, D. F. & Lessig, V. P., An Examination of the Linear and Retrospective Process Tracing Approaches
to Judgment Modeling, The Accounting Review (1983) pp. 58-77.
Lewis, B. L., Patton, J. M. & Green, S. L., The Effects of Information Choice and Information Use on
Analysts' Predictions of Municipal Bond Rating Change, The Accounting Review (1988) pp. 270-282.
McFadden, D., Modeling the Choice of Residential Location, in Karlquist, A. et al. (eds), Spatial Inter-
action Theory and Residential Location (Amsterdam: North-Holland, 1978) pp. 75-96.
Meservy, R. D., Bailey, Jr., A. D. & Johnson, P. E., Internal Control Evaluation: A Computational Model of
the Review Process, Auditing: A Journal of Practice and Theory (1986) pp. 44-74.
Messier, Jr., W. F. & Hansen, J. V., Inducing Rules for Expert System Development: An Example Using
Default and Bankruptcy Data, Management Science (1988) pp. 1403-1415.
Michalski, R. S. & Chilausky, R. L., Learning by Being Told and Learning from Examples: An Experimental
Comparison of the Two Methods of Knowledge Acquisition in the Context of Developing an Expert
System for Soybean Diagnosis, International Journal of Policy Analysis and Information Systems
(1980) pp. 125-161.
Naylor, J. C. & Clark, R. D., Intuitive Inference Strategies in Interval Learning Tasks as a Function of
Validity Magnitude and Sign, Organizational Behavior and Human Performance (1968) pp. 378-
399.
Newell, A. & Simon, H. A., Human Problem Solving (Englewood Cliffs: Prentice-Hall, 1972).
Payne, J. W., Contingent Decision Behavior, Psychological Bulletin (1982) pp. 382-402.
Payne, J. W. & Easton Ragsdale, E. K., Verbal Protocols and Direct Observation of Supermarket Shopping
Behavior: Some Findings and a Discussion of Methods, in Hunt, H. K. (ed.), Advances in Consumer
Research Vol. V (Chicago: Association for Consumer Research, 1978).
Payne, J. W., Task Complexity and Contingent Processing in Decision Making: An Information Search and
Protocol Analysis, Organizational Behavior and Human Performance (1976) pp. 366-387.
Russo, J. E., Eye Fixations Can Save the World: A Critical Evaluation and a Comparison Between Eye
Fixations and Other Information Processing Methodologies, in Hunt, H. K. (ed.), Advances in Con-
sumer Research Vol. V (Chicago: Association for Consumer Research, 1978).
Schneider, L. G. & Currim, I. S., Consumer Purchase Behaviors Associated with Active and Passive Deal
Proneness, International Journal of Research in Marketing (1991) pp. 205-222.
Selling, T. I., Confidence and Information Usage: Evidence from a Bankruptcy Prediction Task, Behavioral
Research in Accounting (1993) pp. 237-264.
Selling, T. I. & Shank, J., Linear versus Process Tracing Approaches to Judgment Modeling: A New
Perspective on Cue Importance, Accounting, Organizations and Society (1989) pp. 65-77.
Shields, M., Some Effects of Information Load on Search Patterns Used to Analyze Performance Reports,
Accounting, Organizations and Society (1980) pp. 429-442.
Simnett, R. & Trotman, K., Auditor versus Model: Information Choice and Information Processing,
The Accounting Review (1989) pp. 514-528.
Slovic, P., Cue-Consistency and Cue-Utilization in Judgment, American Journal of Psychology (1966) pp.
427-434.
Tversky, A. & Sattath, S., Preference Trees, Psychological Review (1979) pp. 542-573.
Uhl, C. N., Effects of Multiple Stimulus Validity and Criterion Dispersion on Learning of Interval Concepts,
Journal of Experimental Psychology.
APPENDIX A

PANEL 2: Abbreviated data set for subject x6

Firm      Dep. var.       Cash flow to    Current ratio   Cash flow to    Cash flow to    ROA
          Survive = 1,    total assets    (cut-off 4)     total debt      total debt      (cut-off 3)
          Bankrupt = 0    (cut-off 1)                     (cut-off 1)     (cut-off 2)
825039    1               0               0               0               0               0
34        0               0               1               0               0               1
35        0               0               0               1               1               1
40        0               1               1               0               1               1
559108    1               0               0               0               0               1
758663    1               0               1               0               0               1
Recall that the criterion value is based on (1) the frequency with which a financial ratio occurs at a
given cut-off, f(l | x_{ic}), and (2) the conditional probability of a given outcome (e.g. that the firm is solvent or
bankrupt), given the specific cut-off of the financial ratio in question, P(outcome | l, x_{ic}). These two
measures are then used to calculate the criterion value

C(x_{ic}) = -\sum_{l=1}^{2} f(l \mid x_{ic}) \left[ P(\mathrm{solvent} \mid l, x_{ic}) \log_2 P(\mathrm{solvent} \mid l, x_{ic}) + P(\mathrm{bankrupt} \mid l, x_{ic}) \log_2 P(\mathrm{bankrupt} \mid l, x_{ic}) \right].
Using the abbreviated data set shown in panel 2, we can calculate the criterion value for the first cut-off
for cash flow to total assets. First, the frequency with which cash flow to total assets appears below the
cut-off is 1. Similarly, the frequency with which cash flow to total assets appears above the cut-off is 5.
The conditional probability that subject x6 predicts the firm is solvent, given cash flow to total assets
below the cut-off, is zero. The conditional probability that subject x6 predicts the firm is bankrupt, given
cash flow to total assets below the cut-off, is one. Similar calculations are performed for subject x6's
predictions when cash flow to total assets is above the cut-off.
The actual calculation of the criterion value for the first cut-off for cash flow to total assets using the very
abbreviated data set presented in panel 2 is as follows:

C(x_{61}) = -\left[ 1 \times \left( (0 \times \log_2 0) + (1 \times \log_2 1) \right) + 5 \times \left( \left(\tfrac{3}{5} \times \log_2 \tfrac{3}{5}\right) + \left(\tfrac{2}{5} \times \log_2 \tfrac{2}{5}\right) \right) \right]

C(x_{61}) = 6.089.
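The computation above can be made concrete with a short Python sketch. The function name, data layout, and base-2 logarithm are our own assumptions (the paper does not publish an implementation); note that changing the logarithm base rescales every criterion value by the same positive constant, and so cannot change which cut-off is smallest.

import math

# A minimal sketch of the criterion-value computation, under the assumptions
# stated above. Each observation is a pair: whether the cue falls at or below
# the candidate cut-off, and the subject's prediction (True = solvent,
# False = bankrupt).
def criterion_value(observations):
    total = 0.0
    for side in (True, False):                 # l = 1 (below), l = 2 (above)
        group = [pred for below, pred in observations if below == side]
        for outcome in (True, False):          # solvent, bankrupt
            k = sum(1 for pred in group if pred == outcome)
            if k:                              # 0 x log 0 is treated as zero
                # k * log2(k / f) equals f(l | x_ic) * P * log2(P)
                total += k * math.log2(k / len(group))
    return -total

# Panel 2, cash flow to total assets at cut-off 1: one observation below the
# cut-off (predicted bankrupt) and five above (three predicted solvent, two
# predicted bankrupt).
obs = [(True, False)] + [(False, True)] * 3 + [(False, False)] * 2
value = criterion_value(obs)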
The same calculations may be performed on the full data set for the same cues and associated cut-off
values. These are presented in the table below:

Variable                      Cut-off   Criterion value
Cash flow to total assets     1         4.089
Current ratio                 4         15.776
ROA                           3         8.957
Cash flow to total debt       1         8.827
Cash flow to total debt       2         3.897
The smallest criterion value (3.897) corresponds to cut-off 2 for the cash flow to total debt variable. This
variable is used as the first split in the hierarchical model for subject x6. At this point, the algorithm
proceeds to step 2. Specifically, the data are partitioned into two groups: observations for which cash
flow to total debt is less than or equal to 0.028, and observations for which cash flow to total debt is
greater than 0.028. Within each of these two partitions, a criterion value is calculated for the remaining
cut-offs of the independent variables, as previously illustrated. The estimated model for subject x6 is
presented in panel 4.
PANEL 4: CLS model for subject x6

[Decision-tree figure. The first split is on cash flow to total debt (≤ 0.028 versus > 0.028); subsequent splits involve the current ratio (≤ 5.5), cash flow to total debt (≤ -0.088), and cash flow to total assets. Terminal routes (1), (2), and (4) predict bankruptcy; terminal routes (3) and (5) predict survival.]
Routes (3) and (5), the "predict survival" routes, are formally in the hierarchical partitioning model,
while routes (1), (2), and (4) are implied by the model.
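The two-step procedure (choose the cut-off with the smallest criterion value, then partition and repeat within each partition) can likewise be sketched in Python. This is a hedged reading of the algorithm as described, not the authors' code; the data layout and names are illustrative.

import math

def split_criterion(rows, test):
    # Criterion value of one candidate cut-off; each row is (cues, prediction).
    total = 0.0
    for side in (True, False):                      # below / above the cut-off
        group = [pred for cues, pred in rows if test(cues) == side]
        for outcome in (0, 1):                      # bankrupt / solvent
            k = sum(1 for pred in group if pred == outcome)
            if k:                                   # 0 x log 0 treated as zero
                total += k * math.log2(k / len(group))
    return -total

def grow_tree(rows, candidates):
    # candidates: dict mapping a (variable, cut-off) label to a test function.
    preds = {pred for _, pred in rows}
    if len(preds) <= 1 or not candidates:           # pure node or no splits left
        return {"predict": preds.pop() if preds else None}
    # Step 1: choose the cut-off with the smallest criterion value.
    best = min(candidates, key=lambda c: split_criterion(rows, candidates[c]))
    test = candidates[best]
    rest = {c: t for c, t in candidates.items() if c != best}
    # Step 2: partition the observations and repeat within each partition.
    below = [r for r in rows if test(r[0])]
    above = [r for r in rows if not test(r[0])]
    return {"split": best,
            "below": grow_tree(below, rest),
            "above": grow_tree(above, rest)}

For subject x6, candidates would hold the five (variable, cut-off) pairs of panel 2, and the first call would select cash flow to total debt at cut-off 2, as described above.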
PANEL 5: Calculation of the RIMP measure

Calculation of the RIMP measure is based on the frequency with which a variable appears in the
model, divided by the total number of variables in the model (i.e. a total of five variables appear in routes
(3) and (5), which are formally a part of the model). Notice for subject x6 that there are a total of five
variables that appear in the model. Cash flow to total debt appears twice at the ≤ 0.028 level and once at
the ≤ -0.088 level, for a total of three appearances. The RIMP measure for cash flow to total debt is
simply 3 ÷ 5, or 0.6. Both cash flow to total assets and the current ratio appear once in the model. The
RIMP measure for both these variables is 1 ÷ 5, or 0.2.
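A few lines of Python make the RIMP arithmetic explicit. The route representation is our own, and the exact allocation of tests between routes (3) and (5) is illustrative (the panel 4 figure does not fully survive reproduction; the cut-off for cash flow to total assets is marked None for that reason), but the appearance counts match those described above.

from collections import Counter

# RIMP: appearances of each variable across the formal routes, divided by the
# total number of variable appearances in those routes.
def rimp(formal_routes):
    appearances = Counter(var for route in formal_routes
                          for var, _cutoff in route)
    total = sum(appearances.values())       # five appearances for subject x6
    return {var: count / total for var, count in appearances.items()}

# Illustrative reconstruction of routes (3) and (5) for subject x6.
routes = [
    [("cash flow to total debt", 0.028), ("current ratio", 5.5),
     ("cash flow to total debt", -0.088)],
    [("cash flow to total debt", 0.028), ("cash flow to total assets", None)],
]
print(rimp(routes))
# {'cash flow to total debt': 0.6, 'current ratio': 0.2,
#  'cash flow to total assets': 0.2}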
APPENDIX B: INSTRUCTIONS TO SUBJECTS
In the study you will be presented with financial accounting ratios for one year for each of 30 real-life
companies with concealed identities. Based on these data and supplementary information on page 2 of
the instructions, you are to: (1) predict whether each firm will survive or go bankrupt during the two-
year period after the year for which data are presented, and (2) estimate the probability in your judgment
of survival or failure. Since real companies and historical accounting data are involved, each company's
actual status, failed or survived, is already known.
The diskette that you received is your personal copy to be returned to the monitor at the end of the
session. Please do not exchange it with anyone. You are also asked not to talk with anyone about the
study until it has been completed on Thursday, May 15, the date of the last experimental session. The
monitor will answer any questions you may have during the course of the experiment.
In this study you will be asked to make two judgments for each of 30 firms: a prediction that the
company will either fail (i.e. go bankrupt) or survive, and an assessment of the probability, expressed as a
percent, that your prediction is correct. Thus, your probability estimate can be any number from 50 to
100 and can be interpreted as your degree of certainty about the correctness of your answer. For
example, if you respond that the probability is X%, it means that you believe that there are about X
chances out of 100 that your answer is correct. A response of 100% means that you are absolutely certain
that your answer is correct. A response of 50% means that your best guess is as likely to be right as wrong.
No estimate below 50% is allowable, because you should always be picking the alternative that you think
is more likely to be correct.
Before you make each prediction, you will have an opportunity to select as few as 1 or as many as 7
financial ratios for display on the screen. A charge of 10 points will be assessed against you for each ratio
that you select; thus, the total charge for one company will range from 10 to 70 points. These points will
be subtracted from the points that are awarded for your prediction. 100 points will be awarded for a
correct prediction (i.e. failure or survival) and 100 points deducted for an incorrect prediction. Thus, for
each of the 30 firms included in the study the point-scoring formula is:

Correct prediction: 100 - (10 × number of ratios selected);
Incorrect prediction: -100 - (10 × number of ratios selected).

For each correctly predicted firm your score can range from 30 to 90 points, the exact amount
depending on the number of ratios that you selected. For an incorrect prediction, your score will range
from -110 to -170 points, again the exact amount depending on how many ratios you selected. The
sum of points for all 30 firms will determine your final score, except for bonuses, described next.
Your total score will be adjusted by the degree to which you are "well-calibrated". In this experiment,
someone is defined to be perfectly calibrated if the percentage of firms they predicted correctly is equal
to the average of their probability responses (described in the second paragraph of these instructions).
Subjects who are the best calibrated will receive the highest number of bonus points, and so on. Hence,
your performance score can be maximized only by responding as accurately as you can when assessing
the probability that a judgment is correct.
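For concreteness, the scoring rule and the calibration definition can be expressed in a few lines of Python. This is our own sketch; the instructions describe the bonus schedule only qualitatively, so the sketch computes the calibration gap rather than bonus points.

# Point score for one firm: 100 points if the prediction is correct, -100 if
# not, less 10 points per ratio selected (1 <= n_ratios <= 7).
def firm_score(correct, n_ratios):
    return (100 if correct else -100) - 10 * n_ratios

# Calibration gap: zero for a perfectly calibrated subject, i.e. one whose
# percentage of correct predictions equals the average of the stated
# probabilities (each between 50 and 100).
def calibration_gap(correct_flags, probabilities):
    pct_correct = 100 * sum(correct_flags) / len(correct_flags)
    return abs(pct_correct - sum(probabilities) / len(probabilities))

# Example: a correct prediction using three ratios scores 100 - 30 = 70;
# an incorrect prediction with the same three ratios scores -100 - 30 = -130.
print(firm_score(True, 3), firm_score(False, 3))   # 70 -130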
