THE IMPACT OF THE REVIEW PROCESS IN HYPOTHESIS GENERATION TASKS

Description
This paper examines the performance of auditors in generating hypotheses in an analytical review case
both prior to and after review. It partitions two possible sources of gain from the review process, namely
the discussion effect and the rank effect. The discussion effect compares review groups with and without
discussion.

Pergamon Accounting, Organizations and Society, Vol. 20, No. 5, pp. 345-357, 1995
Copyright © 1995 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0361-3682/95 $9.50+0.00
03613682(95)OOOOZX
THE IMPACT OF THE REVIEW PROCESS IN HYPOTHESIS GENERATION TASKS*
ZUBAIDAH ISMAIL
National University of Singapore
and
KEN T. TROTMAN
University of New South Wales
Abstract
This paper examines the performance of auditors in generating hypotheses in an analytical review case
both prior to and after review. It partitions two possible sources of gain from the review process, namely
the discussion effect and the rank effect. The discussion effect compares review groups with and without
discussion. The rank effect compares seniors and audit managers. The study found that the review
process results in more plausible hypotheses being generated. Discussion within the review process
between reviewer and reviewee was found to be one source of gain from the review process. While
there was no difference in the performance of the senior and manager reviewers, managers completed
the task in less time.
Although the review process in auditing is an
integral control mechanism of audit firms
(Libby & Trotman, 1991; Solomon, 1987) the
multi-person character of the audit environ-
ment has largely been overlooked in existing
audit judgment literature (Solomon, 1987). A
few exceptions (e.g. Trotman, 1985; Trotman
& Yetton, 1985; Ramsay, 1994) document some
gains from the review process in internal con-
trol evaluation tasks. However, we know little
about the sources of these gains and under
what circumstances these gains are likely to
occur.
This paper explores some structural and pro-
cess explanations for improvements resulting
from the review process. Specifically, it
examines the performance of auditors in gen-
erating hypotheses in an analytical review case
both prior to and after review. It seeks to parti-
tion two possible effects of review which may
bear upon performance. These are the discus-
sion effect and the rank effect. The discussion
effect is examined by comparing review groups
with discussion and review groups without
discussion. The rank effect is partitioned by
looking at review by seniors and review by
managers.
At present little is known about the comparative performance of individuals and audit teams
in the planning stage of the audit (Solomon, 1987). However, judgments in this stage of
the audit opinion formulation process are critical as the auditor’s starting point for
subsequent audit activities. The initial formulation of the set of possible errors may direct
information search (Kida, 1984; Trotman & Sng, 1989) and evaluation of the most likely error
(Heiman, 1990). Conclusions from other research domains emphasize the importance of this
stage of decision making. For example, in medical research, it was found that the primary
cause of diagnostic failure was not the misinterpretation of subsequent tests, but the failure
to ever consider the true explanation for presenting symptoms in the physicians’ set of initial
hypotheses (Elstein et al., 1978). In addition to its importance, the generation of hypotheses
in an analytical review situation is an appropriate task to use in examining the review
process as auditors often inherit hypotheses for an audit difference from fellow members of
the audit team (Libby, 1985). In addition, review activities include thinking of other possible
explanations for audit results shown on workpapers (Bamber et al., 1988).
* This project was supported by an Australian Research Council Grant. We wish to thank Kevin Bird, Peter Luckett, Norma
Marth~ov and Peter Roebuck for assistance in the design and analysis of the experiment. We are also grateful for the
comments of Sarah Bonner, Russell Craig, Peter Gillett, Ganesh Krishnamoorthy, Rob Libby, Roger Simnett, Ira Solomon,
Rick Tubbs, Arnie Wright and Sally Wright as well as the participants at the USC Audit Judgment Conference, Maastricht
Audit Conference and the Behavioral Accounting Research Colloquium, Boston.
The first variable manipulated is the level of
discussion. The reviewer may review the work-
papers by discussing with the senior or review
may entail a review of the workpapers in the
senior’s absence, that is, review without discus-
sion. Both these forms of review appear in
practice and have been examined in the audit
judgment literature. For example, Cohen &
Kida (1989) and Trotman & Yetton (1985)
studied the review of work done by a hypothetical
senior so there was no face-to-face discussion.
Trotman (1985) allowed for discussion as
part of the review process. However, no study
has examined the relative effectiveness of the
two different forms of the review process.
Another factor which may bear upon the
performance of the review team is the rank
of the reviewers. Audit review can be con-
ducted by someone of equal or higher rank
(AUP 13, 1993). Rank will be highly correlated
with experience and will proxy for knowledge
and ability (Bonner & Lewis, 1990). Findings of
studies which explain group superiority over
individual judgments (e.g. Yetton & Bottger,
1982; Trotman, 1985; Libby et al., 1987)
indicate that it is the task and the spread of
experience among group members, which
facilitate the recognition of expertise, which
in turn determines the weighting and combina-
tion of judgments. Also, studies at the indivi-
dual level have found that experience has an
impact in hypothesis generation tasks. For
example, it has been found that more experi-
enced audit managers generated more plausi-
ble and fewer implausible hypotheses (Bedard
& Biggs, 1991; Bonner & Lewis, 1990; Libby &
Frederick, 1990). Experience also improved
auditors’ perception of the frequency of occur-
rence of financial statement errors. Experienced
auditors, hence, were found to generate more
frequently occurring errors as explanations for
audit findings (Libby & Frederick, 1990). How-
ever, no study has examined the impact of using
reviewers of similar or higher rank than the
reviewee.
DEVELOPMENT OF PROPOSITIONS
This study examines a hypothesis generation
task in the preliminary analytical review phase
of the audit opinion formulation process. The
main measure of performance used in this
study is the number of plausible hypotheses
generated. This measure was also used in the
Libby (1985) and Libby & Frederick (1990)
studies. Libby & Frederick (1990) suggested
that access to a greater number of plausible
potential errors decreases the probability that
auditors will miss the actual cause of an auditing
finding because the error is not available in
or retrievable from memory.
Generally, individuals generate fairly limited
sets of hypotheses (Casey et al., 1984; Heiman,
1990; Mehle, 1982; Mehle et al., 1981). For
auditors, this performance varies depending
on the experience (rank) of the individual
and how rare the error to be generated is
(Bonner & Pennington, 1991). Review of the
group decision-making literature indicates that
in several ways groups might perform better in
hypothesis generation. First, group decision
making allows for the pooling of information
(Casey et al., 1984). Complex tasks often
require the search for and evaluation of alter-
natives, and these tasks are likely to require
more information than is possessed by an individual
(Bamber & Bylinski, 1982). Groups generally
should have a greater sum of knowledge
than individuals (Burton, 1987; Hare, 1976;
Stein, 1975; Taylor et al., 1958; Yetton & Bottger,
1982). Generally, research results show
that group performance would benefit from
the combination of abilities and knowledge,
and points of view of its members (Shaw,
1976). Thus, problems that require the utilization
of knowledge should give groups an advantage
over individuals. Even if one member of the
group (e.g. the leader) has more knowledge
than other group members, the limited unique
knowledge of less informed individuals can
serve to fill some gaps in knowledge (Burton,
1987; Maier, 1970). The group could pool par-
tial information from different group members
and generate new or improved ideas as a result
of the sharing of ideas (Casey et al., 1984).
In this study, a form of group decision
scheme which is a distinctive feature of the
audit process, namely the review process, is
examined. Review involves another individual
examining the judgment of an individual audi-
tor, and hence may be regarded as a form of
group decision making/judgment. The review
process with discussion is a special form of
interacting group where the reviewer has the
final decision. The above discussion about
group structure and process factors would
therefore apply to the performance of review
groups with discussion.
For the review group without discussion, the
factors noted earlier, including pooling of infor-
mation and greater sum of knowledge, suggest
that the review process would improve perfor-
mance. In addition, the reviewers are given the
judgments of the individual which may serve as
cues in activating the retrieval of hypotheses
from the same accounting cycle. This was
found to be especially the case for atypical
cues (Libby & Frederick, 1990). Hence, in
terms of the number of plausible hypotheses,
both types of review groups may be expected
to generate a greater number of plausible
hypotheses compared with the individual
audit senior.¹
H1a. Senior-reviewed groups will generate a greater
number of plausible hypotheses than individuals.
H1b. Manager-reviewed groups will generate a greater
number of plausible hypotheses than individuals.
Discussion which encourages cognitive
interstimulation is likely to enhance perfor-
mance on a hypothesis generation task. Cogni-
tive interstimulation refers to the facilitation of
retrieval of ideas/hypotheses which would not
have been retrieved if it were not for the com-
ments of another member that may contain
task relevant stimuli (Lamm & Trommsdorff,
1973). Studies on hypothesis retrieval (e.g.
Elstein & Bordage, 1978) suggest that two
strategies may be used: retrieval from cues
and retrieval from hypotheses. Hypotheses
suggested by other group members or discussion
of particular cues may serve to facilitate
the retrieval of other hypotheses or cues
(Casey et al., 1984; Fisher, 1987). Discussion
also allows for error correction. Error correc-
tion may be regarded as one type of group
process gain. Critical observations by another
group member can help the group to make a
correct choice by rejecting wrong answers.
Disagreements are also common in group deci-
sion making and can be useful regardless of
their accuracy (Wanous & Youtz, 1986). This
may be because it allows for the careful exam-
ination of the dissenting views leading to
higher quality solutions (Nemeth, 1986) and
diverse solutions. One would expect
senior-reviewed groups and manager-reviewed
groups with discussion to outperform the corresponding
review groups without discussion.
The following proposition is tested:
¹ It should be noted that the predicted differences may not all be strictly a group phenomenon. For example, if the same
individual reviewed his/her own work after a break and was told to work harder, the number of hypotheses generated
would also probably increase (cf. Heiman, 1990).
H2. In a review situation, groups with discussion will
produce a greater number of plausible hypotheses
compared with groups without discussion.
Auditors of higher rank have been found to
have increased hypothesis generation capacity
(Libby & Frederick, 1990). These auditors, who
have greater experience and a more complete
knowledge structure, will have access to a
larger number of plausible potential errors to
explain a given audit finding. Thus the
reviewer of a higher rank can be expected to
add more non-redundant information to the
group’s pool of knowledge. However, it
should be noted that the experience difference
in Libby & Frederick was considerably greater
(1 compared with 5 years). While Bonner &
Lewis (1990) used seniors and managers and
found significant performance differences, it
should be noted that their task used a rare
error whereas the present study used a fre-
quent error. For our task, frequency know-
ledge would be more important and could
possibly be gained by the time an auditor
becomes a senior. However, the group litera-
ture on structure and resources (Bottger, 1981;
Einhorn et al., 1977; Laughlin et al., 1975;
Libby et al., 1987; Steiner, 1972; Trotman,
1985; Yetton & Bottger, 1982) indicates that
the experience and skills a group member
brings to bear in the group will influence the
group product. Therefore, it is suggested that
rank will affect the number of hypotheses
generated.
H3. In a review situation, manager-reviewed groups
will produce a greater number of plausible hypotheses
compared with senior-reviewed groups.
Additional analysis examines implausible
hypotheses generated, as their inclusion can
lead to reduced audit efficiency. Time taken
by subjects is also analysed under additional
analysis as time can be considered a measure
of the cost of review.
RESEARCH METHODS
Design
This study uses a 2 X 3 repeated measures
design with reviewer rank and discussion as
the independent variables. The first independent
variable measures the rank of the reviewer.
Rank (and thus experience) is explored by hav-
ing two levels of experience for the reviewer,
one level involves review by a senior; the other
level, an assistant manager/manager.
The second independent variable manipulates
two different forms of the review process. The
reviewer may review the workpapers by
discussing with the senior, or the review may entail
a review of the workpapers in the senior’s
absence, that is, review without discussion.
The review effect examined necessitates the
use of a three phase design. In phase 1, the
seniors make their individual judgments; in
phase 2, their work is reviewed in their pre-
sence; and in phase 3, review is conducted in
their absence. Hence, in phase 2, the reviewers
were asked to discuss with the senior the
latter’s judgment, and submit a reviewed judg-
ment; and in phase 3, a different group of
reviewers were told that a senior from their
firm had completed the case, and they were
to review the judgment in their absence. The
judgments reviewed in phase 3 were the same
judgments as used in phase 2. The design,
therefore, controls for individual differences
in the original judgments made by seniors as
the same individual judgment forms the basis
for review in the subsequent two phases.
In the present study, the original case infor-
mation in Libby & Frederick (1990) was mod-
ified as the instrument was to be administered
in Singapore. Basically the same information
was given in respect of company description
(except for changes in the names of personnel
of the company in question), company opera-
tions and two sets of three financial ratios,
based on the previous year’s audited financial
statements. The current year’s unaudited financial
statements were not given so that subjects
would not be able to calculate other ratios to
narrow down the scope of possible errors. The
information given was therefore deliberately
inadequate to form a conclusive judgment as
to the correct explanation for the changes in
the second set of financial ratios, and allowed
subjects to generate a wide range of possible
errors. The industry information was changed
because the dollar figures, in the original case,
were United States dollars. These original
figures were multiplied by a factor of five to
be consistent with those of a large electronics
firm in Singapore.
Task
The task was a hypothesis generation task
which asked subjects to list financial statement
errors which led to the changes in the financial
ratios. To facilitate the interpretation of errors
listed, subjects were also asked to specify the
adjusting journal entries (accounts only, not
dollar amount) necessary to correct for each error.²
The reviewers in phase 2 were told that a
senior from the firm had performed the ana-
lytical review task and that they were to
review the senior’s work with the latter pre-
sent to allow for discussion. Similar instruc-
tions were given to the reviewers in phase 3,
except that they were told that the senior
would not be present at the time of the
review. The individual instructions for the
above tasks were modified for the reviewers
in phases 2 and 3. In respect of the hypothesis
generation task, the reviewers were first asked
to list the errors in the senior’s list which they
believed did not explain the change in the
financial ratios. In the encoding of data, these
errors were treated as deletions from the
original list of hypotheses. Second, they were
asked to list the errors in the senior’s list which
may have led to the change in financial ratios.
Lastly, the reviewers were asked to list all other
possible errors not included by the senior.
Subjects
The “Big 6” firms in Singapore were
approached and all agreed to participate in
the study. Firms were asked to provide subjects
in groups of six consisting of four seniors
and two assistant managers/managers. All par-
ticipants were required to be familiar with the
task of analytical review, that is, they must have
participated in some stage of the analytical
review process. The minimum experience
level was therefore specified to be 2 years.
However, in some of the firms, seniors with
at least 1 year’s auditing experience, which
included performance of analytical review
work, participated where other more ex-
perienced staff members were not available.
The two most experienced seniors were allocated
as reviewers. This was done to ensure
realism as it would be unrealistic to have a
senior of, say, 2 years’ experience reviewing
the work of someone with 3f years’ experi-
ence. The other two seniors were used as indi-
viduals prior to review. The manager-reviewers
were assistant managers and managers with at
least 4 years’ experience. They were allocated
to either the manager-reviewed group with dis-
cussion (phase 2) or manager-reviewed group
without discussion (phase 3).
In total, 22 teams were provided (one firm
provided five teams; three firms, four teams
each; and the remaining two firms, three and
two teams respectively). An examination of the
debriefing questionnaire, however, revealed
that several of the subjects did not meet the
experience requirement. Hence, observations
of these subjects were excluded leaving 17
groups for the senior-reviewer treatments and
21 groups for the manager-reviewer treat-
ments. Mean audit experience of subjects in
the respective treatments were: individuals
prior to review by senior, 1.58 years; individuals
prior to review by manager, 1.99 years;
senior-reviewers, 3.35 years; senior-reviewers
without discussion, 3.53 years; manager-
reviewers, 6.22 years; manager-reviewers with-
out discussion, 6.43 years. The senior reviewers
had significantly greater experience than the
seniors subject to review (p < 0.05). While
² Subjects were also asked to select the most likely error from their list. However, these data are not analysed in this paper
as there is not one correct answer here conditioned on the data presented.
this difference was necessary for realism it does
bias the results away from finding a significant
rank effect. This limitation is discussed in the
final section of the paper.
Procedures
All experimental treatments were given the
purpose of the study, an explanation for the
conciseness of the case materials, and assurance
as to the confidentiality of their
responses. All subjects performed the experi-
ment at the office of their respective firm.
The audit seniors were told to work indivi-
dually, and that their judgments would be
reviewed by another member of the firm. The
fact that reviewee subjects were aware that
they would be reviewed is consistent with
practice. On completion of the tasks, subjects
were instructed not to discuss the case or tasks
with their other colleagues prior to the review
of their work. When each subject had com-
pleted the two tasks, the researcher drew a
line at the end of the list of hypotheses to the
bottom of the page to ensure that subjects did
not add any more hypotheses to their initial list
of hypotheses during the review of their work.
The reviewers were asked to participate in the second phase of the experiment which was
scheduled 1 hour after the start of phase 1 of the experiment, to allow the seniors time to
complete the tasks before being reviewed. The manager-reviewers and senior-reviewers
were briefed on the experiment and given instructions pertaining to the review. The
reviewers were asked to read the case information, and were told that (a) they would be
reviewing a set of judgments made by an audit senior from the same firm; (b) the review was
to be conducted with the senior present to allow for clarification and discussion; and (c)
specific instructions in respect of the review of each task would be given in the research
instrument itself. The senior-reviewers conducted their review in separate rooms from
manager-reviewers so that they could not overhear the discussion of other groups as this
might influence their own discussion and judgments. The two sets of research instruments
were collected on completion of the review.
Before the start of phase 2 one of the researchers took a photocopy of the individual
auditor’s list for the hypothesis generation task, and returned the original research instrument
to the senior for subsequent review. Each copy of the judgments was attached to the booklet
for the review without discussion treatment (phase 3). This photocopy of the original
judgment also ensured that any changes made to the list of hypotheses could be detected.
The review with no discussion was conducted about 2 hours after the start of phase
1, although it could have been conducted at the same time as phase 2. The choice of timing
was a matter of practicality. First, the researcher had to deal with fewer experimental treatments
at any one time and, second, it allowed the respective treatments to be physically
separated to allow for discussion in privacy. The reviewers in phase 3 received the same
instructions given to reviewers in phase 2 except that they were told that the seniors would
not be present during the review.
Dependent variables and encoding of data
The encoding of raw data was performed
independently by one of the researchers and a
former audit manager of a “Big 6” firm. There
were only a small number of disagreements
which were resolved by a second former senior
manager of a “Big 6” firm. The main analysis of
the data was based on Libby & Frederick’s
(1990) primary classification of errors into plau-
sible, implausible and high frequency errors.
The following criteria for plausibility were
used. Only errors which were unintentional
were considered as financial statement
errors.³ Intentional errors, for example where
³ Coakley & Loebbecke (1985, p. 206) split errors into accounting errors (unintentional mistakes) and accounting irregularities
(intentional distortions). This study concentrated on the former. However, the coders noted very few of the latter
and consequently the results of the study would not change if accounting irregularities were included.
TABLE 1. Matched-pairs t-tests for number of plausible hypotheses
          n     Mean      Std Dev.    t        1-tailed p
SR        17    5.6471    3.181       -5.00    0.000
SRG             8.5880    3.465
SR        17    5.6471    3.181       -2.79    0.006
SRGnd           7.4118    2.874
MR        21    5.2857    2.390       -5.66    0.000
MRG             7.3333    2.763
MR        21    5.2857    2.390       -2.85    0.005
MRGnd           6.6667    3.071
SR: individual before review by senior; MR: individual before review by manager; SRG: senior-reviewed group with
discussion; SRGnd: senior-reviewed group with no discussion; MRG: manager-reviewed group with discussion; MRGnd:
manager-reviewed group with no discussion.
fraud is involved (that is where subjects used
words like fictitious or kiting) were treated as
“not an error”. Other criteria were whether (a)
the transaction cycle and audit objective
violated were clearly stated; (b) the effect of
the error, given the changes in the set of
ratios, could be ascertained, and (c) the effects
were consistent with the changes in the ratios
given. The journal entries were asked for so
that the subjects would be specific about the
accounts, and so that the effect on the financial
ratios could be more easily ascertained. How-
ever, in cases where the subjects omitted to
give the journal entries (these instances were
few) but the description of the errors was
clear, the errors were classified as plausible if
all four criteria were satisfied. Following Libby
& Frederick (1990), plausible errors identified
were then categorized according to whether
they were high frequency (HF) errors, that is
errors which were found to be among the six
most frequently occurring errors in medium-
sized manufacturing firms, as identified by the
Coakley & Loebbecke (1985) study.
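The coding rules described above can be summarised in a short sketch. This is only an illustration of the stated criteria, not the coders' actual procedure: the class name, field names and return labels are hypothetical, and in the study each criterion was judged by human coders rather than computed.

```python
from dataclasses import dataclass


@dataclass
class CodedError:
    """One error listed by a subject, as judged by a coder (hypothetical fields)."""
    intentional: bool                # e.g. described with words like "fictitious" or "kiting"
    cycle_and_objective_clear: bool  # criterion (a)
    effect_ascertainable: bool       # criterion (b)
    consistent_with_ratios: bool     # criterion (c)
    high_frequency: bool = False     # among the six most frequent errors (Coakley & Loebbecke, 1985)


def classify(error: CodedError) -> str:
    """Apply the plausibility criteria in the order the text gives them."""
    if error.intentional:
        # Intentional errors (fraud) were treated as "not an error".
        return "not an error"
    if (error.cycle_and_objective_clear
            and error.effect_ascertainable
            and error.consistent_with_ratios):
        # Plausible errors were then split by frequency of occurrence.
        return "plausible (high frequency)" if error.high_frequency else "plausible"
    return "not plausible"


if __name__ == "__main__":
    print(classify(CodedError(False, True, True, True, high_frequency=True)))
    print(classify(CodedError(True, True, True, True)))
```

The sketch returns "not plausible" rather than "implausible" for errors failing a criterion, because the paper defines implausible hypotheses separately in the additional analyses.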
RESULTS
Univariate matched-pairs t-tests were calcu-
lated comparing the number of plausible
hypotheses generated by individuals before
review with the corresponding reviewed judg-
ments by seniors and managers. The results of
the matched-pairs t-tests are summarized in
Table 1. All matched-pairs t-tests comparing
individual with senior-reviewed group, indivi-
dual with senior-reviewed group without dis-
cussion, individual with manager-reviewed
group and individual with manager-reviewed
group without discussion were significantly
different (t = -5.00, p < 0.000; t = -2.79,
p < 0.006; t = -5.66, p < 0.000; t = -2.85,
p < 0.005, respectively). The results therefore
support Hla and Hlb indicating that review
will improve individual performance as mea-
sured by the number of plausible hypotheses
generated.⁴
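For readers less familiar with the matched-pairs t-test, the following minimal sketch shows how the statistic is formed from paired pre-review and post-review counts. The data in the usage example are invented purely for illustration and are not the study's observations; the function name is likewise an assumption. Differences are taken as pre-review minus post-review, so an improvement after review yields a negative t, matching the sign convention in Table 1.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for a matched-pairs t-test.

    Differences are computed as before - after.
    """
    if len(before) != len(after):
        raise ValueError("matched pairs require equal-length samples")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))   # mean difference over its standard error
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical counts of plausible hypotheses for five seniors (illustration only).
    pre_review = [4, 6, 5, 7, 3]
    post_review = [6, 9, 7, 8, 6]
    t, df = paired_t(pre_review, post_review)
    print(f"t = {t:.2f}, df = {df}")
```

Because the test works on within-pair differences, the t values in Table 1 cannot be recomputed from the reported means and standard deviations alone; the standard deviation of the differences is also needed.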
As noted earlier, the sources for the improve-
ment in performance as a result of the review
process were analysed using a 2 X 3 repeated
⁴ Another dependent variable which has been used as a measure of performance in Libby & Frederick (1990) is the
number of high frequency hypotheses. Using this dependent variable the results showed that the number of high
frequency hypotheses increased with review. All comparisons between individuals prior to review by senior (SR) with
senior-reviewed groups (SRG), and senior-reviewed groups without discussion (SRGnd) were significant (t = -5.64,
2-tailed p < 0.000; t = -2.38, 2-tailed p < 0.030), as were comparisons between individuals prior to review by manager (MR)
with manager-reviewed groups (MRG), and manager-reviewed groups without discussion (MRGnd) (t = 3.25, 2-tailed p <
0.004; t = -4.14, 2-tailed p < 0.001). However, this analysis has limitations as the Coakley and Loebbecke data refers to
U.S. medium-sized manufacturing firms. No similar archival data for large Singaporean firms are available.
TABLE 2. Contrast tests of the rank and discussion effects
Source of variation    Sum of squares    df    Mean squares    F
Between
  A1                   17.462            1     17.462          0.888
  Error                708.055           36    19.668
Within
  B1                   103.607           1     103.607         59.099*
  A1 B1                2.555             1     2.555           1.457
  Error                63.112            36    1.753
  B2                   15.958            1     15.958          3.368**
  A1 B2                1.221             1     1.221           0.258
  Error                170.569           36    4.738
A1 is the contrast which compares the number of plausible hypotheses generated by senior-reviewed groups to manager-
reviewed groups.
B1 is the contrast which compares the number of plausible hypotheses generated by all individual seniors prior to review
compared to those generated by all review groups.
B2 is the contrast which compares the number of plausible hypotheses generated by all review with discussion groups
compared to all review without discussion groups.
* Significant at p < 0.001.
** Significant at p < 0.10.
measures ANOVA with rank and type of review
as independent variables and number of plausi-
ble hypotheses as the dependent variable. Rank
is a measured variable being either senior or
manager. Type of review is a within subjects
variable with three levels: pre-review, review
with discussion, and review without
discussion. Contrasts were developed (as shown in
Table 2) to test H2 and H3.
To test for a discussion effect (H2), contrast B2 (see Table 2) compares the number of
plausible hypotheses developed by the review with discussion groups to the number developed
by the review without discussion groups. To test for a rank effect (H3), contrast A1 (see
Table 2) compares the number of plausible hypotheses developed by senior-reviewed groups
with manager-reviewed groups. In addition, contrast B1 compares the number of plausible
hypotheses generated pre-review with those developed after review (regardless
of whether discussion occurred).
The results of these contrasts are shown in Table 2. Contrast B2 shows a marginally
significant effect for discussion (F = 3.368, p = 0.075). This discussion effect is not affected
by the rank of the auditors (contrast A1 B2, F = 0.258, ns). Contrast A1, which tests for a
rank effect, does not show a significant effect (F = 0.888, ns). Therefore H3, which
hypothesized a main effect for rank, is not supported. Contrast B1 shows that the number of
plausible hypotheses generated after review is significantly greater than generated pre-review
(F = 59.099, p < 0.001). This effect is not affected by the rank of the reviewers (contrast
A1 B1, F = 1.457, ns). These results further support H1a and H1b and are consistent with
the individual t-tests in Table 1.⁵
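As a consistency check on Table 2, each F ratio can be recomputed as the mean square of the contrast divided by the mean square of its error term. The sketch below uses only the sums of squares and degrees of freedom printed in Table 2; the dictionary layout and function name are illustrative assumptions.

```python
# Each entry: (contrast SS, contrast df), (error SS, error df), F as reported in Table 2.
TABLE_2 = {
    "A1":    ((17.462, 1),  (708.055, 36), 0.888),
    "B1":    ((103.607, 1), (63.112, 36),  59.099),
    "A1 B1": ((2.555, 1),   (63.112, 36),  1.457),
    "B2":    ((15.958, 1),  (170.569, 36), 3.368),
    "A1 B2": ((1.221, 1),   (170.569, 36), 0.258),
}


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    return (ss_effect / df_effect) / (ss_error / df_error)


if __name__ == "__main__":
    for name, ((ss_e, df_e), (ss_err, df_err), reported) in TABLE_2.items():
        recomputed = f_ratio(ss_e, df_e, ss_err, df_err)
        print(f"{name:6s} recomputed F = {recomputed:7.3f}  reported F = {reported:7.3f}")
```

The recomputed ratios agree with the reported F values to within rounding, which also supports the reading of the error rows in the reconstructed table.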
ADDITIONAL ANALYSES
Implausible hypotheses
The inclusion of implausible hypotheses by auditors may lead to a reduction in audit
efficiency as time may need to be spent on following up these items. In this study implausible
hypotheses were classified as hypotheses which were not consistent with all three financial
ratios. The number of such hypotheses was very small (< 0.30 implausible hypotheses on
average, for all treatments) and there were no significant differences in the number of these
implausible hypotheses between treatments.⁶
⁵ We also examined the same contrasts for discussion and rank with high frequency hypotheses instead of plausible
hypotheses as the dependent variable. No significant effects were found (p > 0.10). As noted in footnote 4, there are
limitations in analysing these high frequency data in the Singapore context.
Discussion effect
Earlier results indicated that when performance
was measured by plausible hypotheses,
there was a marginally significant
discussion effect. To investigate this issue
further, a summary of the plausible hypotheses
deducted and added to the seniors’ list for each
of the review treatments was prepared (Table
3).⁷ The table indicates that new hypotheses
were generated supporting the operation of
pooling of knowledge and/or of cognitive inter-
stimulation. However, all review groups also
eliminated plausible hypotheses. Separate 2 X 2 ANOVAs, with discussion and rank as
independent variables, and plausible hypotheses generated and excluded as dependent
variables, were run respectively. The results indicated no significant effects in terms of
plausible hypotheses generated for discussion (F = 0.29, p = 0.593), rank (F = 1.84,
p = 0.180) and interaction effects (F = 0.01, p = 0.907). In terms of plausible hypotheses
deducted, however, there was a significant discussion main effect (F = 9.11, p = 0.004) but
no significant main effect for rank (F = 0.03, p = 0.864) and interaction effects (F = 0.86,
p = 0.357). This suggests that the benefit of discussion results from fewer plausible
hypotheses being eliminated from the hypothesis set.
TABLE 3. Summary of changes in mean number of plausible hypotheses

Treatment                                  Hypotheses   Hypotheses
                                           excluded     generated
Senior-reviewed group - discussion           0.24         3.18
Senior-reviewed group - no discussion        1.12         2.88
Manager-reviewed group - discussion          0.48         2.52
Manager-reviewed group - no discussion       0.95         2.33
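The entries in Table 3 can be read as a simple accounting identity: the post-review mean equals the pre-review mean plus hypotheses generated minus hypotheses excluded. The sketch below works through that arithmetic. The Table 3 values and the senior-reviewed discussion figures (5.65 before review, 8.59 after) are taken from the paper; the function and variable names are hypothetical and chosen only for illustration.

```python
import math

# Mean changes in plausible hypotheses per review treatment, copied from Table 3:
# (hypotheses excluded, hypotheses generated).
TABLE_3 = {
    "senior, discussion": (0.24, 3.18),
    "senior, no discussion": (1.12, 2.88),
    "manager, discussion": (0.48, 2.52),
    "manager, no discussion": (0.95, 2.33),
}


def net_change(excluded: float, generated: float) -> float:
    """Net mean change in plausible hypotheses produced by review."""
    return generated - excluded


def post_review_mean(pre_review: float, excluded: float, generated: float) -> float:
    """Post-review mean implied by a pre-review mean and the Table 3 changes."""
    return pre_review + net_change(excluded, generated)


if __name__ == "__main__":
    for treatment, (excluded, generated) in TABLE_3.items():
        print(f"{treatment}: net change = {net_change(excluded, generated):+.2f}")

    # Reconciliation reported in the paper: 5.65 + 3.18 - 0.24 = 8.59.
    excluded, generated = TABLE_3["senior, discussion"]
    implied = post_review_mean(5.65, excluded, generated)
    assert math.isclose(implied, 8.59, abs_tol=1e-9)
```

Within each rank, the discussion and no-discussion rows differ far more in the excluded column than in the generated column, which is the pattern the ANOVA on excluded hypotheses picks up.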
TABLE 4. Mean (standard deviation) time taken to complete the hypothesis generation task

Treatment                                  Time taken in hours
Individuals before review by senior          1.02 (0.363)
Senior-reviewed group - discussion           0.89 (0.377)
Senior-reviewed group - no discussion        0.80 (0.352)
Individuals prior to review by manager       0.93 (0.187)
Manager-reviewed group - discussion          0.68 (0.206)
Manager-reviewed group - no discussion       0.65 (0.200)
Rank effect
H3 predicted a rank effect but, as noted ear-
lier, this hypothesis was not supported. To
investigate this further, the time taken by
each treatment group was analysed. Table 4
provides a summary of the time taken by the
respective experimental treatments.
A 2 X 2 ANOVA, with discussion and rank as
independent variables, and time as the dependent
variable, was run for all review groups.
The results indicated no significant discussion
main effect (F = 0.64, p = 0.426) or interaction
effect (F = 0.24, p = 0.627), but there was a
significant rank main effect (F = 7.13, p =
0.009). Senior-reviewed groups spent more
time on the task than manager-reviewed
groups.
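For readers unfamiliar with the test, the following is a minimal sketch of how the three F statistics of a 2 X 2 between-subjects ANOVA are formed. It assumes equal cell sizes, which may not match the study's design, and the observations in the example are invented for illustration; they are not the study's data. The function name and data layout are likewise hypothetical.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced 2 x 2 between-subjects ANOVA.

    `cells[(d, r)]` holds the observations for discussion level d and
    rank level r (each 0 or 1). All cells must be the same size.
    """
    levels = (0, 1)
    n = len(cells[(0, 0)])
    if n < 2 or any(len(cells[(d, r)]) != n for d in levels for r in levels):
        raise ValueError("this sketch assumes equal cell sizes of at least 2")

    grand = mean(x for obs in cells.values() for x in obs)
    cell_mean = {key: mean(obs) for key, obs in cells.items()}
    disc_mean = {d: mean(cell_mean[(d, r)] for r in levels) for d in levels}
    rank_mean = {r: mean(cell_mean[(d, r)] for d in levels) for r in levels}

    # Sums of squares; each main effect and the interaction has 1 degree of freedom.
    ss_disc = 2 * n * sum((disc_mean[d] - grand) ** 2 for d in levels)
    ss_rank = 2 * n * sum((rank_mean[r] - grand) ** 2 for r in levels)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_disc - ss_rank
    ss_within = sum(
        (x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs
    )

    # Error term: within-cell variation with 4 * (n - 1) degrees of freedom.
    ms_within = ss_within / (4 * (n - 1))
    return {
        "discussion": ss_disc / ms_within,
        "rank": ss_rank / ms_within,
        "interaction": ss_inter / ms_within,
    }


if __name__ == "__main__":
    # Hypothetical completion times in hours, keyed by (discussion, rank);
    # rank 0 = senior reviewer, rank 1 = manager reviewer.
    hypothetical_times = {
        (0, 0): [0.75, 0.90, 0.80],
        (1, 0): [0.85, 0.95, 0.90],
        (0, 1): [0.60, 0.70, 0.65],
        (1, 1): [0.65, 0.70, 0.72],
    }
    for effect, f_value in two_way_anova_f(hypothetical_times).items():
        print(f"{effect}: F = {f_value:.2f}")
```

Each F value is compared against the F distribution with 1 and 4(n - 1) degrees of freedom to obtain a p-value; with unequal cell sizes a regression-based ANOVA from a statistics package would be the safer choice.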
6 Following Libby & Frederick (1990) a broader definition of implausible hypotheses was also considered which included
errors that were uninterpretable by the coders, wrong direction errors and duplicate errors. Again, if number of
implausible hypotheses were used as the dependent variable in hypotheses 1-3, none of the hypotheses were supported.
7 It should be noted that the numbers in Table 3 can be used to reconcile the differences between the pre-review and
post-review results in Table 1. For example, 5.65 + 3.18 - 0.24 = 8.59, where SR = 5.65 and SRG = 8.59 in Table 1.
DISCUSSION AND CONCLUSION
This paper examines the effectiveness of the
review process on a hypothesis generation task
and some possible sources of gain from using
the review process. The first main finding was
that the review process increased the number
of plausible hypotheses generated. All four
forms of review groups added on average two
or three additional hypotheses. The results indi-
cate that regardless of whether review was by a
more experienced group member, or whether
discussion was allowed, the review process
added to the number of plausible hypotheses.
The implication of increasing the number of
plausible hypotheses considered is that the
likelihood of the correct hypothesis being con-
sidered is increased as well (Johnson et al.,
1991; Libby & Frederick, 1990). Two possible
explanations for hypotheses added are pooling
of information and/or cognitive interstimula-
tion. Cognitive interstimulation can occur
regardless of the mode of communication, dis-
cussion or through written notes (see Casey et
al., 1984). The results are also consistent with
the Libby (1985) and Libby & Frederick (1990)
studies where prompts or cues inherited by
auditors influenced the generation of other
hypotheses, though no discussion was in-
volved. Libby & Frederick (1990) stated that
auditors often inherit an initial explanation for
an audit finding from the workpapers or from
others including client personnel and co-
workers. They suggested that this starting
point should affect the other alternatives con-
sidered because accessing a particular error in
memory decreases the processing necessary to
activate another error closely linked to it in the
knowledge store.
This finding extends the findings of previous
studies on the review process using other tasks
and other measures of performance. Results of
these previous studies have shown an improve-
ment in performance, as measured by consensus
on evaluation of internal controls (Trotman &
Yetton, 1985) and accuracy of the estimate of
dollar error of an inventory system (Trotman,
1985), by interacting groups and audit teams.
The present paper shows that the benefits of
the review process also apply to unstructured
tasks, such as generation of hypotheses, as
part of analytical review procedures.
Group discussion was one of the possible
sources of gain considered. Group discussion
increased the number of plausible hypo-
theses. All review groups with discussion had
a larger number of plausible hypotheses com-
pared with review groups with no discussion.
To investigate this further, the breakdown of
plausible hypotheses deleted and added to
the audit senior’s initial set of judgments was
examined. The breakdown of the plausible
hypotheses indicated that there was no differ-
ence in the number of plausible hypotheses
added within experience levels regardless of
whether there was discussion or not. How-
ever, the analysis on hypotheses deleted
showed that the difference in number of plau-
sible hypotheses after review between the
groups with and without discussion could be
attributed to fewer plausible hypotheses being
excluded from the individual auditors’ initial
sets of hypotheses in the discussion treat-
ment. These results suggest that an important
advantage of discussion in the review process
is that it reduces the likelihood of plausible
hypotheses being eliminated. Possible explana-
tions for these results include (a) audit seniors
were able to support or explain their judg-
ments when discussion was allowed and (b)
the reviewers may not have deleted as many
hypotheses under the discussion treatment to
avoid embarrassing the reviewees.
Rank, which was the other source of gain
considered, did not account for differences
between senior-reviewed groups and manager-
reviewed groups in terms of the number of
plausible hypotheses generated. The results
for rank as an explanatory factor for perfor-
mance are not consistent with the general find-
ings in the literature. Libby & Frederick (1990)
found that the more experienced auditors
(more than 4 years’ audit experience, about 6
years in the present study) tended to generate
more plausible and more frequently occurring
errors than the less experienced staff auditors
(about 1 year’s experience, 3 in the present
study). However, Libby and Frederick looked
at the individual judgments of novices and
experienced subjects, and did not examine
the review process. One reason for the results
of the present study could be that the senior-
reviewers had acquired sufficient experience
(mean experience = 3.44 years) to generate
plausible hypotheses. The results did in fact
show them to perform well in generating on
average about five plausible hypotheses.
Another explanation could be that the differ-
ence in experience between senior-reviewers
and manager-reviewers may not be wide enough
to show any differences in performance. It was
necessary for realism to assign the more experi-
enced seniors as reviewers and the less experi-
enced seniors as reviewees. This may have
biased the results away from finding a signifi-
cant rank effect. The senior reviewers had
about 3-4 years’ experience and, if little financial
statement error knowledge is gained after
this period, any differences between senior-
and manager-reviewers would be small. Consistent
with this explanation, Bonner & Lewis
(1990) found that there was no difference in
auditors’ knowledge about analytical pro-
cedures between seniors (average experience
= 39 months) and managers (average experi-
ence = 95 months). Additionally, there may be
a “ceiling effect”, that is a maximum number
of hypotheses above which even groups can-
not generate additional hypotheses without
getting more information or evidence. Among
experienced physicians, Elstein et al. (1978)
found that there is a limit to the number of
hypotheses generated by experienced physi-
cians that is unrelated to their knowledge.
They suggested that the long-term store of
medical knowledge of a physician with a
reasonable amount of clinical experience is
substantially larger than the number of hypotheses
that can be evaluated simultaneously.
In the auditing context, it is realistic to expect
such a “ceiling effect”. Efficiency considera-
tions would result in a few of the more
frequently occurring and/or more likely hypotheses
being considered initially, evidence
gathered, and the hypotheses further refined
or additional hypotheses generated in the light
of new evidence found.
The results of the additional analysis on time
taken to complete the tasks showed that hold-
ing discussion constant, senior-reviewers took
more time to review the individual audit
senior’s work than manager-reviewers. The
extra time taken may have allowed them to
develop more hypotheses. The results also indi-
cate that the manager-reviewed groups were
more efficient than the senior-reviewed groups
as they took less time to generate about the
same number of plausible hypotheses.
The study also found that individuals produced
a limited number of hypotheses. On
average the individual seniors generated 5.4
plausible hypotheses. This is consistent with
the performance of Libby & Frederick’s
(1990) subjects who generated 4.1 plausible
hypotheses on the same task. The results are
also consistent with the conclusions in other
fields of research. For example, the size of
the hypothesis set explored at any point in
time is usually around four or five hypotheses
with an upper bound of six or seven (Elstein et
al., 1978). In psychological studies (e.g. Mehle,
1982) on average, subjects produced 3.4
plausible hypotheses. The total number of dis-
tinct hypotheses generated by all subjects in
the present study was 58 hypotheses (17.8
hypotheses in Mehle, 1982). On average the
individual seniors produced about 10% of the
total pooled hypotheses generated by all subjects.
Hence, this finding that individuals produce
limited sets of hypotheses is consistent
with the general conclusion in the psychological
and medical literature (Elstein et al.,
1978; Mehle et al., 1981; Wason & Johnson-Laird,
1972).
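The "about 10%" figure follows directly from the reported means, and the same ratio can be formed from the Mehle (1982) figures quoted above. The sketch below is only a worked check of that arithmetic: the helper name is hypothetical, and, as in the text, the numerator counts plausible hypotheses while the denominator counts all distinct hypotheses pooled across subjects.

```python
def share_of_pool(mean_per_individual: float, pooled_total: float) -> float:
    """Fraction of the pooled hypothesis set that an average individual produces."""
    return mean_per_individual / pooled_total


if __name__ == "__main__":
    # Present study: 5.4 plausible hypotheses per senior, 58 distinct hypotheses pooled.
    print(f"present study: {share_of_pool(5.4, 58):.1%}")   # prints 9.3%
    # Mehle (1982): 3.4 plausible hypotheses per subject, 17.8 hypotheses pooled.
    print(f"Mehle (1982): {share_of_pool(3.4, 17.8):.1%}")  # prints 19.1%
```

In both cases an average individual covers only a small part of the pooled set, which is the sense in which individual hypothesis sets are described as limited.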
In conclusion, this study has found that one
source of gain from the review process is
the presence of discussion between the
reviewer and reviewee. Obviously there are
other sources of gain given that the review
groups without discussion outperformed indi-
viduals prior to review. One area of future
research is to document these other sources
of gain. A second area is to further investigate
the dynamics of the discussion process. For
example, what elements of discussion lead to
various types of process gains? This research
would be of major benefit to audit firms in
how they structure their audit groups.
BIBLIOGRAPHY
Australian Society of Accountants, and the Institute of Chartered Accountants in Australia, AUP 13 -
Control of the Quality of Audit Work (January 1993).
Bamber, E. M., Bamber, L. S. & Bylinski, J. H., A Descriptive Study of Audit Managers’ Working Paper
Review, Auditing: a Journal of Practice and Theory (1988) pp. 137-149.
Bamber, E. M. & Bylinski, J. H., The Audit Team and the Audit Review Process: an Organizational
Approach, Journal of Accounting Literature (1982) pp. 35-58.
Bedard, J. C. & Biggs, S. F., Pattern Recognition, Hypotheses Generation, and Auditor Performance in an
Analytical Task, The Accounting Review (1991) pp. 622-642.
Bonner, S. E. & Lewis, B. L., Determinants of Auditor Expertise, Journal of Accounting Research
(Supplement 1990).
Bonner, S. E. & Pennington, N., Cognitive Processes and Knowledge as Determinants of Auditor
Expertise, Journal of Accounting Literature (1991).
Bottger, P. C., Group Composition and Performance: Four Studies of the Relationship Amongst Member
Ability, Group Process, Decision Scheme and Effectiveness, Ph.D. dissertation, Australian Graduate
School of Management, University of New South Wales (1981).
Burton, G. E., The “Clustering Effect”: an Idea-generation Phenomenon During Nominal Group, Small
Group Behaviour (1987) pp. 224-238.
Casey, J. T., Gettys, C. F., Pliske, R. M. & Mehle, T., A Partition of Small Group Performance into
Information and Social Components, Organizational Behaviour and Human Performance (1984)
pp. 112-139.
Coakley, J. R. & Loebbecke, J. K., The Expectation of Accounting Errors in Medium-sized Manufacturing
Firms, Advances in Accounting (1985) pp. 199-245.
Cohen, J. & Kida, T., The Impact of Analytical Review Results, Internal Control Reliability, and
Experience on Auditor’s Use of Analytical Review, Journal of Accounting Research (1989) pp. 263-276.
Einhorn, H. J., Hogarth, R. M. & Klempner, E., Quality of Group Judgment, Psychological Bulletin (1977)
pp. 158-172.
Elstein, A. S. & Bordage, G., The Psychology of Clinical Reasoning: Current Research Approaches, in
Stone, G., Cohen, F. & Adler, N. (Eds), Health Psychology (East Lansing, Michigan: Michigan State
University, 1978).
Elstein, A. S., Shulman, L. S. & Sprafka, S. A., Medical Problem Solving: An Analysis of Clinical
Reasoning (Cambridge, Massachusetts: Harvard University Press, 1978).
Fisher, S. D., Cue Selection in Hypothesis Generation: Reading Habits, Consistency Checking, and
Diagnostic Scanning, Organizational Behavior and Human Decision Processes (1987) pp. 170-192.
Hare, A. P., Handbook of Small Group Research, 2nd Edn (Collier: Massachusetts Free Press, 1976).
Heiman, V. B., Auditor’s Assessment of the Strength of Analytical Review Explanations, The Accounting
Review (1990) pp. 875-890.
Johnson, P. E., Jamal, K. & Berryman, R. G., Effects of Framing on Auditor Decisions, Organizational
Behavior and Human Decision Processes (1991) pp. 75-105.
Kida, T., The Effect of Causality and Specificity on Data Use, Journal of Accounting Research (1984) pp.
145-152.
Lamm, H. & Trommsdorff, G., Group Versus Individual Performance on Tasks Requiring Ideational
Proficiency (Brainstorming): a Review, European Journal of Social Psychology (1973) pp. 361-388.
Laughlin, P. R., Kerr, N. L., Davis, J. H., Halff, H. M. & Marciniak, K. A., Group Size, Member Ability, and
Social Decision Schemes on an Intellective Task, Journal of Personality and Social Psychology (1975)
pp. 522-535.
Libby, R., Availability and the Generation of Hypotheses in Analytical Review, Journal of Accounting
Research (1985) pp. 648-667.
Libby, R. & Frederick, D. M., Experience and the Ability to Explain Audit Findings, Journal of Accounting
Research (1990) pp. 348-367.
Libby, R. & Trotman, K. T., The Review Process as a Control for Differential Recall of Evidence in Auditor
Judgments, Working paper, Johnson Graduate School of Management, Cornell University, Ithaca, New
York (1991).
Libby, R., Trotman, K. T. & Zimmer, I., Member Variation, Recognition of Expertise, and Group
Performance, Journal of Applied Psychology (1987) pp. 81-87.
Maier, N. R. F., Problem Solving and Creativity in Individuals and Groups (California: Brooks/Cole,
1970).
Mehle, T., Hypothesis Generation in an Automobile Malfunction Inference Task, Acta Psychologica
(1982) pp. 87-106.
Mehle, T., Gettys, C. F., Manning, C., Baca, S. & Fisher, S., The Availability Explanation of Excessive
Plausibility Assessments, Acta Psychologica (1981) pp. 127-140.
Mock, T. J. & Turner, J. L., Internal Accounting Control Evaluation and Auditor Judgment, Auditing
Research Monograph No. 3 (New York: AICPA, 1981).
Nemeth, C. J., Differential Contributions of Majority vs Minority Influence, Psychological Review (1986)
pp. 23-32.
Ramsay, R. J., Senior/Manager Difference in Audit Workpaper Review Performance, Journal of Accounting
Research (1994) pp. 127-135.
Shaw, M. E., Group Dynamics: the Psychology of Small Group Behavior (New York: McGraw-Hill,
1976).
Solomon, I., Multi-auditor Judgment/Decision Making Research, Journal of Accounting Literature (1987)
pp. 1-25.
Stein, M. K., Stimulating Creativity (New York: Academic Press, 1975).
Steiner, I. D., Group Process and Productivity (New York: Academic Press, 1972).
Taylor, D. W., Berry, P. C. & Block, C. H., Does Group Participation when using Brainstorming Facilitate
or Inhibit Creative Thinking?, Administrative Science Quarterly (1958) pp. 23-47.
Trotman, K. T., The Review Process and the Accuracy of Auditor Judgments, Journal of Accounting
Research (1985) pp. 740-752.
Trotman, K. T. & Sng, J., The Effect of Hypothesis Framing, Prior Expectations and Cue Diagnosticity on
Auditors’ Information Choice, Accounting, Organizations and Society (1989) pp. 565-576.
Trotman, K. T. & Yetton, P. W., The Effect of the Review Process on Auditor Judgments, Journal of
Accounting Research (1985) pp. 256-267.
Wanous, J. P. & Youtz, M. A., Solution Diversity and the Quality of Group Decisions, Academy of
Management Journal (1986) pp. 149-159.
Wason, P. C. & Johnson-Laird, P. N., Psychology of Reasoning: Structure and Content (London: D. T.
Batsford, 1972).
Yetton, P. W. & Bottger, P. C., Individual Versus Group Problem Solving: an Empirical Test of a Best
Member Strategy, Organizational Behavior and Human Performance (1982) pp. 307-321.
