Essay versus multiple-choice: student preferences and the underlying rationale with implications for test construction.
Parmenter, David A.
INTRODUCTION
Many universities are experiencing budget cuts that result in
larger classes. The need to teach these larger classes without suffering
an excessive reduction in the amount of time available for research and
service-related endeavors exerts pressure on faculty to make increasing
use of multiple-choice questions on quizzes and exams. Although
multiple-choice questions provide the obvious benefit of being easier to
grade, their use is in conflict with the intuitive feelings of many
faculty that multiple-choice questions are inferior to essays and other
open-ended forms of assessment both as a measure and as a promoter of
student learning.
Previous studies have demonstrated that the use of assessments that
test higher-order thinking can encourage students to pursue study
strategies that develop a deeper level of understanding rather than
utilizing strategies that achieve little more than test-related recall.
Previous work has also demonstrated that the use of these deeper
learning strategies tends to lead to greater learning, at least for
those learning tasks requiring higher-order thinking. This paper will
review those results, relating them to the choice between
multiple-choice and essay, and also discuss various other strengths and
weaknesses of the essay and multiple-choice format. The paper's
major purpose is to extend the knowledge of student assessment
preferences by investigating the rationale behind those preferences. It
will report the results of a survey questionnaire in which undergraduate
business students in a junior level Principles of Management course and
a capstone Strategic Management course specified not only their
preferences for multiple-choice or essay but also the reasons for those
preferences. The patterns that result will demonstrate two main factors
driving student opinions, one factor focused on the ease of obtaining a
high test score (with preferences for multiple-choice) and the other
focused on the fairness and validity of the assessment measure (with
preferences for essay). These results, in conjunction with various ideas
drawn from the literature, will be used to generate exam-related
pedagogical recommendations.
ESSAY VERSUS MULTIPLE-CHOICE AS A MEASURE OF LEARNING
Because the primary goal of an exam is to accurately assess student
learning, one of the key issues to consider is whether or not open-ended
and multiple-choice questions differ in terms of reliability and
validity. The literature tends to favor multiple-choice. For example,
Bridgeman (1992) suggested that although multiple-choice is less
reliable on a question-by-question basis due to guessing, the fact that
multiple-choice questions take less time to answer (and grade) would
allow an exam made up entirely of multiple-choice to contain more
questions and therefore be more reliable than an exam containing fewer
open-ended questions. Hassmen and Hunt (1994) agreed but also suggested
that guessing shouldn't be discouraged because guesses are
generally based at least partially on student knowledge of the relevant
course content. A weakness of essay questions cited by many authors is
that they require subjective grading, with even factors that are
unrelated to answer content shown to have an impact on exam scores. For
example, Powers, Fowles, Farnum and Ramsey (1994) found that handwritten answers tended to be graded more leniently than identical answers
written using a word processor. The authors suggested that graders may
be more forgiving of handwritten answers because poor penmanship can
disguise spelling and grammar errors, because the scratch-outs that
appear in handwritten answers demonstrate the students' efforts at
improvement and because word-processed answers tend to fill less space
on the page and therefore may appear to be less complete.
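Bridgeman's length-based reliability argument can be made concrete with the Spearman-Brown prophecy formula, a standard result from classical test theory that is not given in the sources reviewed here but underlies the reasoning:

\[ \rho_k = \frac{k\,\rho_1}{1 + (k - 1)\,\rho_1} \]

where \(\rho_1\) is the reliability of the original test and \(\rho_k\) is the predicted reliability of a test lengthened by a factor of k. For example, doubling a test with reliability .60 yields a predicted reliability of 2(.60)/(1 + .60) = .75, which is why an exam containing many quickly answered multiple-choice questions can be more reliable overall than one containing only a few open-ended questions.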
A related measurement issue concerns the question of whether or not
essays and other forms of open-ended questions truly assess different
dimensions of learning than are assessed by multiple-choice. Many
studies have found that student scores on open-ended questions were so
closely related to their scores on multiple-choice as to suggest that
both types of questions were measuring the same things (Bridgeman, 1992;
Lukhele, Thissen & Wainer, 1994; Walstad & Becker, 1994),
suggesting that the difficult-to-administer open-ended questions might
not be worth the extra effort because multiple-choice alone could be
used to assess the same learning. Lukhele et al. noted further that
essays became particularly unhelpful as a measure of learning if
students were allowed to select the questions to answer from a longer
list of questions provided, allowing the students to avoid topics they
hadn't learned. In contrast, Thissen, Wainer and Wang (1994) and
Harris and Kerby (1997) found that essay scores and multiple-choice
scores, although significantly correlated, were not so closely related
as to be identical. Even more convincingly, Becker and Johnston (1999)
applied two-stage least squares, a more sophisticated statistical
procedure than was used in most previous studies of this topic, and
found that essay and multiple-choice scores were not significant
predictors of each other, suggesting that they were clearly measuring
different dimensions of knowledge.
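To make the two-stage least squares logic concrete, the sketch below illustrates the general approach in Python using statsmodels. The variable names, instruments and data file are purely hypothetical; Becker and Johnston's actual specification is not reproduced here.

import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per student with an essay score, a
# multiple-choice (mc) score, and background covariates.
df = pd.read_csv("exam_scores.csv")  # hypothetical file

# Stage 1: regress the endogenous mc score on instruments assumed to
# predict mc performance but not essay performance directly.
stage1 = sm.OLS(df["mc"], sm.add_constant(df[["gpa", "prior_courses"]])).fit()
df["mc_hat"] = stage1.fittedvalues

# Stage 2: regress the essay score on the fitted (instrumented) mc score.
# A nonsignificant coefficient on mc_hat would match the pattern Becker
# and Johnston interpret as the two formats measuring different dimensions.
stage2 = sm.OLS(df["essay"], sm.add_constant(df["mc_hat"])).fit()
print(stage2.summary())

# Caveat: chaining two OLS fits by hand misstates the second-stage
# standard errors; a dedicated IV estimator should be used in practice.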
Various authors have provided potential explanations for the high
correlations often seen between multiple-choice and essay scores,
implying that this should not be interpreted as a suggestion that both
types of questions measure the same learning. Driessen and van der Vleuten (2000) suggested that student reasoning is so difficult to grade
that the grading process for essay answers may unintentionally
deteriorate into an exercise in counting the number of facts provided in
the answer, thereby turning even an essay question into a
recall-oriented question. Because many students recognize that quantity
is often rewarded and therefore attempt to provide it in their answers,
timed exams may not give students the opportunity to provide both the
quantity-oriented list of facts and the more thoughtful organization,
integration, and analysis of those facts that would demonstrate higher
order thinking (Minbashian, Huon & Bird, 2004). Students who are
fearful of failure are particularly prone to this behavior of attempting
to provide quantity (Diseth & Martinsen, 2003).
Despite the mixed evidence for the existence of a meaningful
difference between what is measured by multiple-choice questions and
what is measured by open-ended questions, many authors suggest that
open-ended questions are more effective at assessing higher order
thinking in contrast to the recall-oriented focus often seen in
multiple-choice (Bridgeman, 1992; Scouller, 1998; Scouller &
Prosser, 1994; Walstad & Becker, 1994). For example, essay answers
can demonstrate student thought processes and creativity in a way that
cannot be achieved using multiple-choice (Walstad & Becker, 1994).
Students may hold these same perceptions of multiple-choice questions as
being targeted at lower level recall-oriented learning (Scouller, 1998),
even for multiple-choice questions that have been specifically designed
to test higher level learning (Scouller & Prosser, 1994). Faculty
should pay full attention to student perceptions of assessment methods
because those perceptions may influence student study strategies, as
will be discussed further below.
ESSAY VERSUS MULTIPLE-CHOICE AS A PROMOTER OF LEARNING
Student perceptions of what learning really is and the study
strategies used to achieve that learning are conventionally classified
into two categories, surface and deep. Surface-oriented strategies focus
on course content as a set of disconnected ideas that are to be accepted
passively from the instructor or text. The primary learning goal is to
memorize facts and subsequently reproduce them on an exam (Entwistle
& Entwistle, 1992; Minbashian et al., 2004). To a certain extent,
surface strategies could be viewed as simply the lack of a strategy. As
shown by Scouller and Prosser (1994), surface-oriented students tend not
to be self-reflective about their studying processes and may not really
be able to appreciate the difference between the recall of facts and a
true understanding of the material. The surface orientation can apply to
the instructor as well as to the students. A surface-oriented
instructor, operating under the assumption that students bring little
expertise of their own to the class, would tend to emphasize lecture
with the view that his primary task is to impart information.
In contrast, deep learning strategies focus on achieving true
understanding rather than simply preparing for the test. This would
involve more self-directed learning with less focus on the teacher being
in charge, increased collaboration between students, integration of
ideas, critical evaluation of the logic and evidence that support each
new concept, inference of abstract principles from examples, and careful
monitoring of one's own understanding accompanied by efforts to
clear up any misunderstandings
as they arise (Chi, Bassok, Lewis, Reimann & Glaser, 1989;
Minbashian et al., 2004). Students report that the sense of being able
to integrate the various ideas into one's own structure and
recognize the patterns that result is a key component of their
understanding (Entwistle & Entwistle, 1992). Students using deep
strategies tend to have a feeling of being in control of their own
learning as well as a sense of excitement. Because these students, in
essence, construct their own knowledge through their strategic efforts,
deep learning is often referred to as constructivist. A deep-oriented
instructor would deemphasize lecturing and instead encourage questions
and discussion, present multiple perspectives rather than portraying
ideas as completely true or completely false, provide time and
opportunities for student collaboration, relate course content to
professional practice, provide frequent feedback and encourage students
to monitor their own understanding.
Any given student would generally apply a mix of surface and deep
strategies, with most students having a bias toward one or the other and
some being adept at adjusting their choice of strategy in reaction to
the characteristics of a particular class or assignment. Some students,
unfortunately, apply little of either type, seeming unwilling or unable
to evaluate their own largely strategy-free behavior or to develop the
study skills necessary to become more strategic. Many authors (Cassady,
2004; Hagedorn, Sagher & Siadat, 2000; Taylor & Hyde, 2000)
point out the importance of teaching students how to study effectively
in addition to teaching course content. In general, students tend to
gravitate toward taking more control over their own learning and
adopting more sophisticated study habits as they age and progress
through college, graduate school and working careers (Baxter Magolda,
2004; Gijbels, 2005; Richardson, 1995). The conventional assumption that
older students returning to school after significant work experience
will have poor study practices is not necessarily correct.
Although the results in the literature are somewhat mixed,
generally students emphasizing deep strategies and deemphasizing surface
strategies have been found to learn more effectively and achieve better
performance on assignments involving higher order learning, although
their performance may not be better on lower order activities (Gijbels,
2005; Gravoso, Pasa & Mori, 2002; Minbashian et al., 2004; Taylor
& Hyde, 2000). As reported by Scouller and Prosser (1994), the use
of deep strategies has also been found to lead to greater student
satisfaction. Thus it would seem worthwhile for instructors to behave in
such a way as to encourage students to adopt deep learning strategies.
Trigwell, Prosser and Waterhouse (1999) found that instructors who use
surface-oriented teaching methods encourage a similar surface approach
from the students and also found a similar although weaker relationship
for the deep orientation. A particular danger is that instructors
utilizing surface-oriented teaching methods may encourage a surface
approach not only in their own classes but also in subsequent classes,
making it harder for instructors in those later more advanced classes to
prompt the adoption of deeper learning strategies (Raimondo, Esposito
& Gershenberg, 1990).
A key question, given this paper's contrast of multiple-choice and essay questions, is whether or not the type of exam questions
utilized in a course will impact student learning strategies. Scouller
(1998) found that students believe different abilities are being
assessed by multiple-choice and essay questions, with essays being
viewed as testing higher level thinking than multiple-choice. These
student perceptions were found to translate into action as students
applied more deep strategies and fewer surface strategies when preparing
for essays. Although not specifically comparing multiple-choice and
essay, Entwistle and Entwistle (1992) similarly found that
students' study strategies were impacted by the level of questions
they expected to see on the exam, with narrowly focused questions that
stressed recall rather than the integration of ideas encouraging the use
of surface strategies. Even students who had applied deep strategies
throughout the semester and had achieved a high level of understanding
felt pressured to apply surface-oriented memorization in the last days
leading up to the exam. However, there is also evidence suggesting that
students with a general preference toward deep strategies will continue
to use deep strategies even if the exam itself does not encourage that
(Entwistle & Entwistle, 1992; Scouller & Prosser, 1994). An
overly heavy workload may undermine efforts to encourage deep
strategies, possibly due to students believing that recall-focused
surface strategies will enable them to quickly gather enough information
to pass an exam that they do not have enough time to prepare for more
carefully (Taylor & Hyde, 2000). Similarly, an exam on which
achieving a high score is particularly critical, for example a single
exam on which an entire semester's grade will be based or a highly
competitive exam for admission into an exclusive graduate program, may
encourage a "beat the test" mentality and therefore foster an
increased reliance on surface strategies (Diseth & Martinsen, 2003).
Overall, the evidence suggests that the use of exams focused on
higher-order learning is an important component of any teaching effort
intended to encourage greater student adoption of deep strategies. Most
authors view open-ended questions, including essays, as more capable of
achieving a higher-order focus. Gijbels (2005), for example, suggests
that even when worded in such a way as to test higher-order thinking,
multiple-choice will be approached by students as a task calling for
surface-oriented strategies. Nevertheless, there are some
counterarguments. Wainer and Thissen (1993) suggest that skilled test
writers can create multiple-choice questions that effectively test
higher-order thinking and that the obvious efficiency advantages of
multiple-choice should promote greater faculty efforts to develop that
skill. Suskie (2004) agrees that multiple-choice questions can be
written to test some, although not all, of the higher-order thinking
skills from Bloom's Taxonomy. Similarly, but from the opposite
perspective, Raimondo et al. (1990) advise that not all essay questions
are necessarily higher-order as many simply call for recall-oriented
answers.
ADDITIONAL CONSIDERATIONS
There are several additional advantages and disadvantages that
should be mentioned briefly to complete the discussion. A key
disadvantage of multiple-choice questions, especially in light of the
fact that grading efficiency is one of their key benefits, is that
multiple-choice questions tend to be more difficult to write, thus
negating some of the time savings achieved during the grading process.
One danger is that multiple-choice questions designed to test higher-order thinking tend to be more difficult to write than those that test recall, which can result in exams that are more
recall-oriented than the instructors might have actually intended
(Suskie, 2004). The writing difficulty can also encourage faculty to
protect their questions for use in future semesters by not handing back
the graded exams or by handing them back only long enough for the
students to see the scores (Roediger, 2005), a practice that would not only
deprive the students of valuable feedback but might also tend to
encourage a belief among the students that the test score is the
important thing, not the learning. In fairness, it should be noted that
the ease with which multiple-choice can be graded can allow very quick
feedback in the cases in which the instructor is willing to hand back
and review the exam (Becker & Johnston, 1999).
Another disadvantage of multiple-choice is that test-takers are
exposed to numerous incorrect answers, many of which may be constructed
so as to appear to be correct. Roediger (2005) found that students
tended to remember these incorrect lures as being correct when queried
about them later, suggesting that students actually learn the wrong
things as part of the testing process. A related disadvantage is that
students receive corrective feedback whenever their own answer does not
appear as one of the available alternatives, a prompt to reconsider the
question and correct their mistake that would not be present in an
open-ended assessment (Bridgeman, 1992). Some students react to the
availability of the possible answers by working backwards to answer the
question, particularly on quantitative problems. Bridgeman (1992) found
that 81% of the students reported working backwards to solve problems, an approach that would not normally be appropriate when solving realistic problems in the field.
The need to minimize the impact of this corrective feedback
encourages faculty to create incorrect answers that match common student
errors or that are worded in such a way as to appear to be correct. The
problem with this tactic, however, is that those apparently reasonable
but incorrect answers are what the students may most clearly remember
later as truly being correct. Creating incorrect answers that match
common student mistakes, although potentially problematic as discussed
above, can also be viewed as a benefit of multiple-choice in that this
method allows an instructor to quickly diagnose the various
misconceptions that appear to be occurring most frequently in a given
semester.
An additional disadvantage of multiple-choice questions is that
they can be gender-biased. As reported by Hassmen and Hunt (1994),
numerous studies have shown that men tend to have an advantage on
multiple-choice. Including essay questions on an exam can help to reduce
this bias. A major weakness of essay questions is that they are
relatively time-consuming to answer, meaning that a timed exam can only
include a small number of questions (Walstad & Becker, 1994). This
can result in an exam that is unable to evaluate student learning over
all of the topics covered in that portion of the course, a problem that
can penalize some students and reward others depending upon which topics
go untested. This problem of failing to test a significant portion of
the course content is exacerbated (although in a way that students tend to welcome) if students are allowed to select the questions that they prefer to answer from a longer list of questions provided.
A final advantage of essay, probably more accurately characterized as a disadvantage of multiple-choice, is that essay or other open-ended
questions can more appropriately drive the curriculum and instructional
activities. While it would be inappropriately simplistic to view
teaching as an exercise in preparing students to pass exams, most
faculty would be somewhat accepting of the idea that one of
teaching's main goals is to enable students to correctly complete
relevant problem-solving exercises and broadly focused integrative
essays. Very few, in contrast, would view teaching toward the test to be
valid if that test were made up of only multiple-choice questions
(Harris & Kerby, 1997).
METHODOLOGY
The intent of this study is to evaluate student preferences for
multiple-choice or essay questions and, in particular, to extend that
evaluation to investigate the rationale behind those preferences and the
extent to which those preferences may change depending on whether the
student feels himself to be well prepared or poorly prepared for an
upcoming exam. Numerous studies, some cited above, have investigated the
relationships between teaching methods, student study strategies and
student performance. Various studies have also evaluated student
preferences for multiple-choice versus open-ended questions. However,
little work has been done to determine student rationales for having
given preferences. The data on student preference as a function of
preparedness will implicitly suggest which type of question students
believe requires greater preparation and may provide some insight
concerning the type of exam questions that instructors should use to
prompt greater study effort.
Data were gathered using a survey instrument that was completed by
81 undergraduate students enrolled in either the junior-level Principles
of Management course or the senior-level Strategic Management capstone
course. Because both courses are required for all business students in
the program, the sample included a mix of the usual business school
majors, i.e. accounting, finance, economics, management, marketing and
so forth. Seven of the respondents were pursuing majors other than
business.
The first section of the survey, for which students were instructed
to reflect upon their entire college experience and not just the course
in which they were completing the survey, required the students to
respond to eleven statements and two questions. Of the eleven
statements, two concerned general preferences for multiple-choice or
essay questions and nine concerned the rationale behind those
preferences. The students responded to each statement by selecting
"Disagree Strongly," "Disagree,"
"Neutral," "Agree" or "Agree Strongly."
The nine available reasons for preferring one type of exam over another
were based on discussion with fellow business faculty and extensive
anecdotal evidence concerning student opinions about testing methods.
These nine statements allowed the expression of opinions such as
multiple-choice questions being unfair due to the inability to earn
partial credit and multiple-choice being easier because the student
doesn't need to know the topic thoroughly but rather just enough to
recognize the right answer. More detail on the nine statements will be
provided during the discussion of the survey results.
The first section of the survey ended with two questions focused on
student preferences for multiple-choice or essay questions as a function
of how well prepared the students were for a hypothetical exam. One
question asked for the preferred type of exam question if "you are
very well prepared for an exam, knowing the subjects involved backwards
and forwards." The second asked for the preference if "you
haven't been able to study as much as you would have liked and
aren't really very well prepared for an exam." In order to
provide the students with the option of selecting a middle ground rather
than forcing them to either extreme, the students could respond by
selecting an exam composed entirely of multiple-choice questions, an
exam composed entirely of essay questions, or an exam containing a mix
of both types of questions.
The second section of the survey contained a series of statements
and questions that were intended to apply only to the class in which the
students were currently enrolled. Although this section was designed
primarily to obtain feedback on certain class-specific teaching
strategies and not as part of this study, some of the results are
relevant and will be included in the discussion of the survey results.
The third and final section of the survey asked for demographic
information such as major and grade point average. The survey concluded
with an open-ended question asking for comments in order to solicit
input concerning any preference-related rationales that might have been
omitted from the nine statements in the first section. The data were
analyzed using SPSS.
RESULTS AND DISCUSSION
For purposes of analysis the responses to the 11 statements from
the first section of the survey were coded using 1 for "Disagree
Strongly" through 5 for "Agree Strongly" such that values
of 1 or 2 signified disagreement with the statement, a value of 3
signified neutrality, and values of 4 or 5 signified agreement. The null
hypothesis tested for each statement was "students on average do
not agree with the statement," which was expressed numerically as
"the population average is less than or equal to 3.0." The
alternative hypothesis of "students on average agree with the
statement" was expressed numerically as "the population
average is greater than 3.0." An abbreviated version of each
statement can be found below in Table 1 along with the sample mean, t
score and level of significance. The statements have been numbered for
convenience, with the order changed somewhat from that used on the
survey to facilitate the discussion that follows the table. Throughout
the discussion recall that rejection of the null hypothesis signifies
agreement with the relevant statement.
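As a concrete illustration of these tests, the analysis for a single statement could be reproduced outside SPSS roughly as follows. This is a minimal Python sketch using made-up response data, not the study's actual analysis.

import numpy as np
from scipy import stats

# Hypothetical Likert responses for one statement, coded 1 ("Disagree
# Strongly") through 5 ("Agree Strongly").
responses = np.array([4, 5, 3, 4, 2, 4, 5, 3, 4, 4])

# One-tailed one-sample t test of H0: mean <= 3.0 vs. H1: mean > 3.0;
# rejection signifies agreement with the statement.
t_stat, p_value = stats.ttest_1samp(responses, popmean=3.0,
                                    alternative="greater")
print(f"mean = {responses.mean():.2f}, t = {t_stat:.3f}, p = {p_value:.3f}")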
Note that statements 1 and 2 (which did not appear consecutively in
the survey) are mirror images of one another, with the first stating a
preference for multiple-choice over essay and the second stating the
reverse. The correlation between these two statements was -.843, close
to -1.0 as would be expected. The results for the hypothesis tests of
these two statements, as shown in Table 1, led to a rejection of the
null hypothesis for the statement preferring multiple-choice and a failure to reject the null for the statement preferring essays. Because
these statements are opposites of one another it would be unreasonable
to obtain data that would lead to a rejection of the null for both. The
results for these two statements were evaluated further using paired t
tests.
The paired t test of the difference between the multiple-choice
preference and the essay preference resulted in a t score of 1.862 and a
significance level of .066, suggesting that there was almost but not
quite sufficient evidence to conclude that multiple-choice questions
were preferred more highly. Breaking the sample into two subsets, one
containing students in the junior-level Principles of Management course
and the other containing students in the senior-level capstone Strategic
Management course, led to results that were a little more enlightening.
The paired t test for the Principles of Management students achieved a t
score of 2.824 and a significance level of .007, suggesting that
multiple-choice questions were preferred by a significant margin. The
Strategic Management students, however, produced a t score of -0.437 and
a significance level of .655, suggesting no difference in preference.
This finding that the preference for multiple-choice declines for more
advanced students is in alignment with the ideas discussed earlier that
students tend to develop more of a deep focus and a drive for greater
understanding as they progress through school (Baxter Magolda, 2004;
Gijbels, 2005; Richardson, 1995).
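The paired comparisons reported above use the standard paired t test. A minimal sketch, again with hypothetical data rather than the study's responses, might look like this:

import numpy as np
from scipy import stats

# Hypothetical paired responses: each student's agreement with the
# multiple-choice preference statement and the essay preference statement.
prefer_mc = np.array([4, 5, 3, 4, 2, 5, 4, 3])
prefer_essay = np.array([2, 1, 3, 3, 4, 2, 3, 3])

# Paired t test of the within-student difference between the two ratings.
t_stat, p_value = stats.ttest_rel(prefer_mc, prefer_essay)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# Subgroup comparisons (e.g., Principles versus Strategic Management
# students) simply repeat the test on the relevant subset of rows.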
To investigate whether the preferences were impacted by the
presence of a quantitative orientation, the subset of the sample
majoring in accounting, finance or economics was compared to the other
students. The supposition that quantitative skills might be correlated
with a dislike for writing and hence a dislike of essays was not
supported by the data.
As can be seen in Table 1, the null hypothesis was rejected for six
of the nine rationale-related statements. The results suggested that
students agreed with three of the statements related to the ease of
multiple-choice: (3) Multiple-choice easier: I only need to recognize
correct answer; (4) Multiple-choice easier: I can eliminate obviously
wrong answers; and (5) Multiple-choice easier: I might guess correctly
even if I don't know. The results also suggested agreement with one
statement about the unfairness of multiple-choice, (7) Multiple-choice
less fair: I can't earn partial credit. The results demonstrated
support for two statements concerning essay questions, one statement
about the difficulty of essays, (9) Essay Harder: I must fully
understand topic to produce good answer, and one statement about the
fairness of essays, (10) Essay more fair: More accurately show what I
know/don't know. Only three statements were not supported: (6)
Multiple-choice harder: I need to know nit-picky details; (8) Essay
easier: I can earn partial credit even if topic knowledge low; and (11)
Essay less fair: Content knowledge results biased by writing skill.
In general the significant statements appear to show two trends,
one suggesting that students believe that multiple-choice questions are
easier and another suggesting that students feel essay questions to be
fairer and more valid. To further investigate this phenomenon a factor
analysis was run which resulted in two factors, the first of which
explained 40.46% of the variance and the second of which explained
20.06%. Excluding loadings between -.40 and .40, the first factor loaded positively on all four statements portraying multiple-choice as easier or, equivalently, essays as harder (statements 3, 4, 5 and 9) and loaded negatively on all other statements. The second factor loaded
on all three fairness-related statements. It loaded positively on (7)
Multiple-choice less fair: I can't earn partial credit, and (10)
Essay more fair: More accurately show what I know/don't know, and
loaded negatively on (11) Essay less fair: Content knowledge results biased by writing skill. The second factor also loaded positively on (5) Multiple-choice easier: I might guess correctly even if I don't
know. These results suggest that the ability to obtain a high score more
easily (via multiple-choice) and the ability to receive a score that
accurately measures what the student has learned (via essay, except for
statement 5) are both valued by the students.
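For readers wishing to replicate this kind of analysis, a two-factor extraction with small loadings suppressed can be sketched as follows. The data here are randomly generated placeholders; the study's own analysis was run in SPSS, and its extraction and rotation settings are not detailed above, so the varimax rotation is an assumption.

import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical matrix: 81 respondents x 9 rationale statements (Likert 1-5).
rng = np.random.default_rng(seed=42)
X = rng.integers(1, 6, size=(81, 9)).astype(float)

# Extract two factors with a varimax rotation.
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
loadings = fa.components_.T  # one row per statement, one column per factor

# Suppress loadings between -.40 and .40, mirroring the cutoff used above.
suppressed = np.where(np.abs(loadings) >= 0.40, loadings, 0.0)
print(np.round(suppressed, 2))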
While the first factor could be viewed as somewhat disappointing
although certainly not surprising, the second factor is encouraging from
an instructor's perspective. It is particularly encouraging to note
that the inclusion of writing ability as a component of essay
performance was not seen as a drawback of that format. It is unclear,
however, whether that result is primarily due to the students having had
experiences that suggest that most instructors do not pay particularly
close attention to writing when grading or whether the students truly
accept the idea that writing is an important skill in business and is
therefore a reasonable thing to assess. It is curious that being able to
guess on multiple-choice loaded positively on a fairness-related factor.
Possibly the students agree with the Hassmen and Hunt (1994) contention
that guessing is valid because it is generally based on at least partial
knowledge of the relevant course content. These results also agree with
Bridgeman's (1992) finding that although 81% of the students
reported preferring multiple-choice, only 43% felt that multiple-choice
was more valid. The fact that ease appears to be more important to the
students than validity also supports O'Neill (2001), who studied
multiple sections of the same course, some of which received
multiple-choice exams and others essay exams, and found that the essay
sections' students were the ones who complained about being treated
badly.
The data for the two questions on preference as a function of
preparedness were coded using 1 for students who preferred an exam made
up entirely of multiple-choice, 2 for those who wanted a mix of
multiple-choice and essay, and 3 for those who preferred entirely essay.
The mean response was 1.74 under the hypothetical situation in which the
student was well prepared for the upcoming exam and only 1.44 for the
situation in which the student was poorly prepared. With both means less
than 2.0, these results match those seen earlier in which the students
in general preferred multiple-choice. However, the preference for
multiple-choice diminished as the students became more prepared. A
paired t test of the difference between the well prepared preference and
the poorly prepared preference resulted in a t score of 3.829 and a
significance level of less than .001, suggesting that the appeal of
essays increases as preparedness does. In light of the results discussed
above, this suggests that the fairness and validity attraction of essays
may begin to overcome the easiness attraction of multiple-choice when
the students have prepared sufficiently well to not fear the increased
difficulty of essay questions.
Although the second section of the survey was focused on only the
classes in which the respondents were registered (all of which were
taught by the author) and thus asked questions geared specifically
towards methods used in those classes, some of the results from that
section are relevant to the discussion here. These classes all utilized
a pedagogical approach in which the students were given a study guide in
advance of the coverage of each chapter that pointed out the key
concepts to learn for that chapter. Each study guide contained two
sections, one with a list of terms and concepts to learn and another
with several essay questions. The students were told to know and
understand the definitions for each of the terms on the top half of the
page and to recognize the key management issues related to each term.
They were also told to be ready to fully answer every essay question as
all of them were eligible to be selected for the exam. The essays were
often broadly focused and required the students to contrast or integrate
a variety of ideas, often including ideas from previous chapters. It was
made very clear that complete, detailed and organized answers were
required for the essays. The exams were a mix of multiple-choice and
essay, with the one or two essay questions drawn word-for-word from the
study guides and the multiple-choice designed such that students who had
learned all of the terms from the study guides should be able to achieve
a good score. The students were not able to use the text during the
exams but were allowed to use their notes. They were also allowed to
write out essay answers in advance of the test if they chose to do so
and then simply hand in those answers if the questions they had answered
turned out to be on the exam. The instructor's goal, as was
explained to the class on the first day of the semester, was to provide
students with a significant incentive to study carefully and take good
notes, operating under the assumption that most students who did so
would learn the material successfully (an assumption that
end-of-semester instructor evaluation surveys have supported
repeatedly).
This section of the survey was structured similarly to the first
section, containing a series of statements followed by two questions
concerning exam preference as a function of preparedness. Four of the
statements referred to whether the respondent was "encouraged to
study more diligently than I would have otherwise" as a result of,
respectively, the study guides, the use of open notes exams, the
presence of essays on the exams, and the requirement for the essays to
be answered very completely in order to obtain a good score. Two other
statements concerned whether being required to provide more complete
essay answers than usual was made fair by the student's ability to
see the questions in advance on the study guides and whether the student
felt that the extra study effort put forth had actually resulted in
greater learning. The students demonstrated highly significant agreement
with these statements, with all of them achieving a level of
significance less than .001. Taken as a group, these statements suggest
several things. First, students are prompted to study harder both by the
presence of essays on an exam and by the fact that those essays will be
rigorously graded. Second, students find it reasonable to include
rigorously graded essays if the content of the questions can be viewed
in advance to direct the students' study efforts. And third,
students believe that their increased efforts led to greater learning
than in the usual class.
These results support the literature discussed previously demonstrating that instructor efforts to prompt students to adopt deep-oriented study strategies tend to succeed and that this increased adoption of deep-oriented strategies tends to lead to an increased level of learning. Narrowing the focus to this paper's comparison
of multiple-choice and essay questions, these results suggest that
students are simply more fearful of essays and recognize that they need
to put forth more effort to answer them effectively.
IMPLICATIONS FOR TEST CONSTRUCTION
The results of this study demonstrate that students tend to prefer
multiple-choice and that their preference is driven largely by the
belief that this type of question is easier, with the preference
becoming even stronger when the students are poorly prepared for an
exam. More encouragingly from a faculty perspective, the results also
demonstrate that students appear to have an appreciation for the
fairness and validity of essay questions as a measure of the success of
their learning efforts and become more accepting of essays when well
prepared for an exam. The results also support previous findings in the
literature, for example Entwistle and Entwistle (1992), Scouller (1998),
and Trigwell, Prosser and Waterhouse (1999), that suggest that
instructor efforts to promote increased student utilization of deep
learning strategies can be successful.
These results, in conjunction with previous work discussed earlier
in the paper, prompt a variety of pedagogical recommendations for
faculty who are being pressured by time or budget constraints to make
greater use of multiple-choice questions. First, faculty should
recognize that it is possible to test certain types of higher-order
learning using the multiple-choice format and that a multiple-choice
exam does not therefore have to be an assessment of only memorization
and recall (Anderson et al., 2001; Suskie, 2004; Wainer and Thissen,
1993). Thus an increase in the use of multiple-choice questions does not
necessarily lead to an inferior exam, although faculty should be aware
when writing multiple-choice questions that there can be an
unintentional bias toward writing questions focused on lower-level
learning simply because those questions are so much easier to create
(Roediger, 2005).
The results for statements 3, 4, 5 and 9 demonstrate that students
believe multiple-choice questions to be easier. Thus a second
recommendation is that faculty who attempt to develop higher-order
multiple-choice questions must make it clear to students that these
questions will call for deeper learning than is normally required by
multiple-choice. Otherwise the use of multiple-choice may encourage the
students to apply primarily surface-oriented study methods or to simply
put less effort into studying (Gijbels, 2005; Scouller, 1998; Scouller
& Prosser, 1994). They would therefore fail to achieve a level of
learning sufficient to succeed on the unexpectedly difficult exam. It
would be worthwhile to provide each class with example questions from
previous semesters that could be utilized to demonstrate the difference
between lower-order questions and those focused on higher-order learning
in the context of the discipline of the course.
As an extension of the second recommendation, faculty should
consider the possibility of broadening the discussion beyond simply test
questions to consider higher- versus lower-order learning in general by
beginning each semester with a discussion of relevant pedagogical topics
such as Bloom's Taxonomy and the utilization of deep learning
strategies (Cassady, 2004; Hagedorn, Sagher & Siadat, 2000; Taylor
& Hyde, 2000). Most undergraduates aren't exposed to these
topics and therefore haven't received the guidance that would
enable them to more fully develop their own study skills. Anderson et al. (2001) provide a good discussion of Bloom's Taxonomy.
A third and closely related recommendation is that faculty should
emphasize the fact that exams including higher-order multiple-choice
questions will more closely mimic an essay exam's ability to
distinguish between students who have utilized surface strategies to
memorize a list of isolated facts and those who have applied deep
strategies to develop a more complete understanding of course content.
Traditional lower-order multiple-choice questions often fail to reward
the use of deep strategies because surface strategies are sufficient,
i.e. the students who develop a deeper understanding don't benefit
tangibly via higher test scores because that deeper understanding
isn't required to answer the exam questions correctly. As was
demonstrated by statement 10 in this study, students have an
appreciation for the fairness and validity of essays due to their
ability to reward those who have worked to achieve higher-order
learning. Students need to be convinced that higher-order
multiple-choice questions will similarly reward those who have achieved
greater understanding.
The results from the second section of this study's survey suggest a fourth recommendation: it can be very beneficial for
faculty to provide clear guidelines to students concerning the level of
learning expected for each course topic. With years of experience taking
traditional multiple-choice exams, in which most exam questions call
primarily for memorization and recall, many students may conclude that
all topics deserve approximately the same rather cursory study. With the
addition of higher-order questions, however, it becomes necessary to let
students know which topics are fully deserving of deeper study.
Providing study guides, grading rubrics and old exams and assignments to
demonstrate the level of understanding required can be an important
motivator (Driessen & van der Leuten, 2000). By the way, it should
be noted that in most courses there are indeed quite a few topics that
really don't deserve more than cursory coverage and therefore
probably shouldn't be tested via anything other than lower-order
questions. Faculty must overcome the natural tendency to feel that every
topic within their discipline is interesting and important in order to
focus the students' deeper-oriented study efforts on the truly
critical ideas.
A fifth recommendation is that faculty must allow students
sufficient time both for the exam specifically and for studying and
learning in general. The inclusion of higher-order multiple-choice
questions will result in exams that require more reasoning time per
question. If faculty write exams containing the same number of questions
as before and provide the same amount of time as before, the students
will be forced to approach the new higher-order questions with the same
rather cursory approach that so many students have learned to utilize
for multiple-choice from years of experience with lower-order questions.
This situation will not only seem tremendously unfair to the students
but may also lead many of them to conclude that there is little payoff
from utilizing the deep study strategies that the faculty are hoping to
foster. Although no faculty want to offer a class that will become known
as easy, it is also important to make sure that students are given
achievable learning goals and reasonable (though challenging) workloads.
As noted by Taylor and Hyde (2000), students tend to gravitate toward
surface strategies in order to quickly accumulate a list of test-ready
facts when there isn't enough time to learn more deeply. Providing
students with insufficient time to prepare for an exam or insufficient
time to take an exam may sabotage faculty efforts to encourage the
adoption of deep learning strategies.
A final particularly critical recommendation is that faculty should
make certain to provide timely detailed feedback. One of the
characteristics of students practicing deep learning strategies is a
desire to monitor their own understanding and correct misconceptions as
they occur (Chi, Bassok, Lewis, Reimann & Glaser, 1989; Minbashian
et al., 2004). To facilitate this process, faculty will need to avoid
the temptation to protect multiple-choice questions for repeat use in
future semesters by preventing students from seeing anything more than
just their exam scores. The inclusion of higher-order questions, which
by definition involve more complex reasoning than would usually be seen
in multiple-choice exams, makes this feedback even more critical.
Faculty must hand back the exam and discuss such questions in detail so
that students will fully understand the complex reasoning processes
required. By carefully designing the incorrect answer options so that
each incorrect answer would appear correct to students guilty of a
common misconception or reasoning error, the exam can be used to
diagnose these common mistakes. The incorrect answers then become a
useful device to drive an enlightening discussion about those common
misconceptions and reasoning errors, why they are incorrect, and how to
correctly apply the ideas and skills of the discipline to find the right
answer. In this way the students' incorrect answers can prompt
valuable learning rather than becoming little more than uncorrected
mistakes that the students may vaguely remember later as being correct
(Roediger, 2005). Purdie, Hattie and Douglas (1996) investigated student
utilization of a variety of learning strategies and found that the
least-used of the various tactics was reviewing feedback. This suggests
that students may view the score as the ultimate outcome of an exam,
regardless of the type of questions used. It is important that faculty
promoting deep learning strategies make the provision of meaningful
informative feedback a key part of the learning process.
REFERENCES
Anderson, L.W., D.R. Krathwohl, P.W. Airasian, K.W. Cruikshank,
R.E. Mayer, P.R. Pintrich et al. (Eds.) (2001). A taxonomy for learning,
teaching and assessing. New York: Addison Wesley Longman.
Baxter Magolda, M.B. (2004). Evolution of a constructivist
conceptualization of epistemological reflection. Educational
Psychologist, 39(1), 31-42. Retrieved June 13, 2007, from the ERIC
database.
Becker, W.E. & C. Johnston (1999). The relationship between
multiple choice and essay response questions in assessing economics
understanding. The Economic Record, 75(231), 348-357. Retrieved February
24, 2006 from the Business Source Premier database.
Bridgeman, B. (1992). A comparison of quantitative questions in
open-ended and multiple-choice formats [Electronic version]. Journal of
Educational Measurement, 29(3), 253-271.
Cassady, J.C. (2004). The influence of cognitive test anxiety
across the learning-testing cycle [Electronic version]. Learning and
Instruction, 14, 569-592.
Chi, M.T.H., M. Bassok, M.W. Lewis, P. Reimann & R. Glaser
(1989). Self-explanations: How students study and use examples in
learning to solve problems [Electronic version]. Cognitive Science, 13,
145-182.
Diseth, A. & O. Martinsen (2003). Approaches to learning,
cognitive style, and motives as predictors of academic achievement.
Educational Psychology, 23(2), 195-207. Retrieved July 20, 2004 from the
Academic Search Premier database.
Driessen, E. & C. van der Vleuten (2000). Matching student
assessment to problem-based learning: Lessons from experience in a law
faculty. Studies in Continuing Education, 22(2), 235-248. Retrieved June
14, 2007, from the Academic Search Premier database.
Entwistle, A. & N. Entwistle (1992). Experiences of
understanding in revising for degree examinations. Learning and
Instruction, 2, 1-22. Retrieved June 19, 2007, from the Science Direct
database.
Gijbels, D. (2005). The relationship between students'
approaches to learning and the assessment of learning outcomes
[Electronic version]. European Journal of Psychology of Education,
20(4). Retrieved February 24, 2006, from the Academic Search Premier
database.
Gravoso, R.S., A.E. Pasa & T. Mori (2002). Influence of
students' prior learning experiences, learning conceptions and
approaches on their learning outcomes [Electronic version]. Proceedings
of the Higher Education Research and Development Society of Australasia,
282-289.
Hagedorn, L.S., Y. Sagher & M.V. Siadat (2000). Building study
skills in a college mathematics classroom [Electronic version]. Journal
of General Education, 49(2), 132-155.
Harris, R.B. & W.C. Kerby (1997). Statewide performance
assessment as a complement to multiple-choice testing in high school
economics. Journal of Economic Education, 28(2), 122-134. Retrieved June
14, 2007 from the Business Source Premier database.
Hassmen, P. & D.P. Hunt (1994). Human self-assessment in
multiple-choice testing [Electronic version]. Journal of Educational
Measurement, 31(2), 149-160.
Lukhele, R., D. Thissen & H. Wainer (1994). On the relative
value of multiple-choice, constructed response, and examinee-selected
items on two achievement tests [Electronic version]. Journal of
Educational Measurement, 31(3), 234-250.
Minbashian, A., G.F. Huon & K.D. Bird (2004). Approaches to
studying and academic performance in short-essay exams. Higher
Education, 47, 161-176. Retrieved June 14, 2007, from the Academic Search
Premier database.
O'Neill, P.B. (2001). Essay versus multiple-choice exams: An
experiment in the principles of macroeconomics course. The American
Economist, 45(1), 62-70. Retrieved February 24, 2006 from the Business
Source Premier database.
Powers, D.E., M.E. Fowles, M. Farnum & P. Ramsey (1994). Will
they think less of my handwritten essay if others word process theirs?
Effects on essay scores of intermingling handwritten and word-processed
essays [Electronic version]. Journal of Educational Measurement, 31(3),
220-233.
Purdie, N., J. Hattie & G. Douglas (1996). Student conceptions
of learning and their use of self-regulated learning strategies: A
cross-cultural comparison. Journal of Educational Psychology, 88(1),
87-100. Retrieved June 12, 2007, from the PsycArticles Publications
database.
Raimondo, H.J., L. Esposito & I. Gershenberg (1990).
Introductory class size and student performance in intermediate theory
courses [Electronic version]. Journal of Economic Education, 21(4),
369-381. Retrieved June 11, 2007, from the Business Source Premier
database.
Richardson, J.T.E. (1995). Mature students in higher education: II:
An investigation of approaches to studying and academic performance.
Studies in Higher Education, 20(1), 5-17. Retrieved May 29, 2006, from
the Academic Search Premier database.
Roediger, H.L. III (2005). The positive and negative consequences
of multiple-choice testing [Electronic version]. Journal of Experimental
Psychology: Learning, Memory & Cognition, 31(5), 1155-1159.
Scouller, K. (1998). The influence of assessment method on
students' learning approaches: Multiple choice question examination
versus assignment essay [Electronic version]. Higher Education, 35,
453-472.
Scouller, K.M. & M. Prosser (1994). Students' experiences
in studying for multiple choice question examinations. Studies in Higher
Education, 19(3), 267-279. Retrieved May 29, 2006, from the Academic
Search Premier database.
Suskie, L. (2004). Assessing student learning: A common sense guide. Bolton, MA: Anker Publishing.
Taylor, R. & M. Hyde (2000). Learning context and
students' perceptions of context influence student learning
approaches and outcomes in animal science 2. Proceedings of the Teaching
and Educational Development Institute Conference on Effective Teaching
and Learning at University. Retrieved June 13, 2007, from
http://www.tedi.uq.edu.au/conferences/teach_conference00/
papers/taylor-hyde.html
Thissen, D., H. Wainer & X-B Wang (1994). Are tests comprising
both multiple-choice and free-response items necessarily less
unidimensional than multiple-choice tests? An analysis of two tests
[Electronic version]. Journal of Educational Measurement, 31(2),
113-123.
Trigwell, K., M. Prosser & F. Waterhouse (1999). Relations
between teachers' approaches to teaching and students'
approaches to learning [Electronic version]. Higher Education, 37,
57-70.
Walstad, W.B. & W.E. Becker (1994). Achievement differences on
multiple-choice and essay tests in economics [Electronic version].
Proceedings of the American Economic Association, 193-196.
Wainer, H. & D. Thissen (1993). Combining multiple-choice and
constructed-response test scores: Toward a Marxist theory of test
construction [Electronic version]. Applied Measurement in Education,
6(2), 103-118.
David A. Parmenter, Northern Arizona University-Yuma
Table 1: Hypothesis Tests of Student Agreement with Assessment Method Statements

Statement (abbreviated version)                                            Mean   t score   Significance
(1) I prefer multiple-choice questions to essay questions                  3.30    2.375      .01
(2) I prefer essay questions to multiple-choice questions                  2.84   -1.227      .888
(3) Multiple-choice easier: I only need to recognize correct answer        3.38    3.339     <.001
(4) Multiple-choice easier: I can eliminate obviously wrong answers        3.70    6.008     <.001
(5) Multiple-choice easier: I might guess correctly even if I don't know   3.52    5.208     <.001
(6) Multiple-choice harder: I need to know nit-picky details               2.79   -1.645      .948
(7) Multiple-choice less fair: I can't earn partial credit                 3.47    3.977     <.001
(8) Essay easier: I can earn partial credit even if topic knowledge low    3.14    1.085      .141
(9) Essay harder: I must fully understand topic to produce good answer     3.42    3.693     <.001
(10) Essay more fair: More accurately show what I know/don't know          3.69    6.888     <.001
(11) Essay less fair: Content knowledge results biased by writing skill    2.70   -2.449      .991