Article Information

Title: Essay versus multiple-choice: student preferences and the underlying rationale with implications for test construction.
Author: Parmenter, David A.
Journal: Academy of Educational Leadership Journal
Print ISSN: 1095-6328
Year of publication: 2009
Issue: May
Language: English
Publisher: The DreamCatchers Group, LLC
Keywords: Essay; Essays; Learning; Multiple choice tests; Multiple-choice examinations

Essay versus multiple-choice: student preferences and the underlying rationale with implications for test construction.

Parmenter, David A.

INTRODUCTION

Many universities are experiencing budget cuts that result in larger classes. The need to teach these larger classes without suffering an excessive reduction in the amount of time available for research and service-related endeavors exerts pressure on faculty to make increasing use of multiple-choice questions on quizzes and exams. Although multiple-choice questions provide the obvious benefit of being easier to grade, their use is in conflict with the intuitive feelings of many faculty that multiple-choice questions are inferior to essays and other open-ended forms of assessment both as a measure and as a promoter of student learning.

Previous studies have demonstrated that the use of assessments that test higher-order thinking can encourage students to pursue study strategies that develop a deeper level of understanding rather than utilizing strategies that achieve little more than test-related recall. Previous work has also demonstrated that the use of these deeper learning strategies tends to lead to greater learning, at least for those learning tasks requiring higher-order thinking. This paper will review those results, relating them to the choice between multiple-choice and essay, and also discuss various other strengths and weaknesses of the essay and multiple-choice format. The paper's major purpose is to extend the knowledge of student assessment preferences by investigating the rationale behind those preferences. It will report the results of a survey questionnaire in which undergraduate business students in a junior-level Principles of Management course and a capstone Strategic Management course specified not only their preferences for multiple-choice or essay but also the reasons for those preferences. The patterns that result will demonstrate two main factors driving student opinions, one factor focused on the ease of obtaining a high test score (with preferences for multiple-choice) and the other focused on the fairness and validity of the assessment measure (with preferences for essay). These results, in conjunction with various ideas drawn from the literature, will be used to generate exam-related pedagogical recommendations.

ESSAY VERSUS MULTIPLE-CHOICE AS A MEASURE OF LEARNING

Because the primary goal of an exam is to accurately assess student learning, one of the key issues to consider is whether or not open-ended and multiple-choice questions differ in terms of reliability and validity. The literature tends to favor multiple-choice. For example, Bridgeman (1992) suggested that although multiple-choice is less reliable on a question-by-question basis due to guessing, the fact that multiple-choice questions take less time to answer (and grade) would allow an exam made up entirely of multiple-choice to contain more questions and therefore be more reliable than an exam containing fewer open-ended questions. Hassmen and Hunt (1994) agreed but also suggested that guessing shouldn't be discouraged because guesses are generally based at least partially on student knowledge of the relevant course content. A weakness of essay questions cited by many authors is that they require subjective grading, with even factors that are unrelated to answer content shown to have an impact on exam scores. For example, Powers, Fowles, Farnum and Ramsey (1994) found that handwritten answers tended to be graded more leniently than identical answers written using a word processor. The authors suggested that graders may be more forgiving of handwritten answers because poor penmanship can disguise spelling and grammar errors, because the scratch-outs that appear in handwritten answers demonstrate the students' efforts at improvement and because word processed answers tend to fill less space on the page and therefore may appear to be less complete.

A related measurement issue concerns the question of whether or not essays and other forms of open-ended questions truly assess different dimensions of learning than are assessed by multiple-choice. Many studies have found that student scores on open-ended questions were so closely related to their scores on multiple-choice as to suggest that both types of questions were measuring the same things (Bridgeman, 1992; Lukhele, Thissen & Wainer, 1994; Walstad & Becker, 1994), suggesting that the difficult-to-administer open-ended questions might not be worth the extra effort because multiple-choice alone could be used to assess the same learning. Lukhele et al. noted further that essays became particularly unhelpful as a measure of learning if students were allowed to select the questions to answer from a longer list of questions provided, allowing the students to avoid topics they hadn't learned. In contrast, Thissen, Wainer and Wang (1994) and Harris and Kerby (1997) found that essay scores and multiple-choice scores, although significantly correlated, were not so closely related as to be identical. Even more convincingly, Becker and Johnston (1999) applied two-stage least squares, a more sophisticated statistical procedure than was used in most previous studies of this topic, and found that essay and multiple-choice scores were not significant predictors of each other, suggesting that they were clearly measuring different dimensions of knowledge.

Various authors have provided potential explanations for the high correlations often seen between multiple-choice and essay scores, implying that this should not be interpreted as a suggestion that both types of questions measure the same learning. Driessen and van der Leuten (2000) suggested that student reasoning is so difficult to grade that the grading process for essay answers may unintentionally deteriorate into an exercise in counting the number of facts provided in the answer, thereby turning even an essay question into a recall-oriented question. Because many students recognize that quantity is often rewarded and therefore attempt to provide it in their answers, timed exams may not give students the opportunity to provide both the quantity-oriented list of facts and the more thoughtful organization, integration, and analysis of those facts that would demonstrate higher order thinking (Minbashian, Huon & Bird, 2004). Students who are fearful of failure are particularly prone to this behavior of attempting to provide quantity (Diseth & Martinsen, 2003).

Despite the mixed evidence for the existence of a meaningful difference between what is measured by multiple-choice questions and what is measured by open-ended questions, many authors suggest that open-ended questions are more effective at assessing higher order thinking in contrast to the recall-oriented focus often seen in multiple-choice (Bridgeman, 1992; Scouller, 1998; Scouller & Prosser, 1994; Walstad & Becker, 1994). For example, essay answers can demonstrate student thought processes and creativity in a way that cannot be achieved using multiple-choice (Walstad & Becker, 1994). Students may hold these same perceptions of multiple-choice questions as being targeted at lower level recall-oriented learning (Scouller, 1998), even for multiple-choice questions that have been specifically designed to test higher level learning (Scouller & Prosser, 1994). Faculty should pay full attention to student perceptions of assessment methods because those perceptions may influence student study strategies, as will be discussed further below.

ESSAY VERSUS MULTIPLE-CHOICE AS A PROMOTER OF LEARNING

Student perceptions of what learning really is and the study strategies used to achieve that learning are conventionally classified into two categories, surface and deep. Surface-oriented strategies focus on course content as a set of disconnected ideas that are to be accepted passively from the instructor or text. The primary learning goal is to memorize facts and subsequently reproduce them on an exam (Entwistle & Entwistle, 1992; Minbashian et al., 2004). To a certain extent, surface strategies could be viewed as simply the lack of a strategy. As shown by Scouller and Prosser (1994), surface-oriented students tend not to be self-reflective about their studying processes and may not really be able to appreciate the difference between the recall of facts and a true understanding of the material. The surface orientation can apply to the instructor as well as to the students. A surface-oriented instructor, operating under the assumption that students bring little expertise of their own to the class, would tend to emphasize lecture with the view that his primary task is to impart information.

In contrast, deep learning strategies focus on achieving true understanding rather than simply preparing for the test. This would involve more self-directed learning with less focus on the teacher being in charge, increased collaboration between students, integration of ideas, critical evaluation of the logic and evidence that support each new concept, inference of abstract principles from examples, and careful monitoring of one's own understanding accompanied by efforts to clear up any misunderstandings as they arise (Chi, Bassok, Lewis, Reimann & Glaser, 1989; Minbashian et al., 2004). Students report that the sense of being able to integrate the various ideas into one's own structure and recognize the patterns that result is a key component of their understanding (Entwistle & Entwistle, 1992). Students using deep strategies tend to have a feeling of being in control of their own learning as well as a sense of excitement. Because these students, in essence, construct their own knowledge through their strategic efforts, deep learning is often referred to as constructivist. A deep-oriented instructor would deemphasize lecturing and instead encourage questions and discussion, present multiple perspectives rather than portraying ideas as completely true or completely false, provide time and opportunities for student collaboration, relate course content to professional practice, provide frequent feedback and encourage students to monitor their own understanding.

Any given student would generally apply a mix of surface and deep strategies, with most students having a bias toward one or the other and some being adept at adjusting their choice of strategy in reaction to the characteristics of a particular class or assignment. Some students, unfortunately, apply little of either type, seeming unwilling or unable to evaluate their own largely strategy-free behavior or to develop the study skills necessary to become more strategic. Many authors (Cassady, 2004; Hagedorn, Sagher & Siadat, 2000; Taylor & Hyde, 2000) point out the importance of teaching students how to study effectively in addition to teaching course content. In general, students tend to gravitate toward taking more control over their own learning and adopting more sophisticated study habits as they age and progress through college, graduate school and working careers (Baxter Magolda, 2004; Gijbels, 2005; Richardson, 1995). The conventional assumption that older students returning to school after significant work experience will have poor study practices is not necessarily correct.

Although the results in the literature are somewhat mixed, students emphasizing deep strategies and deemphasizing surface strategies have generally been found to learn more effectively and achieve better performance on assignments involving higher order learning, although their performance may not be better on lower order activities (Gijbels, 2005; Gravoso, Pasa & Mori, 2002; Minbashian et al., 2004; Taylor & Hyde, 2000). As reported by Scouller and Prosser (1994), the use of deep strategies has also been found to lead to greater student satisfaction. Thus it would seem worthwhile for instructors to behave in such a way as to encourage students to adopt deep learning strategies. Trigwell, Prosser and Waterhouse (1999) found that instructors who use surface-oriented teaching methods encourage a similar surface approach from the students and also found a similar although weaker relationship for the deep orientation. A particular danger is that instructors utilizing surface-oriented teaching methods may encourage a surface approach not only in their own classes but also in subsequent classes, making it harder for instructors in those later more advanced classes to prompt the adoption of deeper learning strategies (Raimondo, Esposito & Gershenberg, 1990).

A key question, given this paper's contrast of multiple-choice and essay questions, is whether or not the type of exam questions utilized in a course will impact student learning strategies. Scouller (1998) found that students believe different abilities are being assessed by multiple-choice and essay questions, with essays being viewed as testing higher level thinking than multiple-choice. These student perceptions were found to translate into action as students applied more deep strategies and fewer surface strategies when preparing for essays. Although not specifically comparing multiple-choice and essay, Entwistle and Entwistle (1992) similarly found that students' study strategies were impacted by the level of questions they expected to see on the exam, with narrowly focused questions that stressed recall rather than the integration of ideas encouraging the use of surface strategies. Even students who had applied deep strategies throughout the semester and had achieved a high level of understanding felt pressured to apply surface-oriented memorization in the last days leading up to the exam. However, there is also evidence suggesting that students with a general preference toward deep strategies will continue to use deep strategies even if the exam itself does not encourage that (Entwistle & Entwistle, 1992; Scouller & Prosser, 1994). An overly heavy workload may undermine efforts to encourage deep strategies, possibly due to students believing that recall-focused surface strategies will enable them to quickly gather enough information to pass an exam that they do not have enough time to prepare for more carefully (Taylor & Hyde, 2000). Similarly, an exam on which achieving a high score is particularly critical, for example a single exam on which an entire semester's grade will be based or a highly competitive exam for admission into an exclusive graduate program, may encourage a "beat the test" mentality and therefore foster an increased reliance on surface strategies (Diseth & Martinsen, 2003).

Overall, the evidence suggests that the use of exams focused on higher-order learning is an important component of any teaching effort intended to encourage greater student adoption of deep strategies. Most authors view open-ended questions, including essays, as more capable of achieving a higher-order focus. Gijbels (2005), for example, suggests that even when worded in such a way as to test higher-order thinking, multiple-choice will be approached by students as a task calling for surface-oriented strategies. Nevertheless, there are some counterarguments. Wainer and Thissen (1993) suggest that skilled test writers can create multiple-choice questions that effectively test higher-order thinking and that the obvious efficiency advantages of multiple-choice should promote greater faculty efforts to develop that skill. Suskie (2004) agrees that multiple-choice questions can be written to test some, although not all, of the higher-order thinking skills from Bloom's Taxonomy. Similarly, but from the opposite perspective, Raimondo et al. (1990) advise that not all essay questions are necessarily higher-order as many simply call for recall-oriented answers.

ADDITIONAL CONSIDERATIONS

There are several additional advantages and disadvantages that should be mentioned briefly to complete the discussion. A key disadvantage of multiple-choice questions, especially in light of the fact that grading efficiency is one of their key benefits, is that multiple-choice questions tend to be more difficult to write, thus negating some of the time savings achieved during the grading process. One danger of this problem is that it tends to be more difficult to write multiple-choice questions designed to test higher order thinking than those that test recall, which can result in exams that are more recall-oriented than the instructors might have actually intended (Suskie, 2004). The writing difficulty can also encourage faculty to protect their questions for use in future semesters by not handing back the graded exams or by handing them back only long enough for the students to see the scores (Roediger, 2005), a practice that would not only deprive the students of valuable feedback but might also tend to encourage a belief among the students that the test score is the important thing, not the learning. In fairness, it should be noted that the ease with which multiple-choice can be graded can allow very quick feedback in the cases in which the instructor is willing to hand back and review the exam (Becker & Johnston, 1999).

Another disadvantage of multiple-choice is that test-takers are exposed to numerous incorrect answers, many of which may be constructed so as to appear to be correct. Roediger (2005) found that students tended to remember these incorrect lures as being correct when queried about them later, suggesting that students actually learn the wrong things as part of the testing process. A related disadvantage is that students receive corrective feedback whenever their own answer does not appear as one of the available alternatives, a prompt to reconsider the question and correct their mistake that would not be present in an open-ended assessment (Bridgeman, 1992). Some students react to the availability of the possible answers by working backwards to answer the question, particularly on quantitative problems. Bridgeman (1992) found that 81% of the students reported working backwards to solve problems, a problem-solving methodology that would not normally be appropriate when solving realistic problems in the field.

The need to minimize the impact of this corrective feedback encourages faculty to create incorrect answers that match common student errors or that are worded in such a way as to appear to be correct. The problem with this tactic, however, is that those apparently reasonable but incorrect answers are what the students may most clearly remember later as truly being correct. Creating incorrect answers that match common student mistakes, although potentially problematic as discussed above, can also be viewed as a benefit of multiple-choice in that this method allows an instructor to quickly diagnose the various misconceptions that appear to be occurring most frequently in a given semester.

An additional disadvantage of multiple-choice questions is that they can be gender-biased. As reported by Hassmen and Hunt (1994), numerous studies have shown that men tend to have an advantage on multiple-choice. Including essay questions on an exam can help to reduce this bias. A major weakness of essay questions is that they are relatively time-consuming to answer, meaning that a timed exam can only include a small number of questions (Walstad & Becker, 1994). This can result in an exam that is unable to evaluate student learning over all of the topics covered in that portion of the course, a problem that can penalize some students and reward others depending upon which topics go untested. This problem of failing to test a significant portion of the course content is exacerbated (but tends to become more favorable for the students) if students are allowed to select the questions that they prefer to answer from a longer list of questions provided.

A final advantage of essay, probably more accurately characterized as a disadvantage of multiple-choice, is that essay or other open-ended questions can more appropriately drive the curriculum and instructional activities. While it would be inappropriately simplistic to view teaching as an exercise in preparing students to pass exams, most faculty would be somewhat accepting of the idea that one of teaching's main goals is to enable students to correctly complete relevant problem-solving exercises and broadly focused integrative essays. Very few, in contrast, would view teaching toward the test to be valid if that test were made up of only multiple-choice questions (Harris & Kerby, 1997).

METHODOLOGY

The intent of this study is to evaluate student preferences for multiple-choice or essay questions and, in particular, to extend that evaluation to investigate the rationale behind those preferences and the extent to which those preferences may change depending on whether the student feels himself to be well prepared or poorly prepared for an upcoming exam. Numerous studies, some cited above, have investigated the relationships between teaching methods, student study strategies and student performance. Various studies have also evaluated student preferences for multiple-choice versus open-ended questions. However, little work has been done to determine student rationales for having given preferences. The data on student preference as a function of preparedness will implicitly suggest which type of question students believe requires greater preparation and may provide some insight concerning the type of exam questions that instructors should use to prompt greater study effort.

Data were gathered using a survey instrument that was completed by 81 undergraduate students enrolled in either the junior-level Principles of Management course or the senior-level Strategic Management capstone course. Because both courses are required for all business students in the program, the sample included a mix of the usual business school majors, i.e., accounting, finance, economics, management, marketing and so forth. Seven of the respondents were pursuing majors other than business.

The first section of the survey, for which students were instructed to reflect upon their entire college experience and not just the course in which they were completing the survey, required the students to respond to eleven statements and two questions. Of the eleven statements, two concerned general preferences for multiple-choice or essay questions and nine concerned the rationale behind those preferences. The students responded to each statement by selecting "Disagree Strongly," "Disagree," "Neutral," "Agree" or "Agree Strongly." The nine available reasons for preferring one type of exam over another were based on discussion with fellow business faculty and extensive anecdotal evidence concerning student opinions about testing methods. These nine statements allowed the expression of opinions such as multiple-choice questions being unfair due to the inability to earn partial credit and multiple-choice being easier because the student doesn't need to know the topic thoroughly but rather just enough to recognize the right answer. More detail on the nine statements will be provided during the discussion of the survey results.

The first section of the survey ended with two questions focused on student preferences for multiple-choice or essay questions as a function of how well prepared the students were for a hypothetical exam. One question asked for the preferred type of exam question if "you are very well prepared for an exam, knowing the subjects involved backwards and forwards." The second asked for the preference if "you haven't been able to study as much as you would have liked and aren't really very well prepared for an exam." In order to provide the students with the option of selecting a middle ground rather than forcing them to either extreme, the students could respond by selecting an exam composed entirely of multiple-choice questions, an exam composed entirely of essay questions, or an exam containing a mix of both types of questions.

The second section of the survey contained a series of statements and questions that were intended to apply only to the class in which the students were currently enrolled. Although this section was designed primarily to obtain feedback on certain class-specific teaching strategies and not as part of this study, some of the results are relevant and will be included in the discussion of the survey results. The third and final section of the survey asked for demographic information such as major and grade point average. The survey concluded with an open-ended question asking for comments in order to solicit input concerning any preference-related rationales that might have been omitted from the nine statements in the first section. The data were analyzed using SPSS.

RESULTS AND DISCUSSION

For purposes of analysis the responses to the 11 statements from the first section of the survey were coded using 1 for "Disagree Strongly" through 5 for "Agree Strongly" such that values of 1 or 2 signified disagreement with the statement, a value of 3 signified neutrality, and values of 4 or 5 signified agreement. The null hypothesis tested for each statement was "students on average do not agree with the statement," which was expressed numerically as "the population average is less than or equal to 3.0." The alternative hypothesis of "students on average agree with the statement" was expressed numerically as "the population average is greater than 3.0." An abbreviated version of each statement can be found below in Table 1 along with the sample mean, t score and level of significance. The statements have been numbered for convenience, with the order changed somewhat from that used on the survey to facilitate the discussion that follows the table. Throughout the discussion recall that rejection of the null hypothesis signifies agreement with the relevant statement.
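As a minimal sketch of the one-sample test described above (not the SPSS procedure the study actually ran), the t score for a single statement could be computed as follows. The response values are hypothetical, and the function name `one_sample_t` is an assumption introduced only for illustration. The one-sided significance level would then be read from a t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def one_sample_t(responses, null_mean=3.0):
    """Return (sample mean, t score, degrees of freedom) for H0: mean <= null_mean.

    A large positive t score counts as evidence for the alternative
    hypothesis that the population average exceeds null_mean.
    """
    n = len(responses)
    mean = statistics.fmean(responses)
    sd = statistics.stdev(responses)  # sample standard deviation (n - 1 divisor)
    t_score = (mean - null_mean) / (sd / math.sqrt(n))
    return mean, t_score, n - 1


# Hypothetical coded responses: 1 = "Disagree Strongly" ... 5 = "Agree Strongly".
# These are NOT the study's data.
hypothetical_responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 4]
mean, t_score, df = one_sample_t(hypothetical_responses)
```

Because the alternative hypothesis is one-sided (average greater than 3.0), only a positive t score can lead to rejection of the null.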

Note that statements 1 and 2 (which did not appear consecutively in the survey) are mirror images of one another, with the first stating a preference for multiple-choice over essay and the second stating the reverse. The correlation between these two statements was -.843, close to -1.0 as would be expected. The results for the hypothesis tests of these two statements, as shown in Table 1, led to a rejection of the null hypothesis for the statement preferring multiple-choice and a failure to reject the null for the statement preferring essays. Because these statements are opposites of one another it would be unreasonable to obtain data that would lead to a rejection of the null for both. The results for these two statements were evaluated further using paired t tests.

The paired t test of the difference between the multiple-choice preference and the essay preference resulted in a t score of 1.862 and a significance level of .066, suggesting that there was almost but not quite sufficient evidence to conclude that multiple-choice questions were preferred more highly. Breaking the sample into two subsets, one containing students in the junior-level Principles of Management course and the other containing students in the senior-level capstone Strategic Management course, led to results that were a little more enlightening. The paired t test for the Principles of Management students achieved a t score of 2.824 and a significance level of .007, suggesting that multiple-choice questions were preferred by a significant margin. The Strategic Management students, however, produced a t score of -0.437 and a significance level of .655, suggesting no difference in preference. This finding that the preference for multiple-choice declines for more advanced students is in alignment with the ideas discussed earlier that students tend to develop more of a deep focus and a drive for greater understanding as they progress through school (Baxter Magolda, 2004; Gijbels, 2005; Richardson, 1995).
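For readers unfamiliar with the paired t test, the following sketch shows how such a t score is formed from each student's two ratings. It is an illustration under stated assumptions, not a reproduction of the study's SPSS analysis: the ratings are hypothetical and the function name `paired_t` is introduced here only for the example.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t score, degrees of freedom) for paired ratings.

    The test works on the per-student differences (first - second) and asks
    whether their mean differs from zero.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.fmean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation
    t_score = mean_diff / (sd_diff / math.sqrt(n))
    return t_score, n - 1


# Hypothetical 1-5 agreement ratings from six students. NOT the study's data.
prefers_multiple_choice = [4, 5, 3, 4, 2, 4]
prefers_essay = [2, 1, 3, 3, 4, 2]
t_score, df = paired_t(prefers_multiple_choice, prefers_essay)
```

A positive t score here indicates stronger average agreement with the multiple-choice preference statement than with the essay preference statement; a negative score indicates the reverse.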

To investigate whether the preferences were impacted by the presence of a quantitative orientation, the subset of the sample majoring in accounting, finance or economics was compared to the other students. The supposition that quantitative skills might be correlated with a dislike for writing and hence a dislike of essays was not supported by the data.

As can be seen in Table 1, the null hypothesis was rejected for six of the nine rationale-related statements. The results suggested that students agreed with three of the statements related to the ease of multiple-choice: (3) Multiple-choice easier: I only need to recognize correct answer; (4) Multiple-choice easier: I can eliminate obviously wrong answers; and (5) Multiple-choice easier: I might guess correctly even if I don't know. The results also suggested agreement with one statement about the unfairness of multiple-choice, (7) Multiple-choice less fair: I can't earn partial credit. The results demonstrated support for two statements concerning essay questions, one statement about the difficulty of essays, (9) Essay Harder: I must fully understand topic to produce good answer, and one statement about the fairness of essays, (10) Essay more fair: More accurately show what I know/don't know. Only three statements were not supported: (6) Multiple-choice harder: I need to know nit-picky details; (8) Essay easier: I can earn partial credit even if topic knowledge low; and (11) Essay less fair: Content knowledge results biased by writing skill.

In general the significant statements appear to show two trends, one suggesting that students believe that multiple-choice questions are easier and another suggesting that students feel essay questions to be fairer and more valid. To further investigate this phenomenon a factor analysis was run which resulted in two factors, the first of which explained 40.46% of the variance and the second of which explained 20.06%. Excluding loadings between .40 and -.40, the first factor had positive loadings on all four statements viewing multiple-choice as easier or similarly essays as more difficult, statements 3, 4, 5 and 9, and loaded negatively on all other statements. The second factor loaded on all three fairness-related statements. It loaded positively on (7) Multiple-choice less fair: I can't earn partial credit, and (10) Essay more fair: More accurately show what I know/don't know, and loaded negatively on (11) Essay less fair: Content knowledge results biased by writing skill. The second factor also loaded positively on (5) Multiple-choice easier: I might guess correctly even if I don't know. These results suggest that the ability to obtain a high score more easily (via multiple-choice) and the ability to receive a score that accurately measures what the student has learned (via essay, except for statement 5) are both valued by the students.
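The practice of excluding loadings between .40 and -.40 when interpreting factors can be sketched as a simple filter. The loadings below are made up purely for illustration and are not the values obtained in this study; the function name `salient_loadings` is likewise an assumption. The sketch also assumes that a loading of exactly .40 in absolute value is retained, which the text does not specify.

```python
def salient_loadings(loadings, cutoff=0.40):
    """Keep, for each factor, only statements whose |loading| meets the cutoff."""
    result = {}
    for factor, by_statement in loadings.items():
        result[factor] = {
            statement: value
            for statement, value in by_statement.items()
            if abs(value) >= cutoff
        }
    return result


# Made-up loadings keyed by statement number. NOT the study's results.
hypothetical_loadings = {
    "factor_1": {3: 0.71, 4: 0.65, 7: -0.52, 10: 0.12},
    "factor_2": {3: 0.08, 4: 0.22, 7: 0.66, 10: 0.74},
}
salient = salient_loadings(hypothetical_loadings)
```

Only the statements that survive the filter are used to characterize a factor, with the sign of each loading indicating the direction of the relationship.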

While the first factor could be viewed as somewhat disappointing although certainly not surprising, the second factor is encouraging from an instructor's perspective. It is particularly encouraging to note that the inclusion of writing ability as a component of essay performance was not seen as a drawback of that format. It is unclear, however, whether that result is primarily due to the students having had experiences that suggest that most instructors do not pay particularly close attention to writing when grading or whether the students truly accept the idea that writing is an important skill in business and is therefore a reasonable thing to assess. It is curious that being able to guess on multiple-choice loaded positively on a fairness-related factor. Possibly the students agree with the Hassmen and Hunt (1994) contention that guessing is valid because it is generally based on at least partial knowledge of the relevant course content. These results also agree with Bridgeman's (2006) finding that although 81% of the students reported preferring multiple-choice, only 43% felt that multiple-choice was more valid. The fact that ease appears to be more important to the students than validity also supports O'Neill (2001), who studied multiple sections of the same course, some of which received multiple-choice exams and others essay exams, and found that the essay sections' students were the ones who complained about being treated badly.

The data for the two questions on preference as a function of preparedness were coded using 1 for students who preferred an exam made up entirely of multiple-choice, 2 for those who wanted a mix of multiple-choice and essay, and 3 for those who preferred entirely essay. The mean response was 1.74 under the hypothetical situation in which the student was well prepared for the upcoming exam and only 1.44 for the situation in which the student was poorly prepared. With both means less than 2.0, these results match those seen earlier in which the students in general preferred multiple-choice. However, the preference for multiple-choice diminished as the students became more prepared. A paired t test of the difference between the well prepared preference and the poorly prepared preference resulted in a t score of 3.829 and a significance level of less than .001, suggesting that the appeal of essays increases as preparedness does. In light of the results discussed above, this suggests that the fairness and validity attraction of essays may begin to overcome the easiness attraction of multiple-choice when the students have prepared sufficiently well to not fear the increased difficulty of essay questions.
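The paired comparison described above can be sketched in a few lines. The responses below are hypothetical and are not the survey data; the function name `paired_t` is likewise an illustrative assumption. The statistic is the mean of the within-student differences divided by the standard error of those differences, with each response coded 1, 2 or 3 as described above.

```python
import math
from statistics import mean, stdev


def paired_t(well_prepared, poorly_prepared):
    """Return (t statistic, degrees of freedom) for a paired t test."""
    if len(well_prepared) != len(poorly_prepared):
        raise ValueError("paired samples must have the same length")
    # One difference per student: well-prepared minus poorly-prepared.
    diffs = [w - p for w, p in zip(well_prepared, poorly_prepared)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical coded responses for eight students (NOT the survey data):
# 1 = all multiple-choice, 2 = mix, 3 = all essay.
well = [2, 1, 3, 2, 1, 2, 2, 1]
poor = [1, 1, 2, 2, 1, 1, 2, 1]

t_stat, df = paired_t(well, poor)
print(f"t = {t_stat:.3f}, df = {df}")
```

A positive t statistic here indicates a higher mean code, and thus a shift toward essays, in the well-prepared condition. The significance level would then be read from the t distribution with the reported degrees of freedom.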

Although the second section of the survey was focused on only the classes in which the respondents were registered (all of which were taught by the author) and thus asked questions geared specifically towards methods used in those classes, some of the results from that section are relevant to the discussion here. These classes all utilized a pedagogical approach in which the students were given a study guide in advance of the coverage of each chapter that pointed out the key concepts to learn for that chapter. Each study guide contained two sections, one with a list of terms and concepts to learn and another with several essay questions. The students were told to know and understand the definitions for each of the terms on the top half of the page and to recognize the key management issues related to each term. They were also told to be ready to fully answer every essay question as all of them were eligible to be selected for the exam. The essays were often broadly focused and required the students to contrast or integrate a variety of ideas, often including ideas from previous chapters. It was made very clear that complete, detailed and organized answers were required for the essays. The exams were a mix of multiple-choice and essay, with the one or two essay questions drawn word-for-word from the study guides and the multiple-choice designed such that students who had learned all of the terms from the study guides should be able to achieve a good score. The students were not able to use the text during the exams but were allowed to use their notes. They were also allowed to write out essay answers in advance of the test if they chose to do so and then simply hand in those answers if the questions they had answered turned out to be on the exam. The instructor's goal, as was explained to the class on the first day of the semester, was to provide students with a significant incentive to study carefully and take good notes, operating under the assumption that most students who did so would learn the material successfully (an assumption that end-of-semester instructor evaluation surveys have supported repeatedly).

This section of the survey was structured similarly to the first section, containing a series of statements followed by two questions concerning exam preference as a function of preparedness. Four of the statements referred to whether the respondent was "encouraged to study more diligently than I would have otherwise" as a result of, respectively, the study guides, the use of open notes exams, the presence of essays on the exams, and the requirement for the essays to be answered very completely in order to obtain a good score. Two other statements concerned whether being required to provide more complete essay answers than usual was made fair by the student's ability to see the questions in advance on the study guides and whether the student felt that the extra study effort put forth had actually resulted in greater learning. The students demonstrated highly significant agreement with these statements, with all of them achieving a level of significance less than .001. Taken as a group, these statements suggest several things. First, students are prompted to study harder both by the presence of essays on an exam and by the fact that those essays will be rigorously graded. Second, students find it reasonable to include rigorously graded essays if the content of the questions can be viewed in advance to direct the students' study efforts. And third, students believe that their increased efforts led to greater learning than in the usual class.

These results support the literature discussed previously that demonstrated that prompting students to adopt deep-oriented study strategies would tend to encourage them to do so and that this increased adoption of deep-oriented strategies would tend to lead to an increased level of learning. Narrowing the focus to this paper's comparison of multiple-choice and essay questions, these results suggest that students are simply more fearful of essays and recognize that they need to put forth more effort to answer them effectively.

IMPLICATIONS FOR TEST CONSTRUCTION

The results of this study demonstrate that students tend to prefer multiple-choice and that their preference is driven largely by the belief that this type of question is easier, with the preference becoming even stronger when the students are poorly prepared for an exam. More encouragingly from a faculty perspective, the results also demonstrate that students appear to have an appreciation for the fairness and validity of essay questions as a measure of the success of their learning efforts and become more accepting of essays when well prepared for an exam. The results also support previous findings in the literature, for example Entwistle and Entwistle (1992), Scouller (1998), and Trigwell, Prosser and Waterhouse (1999), that suggest that instructor efforts to promote increased student utilization of deep learning strategies can be successful.

These results, in conjunction with previous work discussed earlier in the paper, prompt a variety of pedagogical recommendations for faculty who are being pressured by time or budget constraints to make greater use of multiple-choice questions. First, faculty should recognize that it is possible to test certain types of higher-order learning using the multiple-choice format and that a multiple-choice exam does not therefore have to be an assessment of only memorization and recall (Anderson et al., 2001; Suskie, 2004; Wainer and Thissen, 1993). Thus an increase in the use of multiple-choice questions does not necessarily lead to an inferior exam, although faculty should be aware when writing multiple-choice questions that there can be an unintentional bias toward writing questions focused on lower-level learning simply because those questions are so much easier to create (Roediger, 2005).

The results for statements 3, 4, 5 and 9 demonstrate that students believe multiple-choice questions to be easier. Thus a second recommendation is that faculty who attempt to develop higher-order multiple-choice questions must make it clear to students that these questions will call for deeper learning than is normally required by multiple-choice. Otherwise the use of multiple-choice may encourage the students to apply primarily surface-oriented study methods or to simply put less effort into studying (Gijbels, 2005; Scouller, 1998; Scouller & Prosser, 1994). They would therefore fail to achieve a level of learning sufficient to succeed on the unexpectedly difficult exam. It would be worthwhile to provide each class with example questions from previous semesters that could be utilized to demonstrate the difference between lower-order questions and those focused on higher-order learning in the context of the discipline of the course.

As an extension of the second recommendation, faculty should consider the possibility of broadening the discussion beyond simply test questions to consider higher- versus lower-order learning in general by beginning each semester with a discussion of relevant pedagogical topics such as Bloom's Taxonomy and the utilization of deep learning strategies (Cassady, 2004; Hagedorn, Sagher & Siadat, 2000; Taylor & Hyde, 2000). Most undergraduates aren't exposed to these topics and therefore haven't received the guidance that would enable them to more fully develop their own study skills. Anderson et al. (2001) provide a good discussion of Bloom's Taxonomy.

A third and closely related recommendation is that faculty should emphasize the fact that exams including higher-order multiple-choice questions will more closely mimic an essay exam's ability to distinguish between students who have utilized surface strategies to memorize a list of isolated facts and those who have applied deep strategies to develop a more complete understanding of course content. Traditional lower-order multiple-choice questions often fail to reward the use of deep strategies because surface strategies are sufficient, i.e. the students who develop a deeper understanding don't benefit tangibly via higher test scores because that deeper understanding isn't required to answer the exam questions correctly. As was demonstrated by statement 10 in this study, students have an appreciation for the fairness and validity of essays due to their ability to reward those who have worked to achieve higher-order learning. Students need to be convinced that higher-order multiple-choice questions will similarly reward those who have achieved greater understanding.

The results from the second section of this study's survey suggest a fourth recommendation: it can be very beneficial for faculty to provide clear guidelines to students concerning the level of learning expected for each course topic. With years of experience taking traditional multiple-choice exams, in which most exam questions call primarily for memorization and recall, many students may conclude that all topics deserve approximately the same rather cursory study. With the addition of higher-order questions, however, it becomes necessary to let students know which topics are fully deserving of deeper study. Providing study guides, grading rubrics and old exams and assignments to demonstrate the level of understanding required can be an important motivator (Driessen & van der Vleuten, 2000). It should also be noted that in most courses there are indeed quite a few topics that really don't deserve more than cursory coverage and therefore probably shouldn't be tested via anything other than lower-order questions. Faculty must overcome the natural tendency to feel that every topic within their discipline is interesting and important in order to focus the students' deeper-oriented study efforts on the truly critical ideas.

A fifth recommendation is that faculty must allow students sufficient time both for the exam specifically and for studying and learning in general. The inclusion of higher-order multiple-choice questions will result in exams that require more reasoning time per question. If faculty write exams containing the same number of questions as before and provide the same amount of time as before, the students will be forced to approach the new higher-order questions with the same rather cursory approach that so many students have learned to utilize for multiple-choice from years of experience with lower-order questions. This situation will not only seem tremendously unfair to the students but may also lead many of them to conclude that there is little payoff from utilizing the deep study strategies that the faculty are hoping to foster. Although no faculty want to offer a class that will become known as easy, it is also important to make sure that students are given achievable learning goals and reasonable (though challenging) workloads. As noted by Taylor and Hyde (2000), students tend to gravitate toward surface strategies in order to quickly accumulate a list of test-ready facts when there isn't enough time to learn more deeply. Providing students with insufficient time to prepare for an exam or insufficient time to take an exam may sabotage faculty efforts to encourage the adoption of deep learning strategies.

A final particularly critical recommendation is that faculty should make certain to provide timely detailed feedback. One of the characteristics of students practicing deep learning strategies is a desire to monitor their own understanding and correct misconceptions as they occur (Chi, Bassok, Lewis, Reimann & Glaser, 1989; Minbashian et al., 2004). To facilitate this process, faculty will need to avoid the temptation to protect multiple-choice questions for repeat use in future semesters by preventing students from seeing anything more than just their exam scores. The inclusion of higher-order questions, which by definition involve more complex reasoning than would usually be seen in multiple-choice exams, makes this feedback even more critical. Faculty must hand back the exam and discuss such questions in detail so that students will fully understand the complex reasoning processes required. By carefully designing the incorrect answer options so that each incorrect answer would appear correct to students guilty of a common misconception or reasoning error, the exam can be used to diagnose these common mistakes. The incorrect answers then become a useful device to drive an enlightening discussion about those common misconceptions and reasoning errors, why they are incorrect, and how to correctly apply the ideas and skills of the discipline to find the right answer. In this way the students' incorrect answers can prompt valuable learning rather than becoming little more than uncorrected mistakes that the students may vaguely remember later as being correct (Roediger, 2005). Purdie, Hattie and Douglas (1996) investigated student utilization of a variety of learning strategies and found that the least-used of the various tactics was reviewing feedback. This suggests that students may view the score as the ultimate outcome of an exam, regardless of the type of questions used. It is important that faculty promoting deep learning strategies make the provision of meaningful informative feedback a key part of the learning process.

REFERENCES

Anderson, L.W., D.R. Krathwohl, P.W. Airasian, K.W. Cruikshank, R.E. Mayer, P.R. Pintrich et al. (Eds.) (2001). A taxonomy for learning, teaching and assessing. New York: Addison Wesley Longman.

Baxter Magolda, M.B. (2004). Evolution of a constructivist conceptualization of epistemological reflection. Educational Psychologist, 39(1), 31-42. Retrieved June 13, 2007, from the ERIC database.

Becker, W.E. & C. Johnston (1999). The relationship between multiple choice and essay response questions in assessing economics understanding. The Economic Record, 75(231), 348-357. Retrieved February 24, 2006 from the Business Source Premier database.

Bridgeman, B. (1992). A comparison of quantitative questions in open-ended and multiple-choice formats [Electronic version]. Journal of Educational Measurement, 29(3), 253-271.

Cassady, J.C. (2004). The influence of cognitive test anxiety across the learning-testing cycle [Electronic version]. Learning and Instruction, 14, 569-592.

Chi, M.T.H., M. Bassok, M.W. Lewis, P. Reimann & R. Glaser (1989). Self-explanations: How students study and use examples in learning to solve problems [Electronic version]. Cognitive Science, 13, 145-182.

Diseth, A. & O. Martinsen (2003). Approaches to learning, cognitive style, and motives as predictors of academic achievement. Educational Psychology, 23(2), 195-207. Retrieved July 20, 2004 from the Academic Search Premier database.

Driessen, E. & C. van der Vleuten (2000). Matching student assessment to problem-based learning: Lessons from experience in a law faculty. Studies in Continuing Education, 22(2), 235-248. Retrieved June 14, 2007, from the Academic Search Premier database.

Entwistle, A. & N. Entwistle (1992). Experiences of understanding in revising for degree examinations. Learning and Instruction, 2, 1-22. Retrieved June 19, 2007, from the Science Direct database.

Gijbels, D. (2005). The relationship between students' approaches to learning and the assessment of learning outcomes [Electronic version]. European Journal of Psychology of Education, 20(4). Retrieved February 24, 2006, from the Academic Search Premier database.

Gravoso, R.S., A.E. Pasa & T. Mori (2002). Influence of students' prior learning experiences, learning conceptions and approaches on their learning outcomes [Electronic version]. Proceedings of the Higher Education Research and Development Society of Australasia, 282-289.

Hagedorn, L.S., Y. Sagher & M.V. Siadat (2000). Building study skills in a college mathematics classroom [Electronic version]. Journal of General Education, 49(2), 132-155.

Harris, R.B. & W.C. Kerby (1997). Statewide performance assessment as a complement to multiple-choice testing in high school economics. Journal of Economic Education, 28(2), 122-134. Retrieved June 14, 2007 from the Business Source Premier database.

Hassmen, P. & D.P. Hunt (1994). Human self-assessment in multiple-choice testing [Electronic version]. Journal of Educational Measurement, 31(2), 149-160.

Lukhele, R., D. Thissen & H. Wainer (1994). On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests [Electronic version]. Journal of Educational Measurement, 31(3), 234-250.

Minbashian, A., G.F. Huon & K.D. Bird (2004). Approaches to studying and academic performance in short-essay exams. Higher Education, 47, 161-176. Retrieved June 14, 2007, from the Academic Search Premier database.

O'Neill, P.B. (2001). Essay versus multiple-choice exams: An experiment in the principles of macroeconomics course. The American Economist, 45(1), 62-70. Retrieved February 24, 2006 from the Business Source Premier database.

Powers, D.E., M.E. Fowles, M. Farnum & P. Ramsey (1994). Will they think less of my handwritten essay if others word process theirs? Effects on essay scores of intermingling handwritten and word-processed essays [Electronic version]. Journal of Educational Measurement, 31(3), 220-233.

Purdie, N., J. Hattie & G. Douglas (1996). Student conceptions of learning and their use of self-regulated learning strategies: A cross-cultural comparison. Journal of Educational Psychology, 88(1), 87-100. Retrieved June 12, 2007, from the PsycArticles Publications database.

Raimondo, H.J., L. Esposito & I. Gershenberg (1990). Introductory class size and student performance in intermediate theory courses [Electronic version]. Journal of Economic Education, 21(4), 369-381. Retrieved June 11, 2007, from the Business Source Premier database.

Richardson, J.T.E. (1995). Mature students in higher education: II: An investigation of approaches to studying and academic performance. Studies in Higher Education, 20(1), 5-17. Retrieved May 29, 2006, from the Academic Search Premier database.

Roediger, H.L. III (2005). The positive and negative consequences of multiple-choice testing [Electronic version]. Journal of Experimental Psychology: Learning, Memory & Cognition, 31(5), 1155-1159.

Scouller, K. (1998). The influence of assessment method on students' learning approaches: Multiple choice question examination versus assignment essay [Electronic version]. Higher Education, 35, 453-472.

Scouller, K.M. & M. Prosser (1994). Students' experiences in studying for multiple choice question examinations. Studies in Higher Education, 19(3), 267-279. Retrieved May 29, 2006, from the Academic Search Premier database.

Suskie, L. (2004). Assessing Student Learning: A common sense guide. Bolton, MA: Anker Publishing.

Taylor, R. & M. Hyde (2000). Learning context and students' perceptions of context influence student learning approaches and outcomes in animal science 2. Proceedings of the Teaching and Educational Development Institute Conference on Effective Teaching and Learning at University. Retrieved June 13, 2007, from http://www.tedi.uq.edu.au/conferences/teach_conference00/papers/taylor-hyde.html

Thissen, D., H. Wainer & X-B Wang (1994). Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests? An analysis of two tests [Electronic version]. Journal of Educational Measurement, 31(2), 113-123.

Trigwell, K., M. Prosser & F. Waterhouse (1999). Relations between teachers' approaches to teaching and students' approaches to learning [Electronic version]. Higher Education, 37, 57-70.

Walstad, W.B. & W.E. Becker (1994). Achievement differences on multiple-choice and essay tests in economics [Electronic version]. Proceedings of the American Economic Association, 193-196.

Wainer, H. & D. Thissen (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction [Electronic version]. Applied Measurement in Education, 6(2), 103-118.

David A. Parmenter, Northern Arizona University-Yuma

Table 1: Hypothesis Tests of Student Agreement with Assessment Method Statements

Statements (abbreviated versions)                                         Mean   t score   Significance

(1) I prefer multiple-choice questions to essay questions                 3.30     2.375   .01
(2) I prefer essay questions to multiple-choice questions                 2.84    -1.227   .888
(3) Multiple-choice easier: I only need to recognize correct answer       3.38     3.339   <.001
(4) Multiple-choice easier: I can eliminate obviously wrong answers       3.70     6.008   <.001
(5) Multiple-choice easier: I might guess correctly even if I don't know  3.52     5.208   <.001
(6) Multiple-choice harder: I need to know nit-picky details              2.79    -1.645   .948
(7) Multiple-choice less fair: I can't earn partial credit                3.47     3.977   <.001
(8) Essay easier: I can earn partial credit even if topic knowledge low   3.14     1.085   .141
(9) Essay harder: I must fully understand topic to produce good answer    3.42     3.693   <.001
(10) Essay more fair: More accurately show what I know/don't know         3.69     6.888   <.001
(11) Essay less fair: Content knowledge results biased by writing skill   2.70    -2.449   .991