Article information

Title: Examining the impact of specific types of item-writing flaws on student performance and psychometric properties of the multiple choice question
Authors: Hannah Pham; James Besanko; Peter Devitt et al.
Journal: MedEdPublish
Electronic ISSN: 2312-7996
Publication year: 2018
Volume: 7
Issue: 4
Pages: 1-16
DOI: 10.15694/mep.2018.0000225.1
Publisher: Association for Medical Education in Europe (AMEE)
Abstract: Background: Item-writing flaws (IWFs) are common in multiple choice questions (MCQs) despite item-writing guidelines. Previous studies have shown that IWFs impact validity as observed through student performance, item difficulty, and discrimination. Most previous studies have examined IWFs collectively and have shown that they have a diverse impact. The aim of the study was to determine if the effects of individual types of IWFs are systematic and predictable. Method: A cross-over study design was used. 100 pairs of MCQ items (with and without an IWF) were constructed to test 10 types of IWFs. Medical students were invited to participate in a mock examination. Paper A consisted of 50 flawed followed by 50 unflawed items. Paper B consisted of 50 unflawed followed by 50 flawed items. The effect of each of the IWFs on mean item scores, item difficulty and discrimination was examined. Results: The hypothesised effect of IWFs on mean item scores was confirmed in only 4 out of 10 cases. ‘Longest choice is correct’, ‘Clues to the right answer (Eponymous terms)’ and ‘Implausible distractors’ positively impacted, while ‘Central idea in choices rather than stem’ negatively impacted mean item scores. Other flaws had either the opposite or no statistically significant effect. IWFs did not impact item difficulty or discrimination. Conclusion: The effect of IWFs is neither systematic nor predictable. Unpredictability in assessment produces error and thus loss of validity. Therefore, IWFs should be avoided. Faculties should be encouraged to invest in item-writing workshops in order to improve MCQs. However, the cost of doing so should be carefully weighed against the benefits of developing programmes of assessment.
Keywords: Multiple choice question; Item writing flaws; Medical assessment; Psychometrics; Item difficulty; Item discrimination