Journal: Journal of Methods and Measurement in the Social Sciences
Electronic ISSN: 2159-7855
Publication year: 2011
Volume: 2
Issue: 2
Pages: 80-101
DOI: 10.2458/v2i2.15990
Publisher: University of Arizona Libraries
Abstract: Measuring individuals or groups longitudinally is frequently necessary in social science research and applications. Substantial research and discussion have focused on the statistical properties of measures of change and some of the psychometric problems involved. This Monte Carlo simulation study focused on properties of the measurement instruments used for obtaining scores that represent change or growth over five time points, and examined how well scores from conventional tests and computerized adaptive tests used to measure individual growth curves reflect true change. Data representing four different patterns of individual change and a baseline no-change condition were generated from an item response theory (IRT) model. The simulated tests were conventional peaked tests with narrow and wider difficulty ranges and three levels of discrimination, and computerized adaptive tests (CATs) drawn from banks with the same levels of discrimination. Conventional tests were scored by number correct and by IRT weighted maximum likelihood. Results showed that as the examinees’ scores moved away from the difficulty levels at which the tests were concentrated, number-correct scores over-estimated true change and had increasing amounts of error. High-discrimination conventional tests had the poorest recovery of change for both groups and individuals. IRT scoring of the conventional tests improved recovery of change somewhat. By contrast, CATs consistently estimated growth with minimal and consistent error and performed best with highly discriminating items.
Other DOI: 10.2458/azu_jmmss_v2i2_weiss