Abstract: Surveys have long been used in physics education research to understand student reasoning and inform course improvements. However, to make analysis of large sets of responses practical, most surveys use a closed-response format with a small set of potential responses. Open-ended formats, such as written free response, can provide deeper insights into student thinking, but take much longer to analyze, especially with a large number of responses. Here, we explore natural language processing as a computational solution to this problem. We create a machine learning model that takes student responses from the Physics Measurement Questionnaire as input and outputs a categorization of student reasoning based on different reasoning paradigms. Our model classifies responses with the same level of agreement as that between two humans categorizing the data, but because the classification is automated, it can be scaled to large datasets. In this work, we describe the algorithms and methodologies used to create, train, and test our natural language processing system. We also present the results of the analysis and discuss the utility of these approaches for analyzing open-response data in education research.
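To make the kind of pipeline described above concrete, the sketch below shows a minimal text classifier, assuming a scikit-learn TF-IDF vectorizer feeding a logistic-regression model; this is an illustrative example only, not the authors' actual system, and the student responses and "point"/"set" reasoning labels shown are hypothetical stand-ins for the categories discussed later in the paper.

```python
# Minimal sketch of an NLP classifier for free-response survey answers.
# Assumes scikit-learn; the data and label scheme below are hypothetical
# examples, not the Physics Measurement Questionnaire dataset itself.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical free-response answers with example reasoning-paradigm labels.
responses = [
    "One careful measurement is enough to find the true value",
    "We should repeat the experiment and average all the readings",
    "A single reading gives the right answer if the setup is good",
    "Taking many measurements and averaging reduces random error",
]
labels = ["point", "set", "point", "set"]

# TF-IDF bag-of-words features feeding a logistic-regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(responses, labels)

# Categorize a new, unseen student response.
new_response = ["I would average many repeated measurements"]
print(model.predict(new_response))  # likely ['set'] given the toy training data
```

In practice, the predicted labels for held-out responses would be compared against human coding, so that machine-human agreement can be reported on the same footing as human-human agreement.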