文章基本信息

标题：Models of dataset size, question design, and cross-language speech perception for speech crowdsourcing applications
本地全文：下载
作者：Mark Hasegawa-Johnson ; Jennifer Cole ; Preethi Jyothi 等
期刊名称：Laboratory Phonology: Journal of the Association for Laboratory Phonology
电子版ISSN：1868-6354
出版年度：2015
卷号：6
期号：3-4
页码：381-431
DOI：10.1515/lp-2015-0012
摘要：Abstract Transcribers make mistakes. Workers recruited in a crowdsourcing marketplace, because of their varying levels of commitment and education, make more mistakes than workers in a controlled laboratory setting. Methods for compensating transcriber mistakes are desirable because, with such methods available, crowdsourcing has the potential to significantly increase the scale of experiments in laboratory phonology. This paper provides a brief tutorial on statistical learning theory, introducing the relationship between dataset size and estimation error, then presents a theoretical description and preliminary results for two new methods that control labeler error in laboratory phonology experiments. First, we discuss the method of crowdsourcing over error-correcting codes. In the error-correcting-code method, each difficult labeling task is first factored, by the experimenter, into the product of several easy labeling tasks (typically binary). Factoring increases the total number of tasks, nevertheless it results in faster completion and higher accuracy, because workers unable to perform the difficult task may be able to meaningfully contribute to the solution of each easy task. Second, we discuss the use of explicit mathematical models of the errors made by a worker in the crowd. In particular, we introduce the method of mismatched crowdsourcing, in which workers transcribe a language they do not understand, and an explicit mathematical model of second-language phoneme perception is used to learn and then compensate their transcription errors. Though introduced as technologies that increase the scale of phonology experiments, both methods have implications beyond increased scale. The method of easy questions permits us to probe the perception, by untrained listeners, of complicated phonological models; examples are provided from the prosody of English and Hindi. The method of mismatched crowdsourcing permits us to probe, in more detail than ever before, the perception of phonetic categories by listeners with a different phonological system.
关键词：crowdsourcing ; rapid prosody transcription ; second-language phonology ; statistical learning theory