Article Information

Title: Learning Domain-Specific, L1-Specific Measures of Word Readability
Authors: Shane Bergsma; David Yarowsky
Journal: Traitement Automatique des Langues
Print ISSN: 1248-9433
Electronic ISSN: 1965-0906
Publication year: 2013
Volume: 54
Issue: 1
Publisher: ATALA - Assoc Traitement Automatique Langues
Abstract: Improved readability ratings for second-language readers could have a huge impact in areas such as education, advertising, and information retrieval. We propose ways to adapt readability measures for users who (a) are proficient in a particular domain, and (b) have a particular native language (L1). Specifically, we predict the readability of individual words. Our learned models use a range of creative features based on diverse statistical, etymological, lexical, and morphological information. We evaluate on a corpus of computational linguistics articles divided according to seven L1s; we show that we can accurately predict the target readability scores in this domain. Our technique improves over several reasonable baselines. We provide an in-depth analysis showing which kinds of information are most predictive of word difficulty in different L1s, and show how this differs for style and content words.