Article information

Title: Neural Networks Classifier for Data Selection in Statistical Machine Translation
Authors: Álvaro Peris; Mara Chinea-Ríos; Francisco Casacuberta et al.
Journal: The Prague Bulletin of Mathematical Linguistics
Print ISSN: 0032-6585
Electronic ISSN: 1804-0462
Year of publication: 2017
Volume: 108
Issue: 1
Pages: 283-294
DOI: 10.1515/pralin-2017-0027
Language: English
Publisher: Walter de Gruyter GmbH
Abstract: Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of the domain adaptation field, aimed to extract those sentences from an out-of-domain corpus that are the most useful to translate a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, able to deal with monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality, compared to a state-of-the-art method (cross-entropy), requiring substantially less data. Moreover, the results obtained are coherent across different language pairs, demonstrating the robustness of our proposal.