Article Information

  • Title: Neural Networks Classifier for Data Selection in Statistical Machine Translation
  • Authors: Álvaro Peris; Mara Chinea-Ríos; Francisco Casacuberta
  • Journal: The Prague Bulletin of Mathematical Linguistics
  • Print ISSN: 0032-6585
  • Electronic ISSN: 1804-0462
  • Year of publication: 2017
  • Volume: 108
  • Issue: 1
  • Pages: 283-294
  • DOI: 10.1515/pralin-2017-0027
  • Language: English
  • Publisher: Walter de Gruyter GmbH
  • Abstract: Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of the domain adaptation field, aimed to extract those sentences from an out-of-domain corpus that are the most useful to translate a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, able to deal with monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality, compared to a state-of-the-art method (cross-entropy), requiring substantially less data. Moreover, the results obtained are coherent across different language pairs, demonstrating the robustness of our proposal.