
Article Information

  • Title: Domain Adaptation for Machine Translation with Instance Selection
  • Author: Ergun Biçici
  • Journal: The Prague Bulletin of Mathematical Linguistics
  • Print ISSN: 0032-6585
  • Online ISSN: 1804-0462
  • Year: 2015
  • Volume: 103
  • Issue: 1
  • Pages: 5-20
  • DOI: 10.1515/pralin-2015-0001
  • Language: English
  • Publisher: Walter de Gruyter GmbH
  • Abstract: Domain adaptation for machine translation (MT) can be achieved by selecting training instances close to the test set from a larger set of instances. We consider 7 different domain adaptation strategies and answer 7 research questions, which give us a recipe for domain adaptation in MT. We perform English-to-German statistical MT (SMT) experiments in a setting where test and training sentences can come from different corpora, and one of our goals is to learn the parameters of the sampling process. Domain adaptation with training instance selection can obtain a 22% increase in target 2-gram recall and can gain up to 3.55 BLEU points compared with random selection. Domain adaptation with the feature decay algorithm (FDA) not only achieves the highest target 2-gram recall and BLEU performance but also perfectly learns the test sample distribution parameter, with a correlation of 0.99. Moses SMT systems built with 10K FDA-selected training sentences obtain F1 results as good as the baselines that use up to 2M sentences. Moses SMT systems built with 50K FDA-selected training sentences obtain better F1 results than the baselines.
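The abstract compares training-instance selection strategies by target 2-gram recall. The following is a minimal Python sketch of the two ingredients under stated assumptions: an FDA-style greedy selector and a 2-gram recall measure. As we understand FDA, it scores each candidate sentence by the test-set n-gram features it contains and decays a feature's value each time an already-selected sentence covers it, so later picks favour features that are not yet covered. The initial feature value of 1, the decay factor of 0.5, length normalization, n-gram orders 1 and 2, the toy data, and all function names (`ngrams`, `fda_score`, `fda_select`, `bigram_recall`) are hypothetical choices made for illustration. This is not the paper's implementation.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], max_n: int = 2) -> Set[NGram]:
    """Distinct n-grams of orders 1..max_n in a tokenized sentence."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def fda_score(sentence, features, selected_counts, decay, max_n):
    """Assumed FDA-style score: sum of decayed values of the test features
    covered by `sentence`, divided by the sentence length."""
    if not sentence:
        return 0.0
    covered = ngrams(sentence, max_n) & features
    # Each feature starts at value 1 and is multiplied by `decay` once for
    # every already-selected sentence that contained it.
    return sum(decay ** selected_counts[f] for f in covered) / len(sentence)


def fda_select(pool, test, k, decay=0.5, max_n=2):
    """Greedily return indices of k pool sentences (source side) for the test set."""
    features: Set[NGram] = set()
    for sent in test:
        features |= ngrams(sent, max_n)
    selected_counts: Counter = Counter()  # feature -> times covered by selected sentences
    remaining = list(range(len(pool)))
    chosen: List[int] = []
    while remaining and len(chosen) < k:
        # Naive rescoring of every remaining candidate on each step; a
        # production selector would use a priority queue with lazy updates.
        best = max(
            remaining,
            key=lambda i: fda_score(pool[i], features, selected_counts, decay, max_n),
        )
        remaining.remove(best)
        chosen.append(best)
        for f in ngrams(pool[best], max_n) & features:
            selected_counts[f] += 1
    return chosen


def bigram_recall(train_targets, test_targets) -> float:
    """Fraction of distinct test-reference 2-grams found in the training targets."""
    def bigrams(s):
        return {tuple(s[i:i + 2]) for i in range(len(s) - 1)}

    test_bi: Set[NGram] = set()
    for s in test_targets:
        test_bi |= bigrams(s)
    train_bi: Set[NGram] = set()
    for s in train_targets:
        train_bi |= bigrams(s)
    return len(test_bi & train_bi) / len(test_bi) if test_bi else 0.0


# Toy English-German pool and test set, invented for illustration.
pool_src = [s.split() for s in ["the cat sat", "a dog ran", "the cat ran", "stocks fell today"]]
pool_tgt = [s.split() for s in ["die katze saß", "ein hund lief", "die katze lief", "aktien fielen heute"]]
test_src = ["the cat ran".split()]
test_tgt = ["die katze lief".split()]

chosen = fda_select(pool_src, test_src, k=2)
print(chosen)  # [2, 0]
print(bigram_recall([pool_tgt[i] for i in chosen], test_tgt))  # 1.0
```

On this toy data, a selection that happens to pick sentences 1 and 3 (as a random draw might) would give a target 2-gram recall of 0.0. That illustrates the kind of comparison the abstract makes, but the numbers here say nothing about the paper's results.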
All rights reserved by the National Center for Philosophy and Social Sciences Documentation.