
Article Information

  • Title: Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation
  • Authors: Wei Wang; Jonathan May; Kevin Knight
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Online ISSN: 1530-9312
  • Publication year: 2010
  • Volume: 36
  • Issue: 2
  • Pages: 247-277
  • DOI: 10.1162/coli.2010.36.2.09054
  • Language: English
  • Publisher: MIT Press
  • Abstract: This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation.