Abstract: This article shows that the structure of bilingual material produced by standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data that improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable the reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned with the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and that the greatest improvement is achieved by combining them. We report an overall improvement of 1.48 BLEU points on the NIST08 evaluation set over a strong baseline in Chinese/English translation.