Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Year: 2012
Volume: 2012
Publisher: ACL Anthology
Abstract: Two decades after their invention, the IBM word-based translation models, widely available in the GIZA++ toolkit, remain the dominant approach to word alignment and an integral part of many statistical translation systems. Although many models have surpassed them in accuracy, none have supplanted them in practice. In this paper, we propose a simple extension to the IBM models: an ℓ0 prior to encourage sparsity in the word-to-word translation model. We explain how to implement this extension efficiently for large-scale data (also released as a modification to GIZA++) and demonstrate, in experiments on Czech, Arabic, Chinese, and Urdu to English translation, significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 Bleu).
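As an illustration of what such a prior looks like (a generic penalized-likelihood sketch, not the paper's exact formulation), an ℓ0 penalty on the word-to-word translation table t(f|e) can be written as below; the weight \alpha and smoothing parameter \beta are illustrative hyperparameters, and the second expression is a standard smooth surrogate for the non-differentiable ℓ0 count:

    \hat{\theta} = \arg\max_{\theta} \sum_{(\mathbf{e},\mathbf{f})} \log p(\mathbf{f} \mid \mathbf{e};\, \theta) \;-\; \alpha\, \|\theta\|_0,
    \qquad
    \|\theta\|_0 \;\approx\; \sum_{e,f} \left( 1 - e^{-t(f \mid e)/\beta} \right)

As \beta \to 0^{+}, each term of the surrogate approaches the indicator that t(f \mid e) > 0, so the penalty pushes rarely-used translation entries to exactly zero, which is the sparsity effect the abstract describes.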