Journal name: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2012
Volume: 2012
Publisher: ACL Anthology
Abstract: In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that scales to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011), running our method with an n-gram language model yields better results with only 5% of the computational effort. This efficiency gain allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the VERBMOBIL corpus. We also report results using data from the monolingual French and English GIGAWORD corpora.