首页    期刊浏览 2024年10月08日 星期二
登录注册

文章基本信息

  • 标题:Bilingual Distributed Word Representations from Document-Aligned Comparable Data
  • 本地全文:下载
  • 作者:Ivan Vulić ; Marie-Francine Moens
  • 期刊名称:Journal of Artificial Intelligence Research
  • 印刷版ISSN:1076-9757
  • 出版年度:2016
  • 卷号:55
  • 页码:953-994
  • 出版社:American Association of Artificial
  • 摘要:We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following the recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual word embeddings (BWEs). Unlike prior work on inducing BWEs which heavily relied on parallel sentence-aligned corpora and/or readily available translation resources such as dictionaries, the article reveals that BWEs may be learned solely on the basis of document-aligned comparable data without any additional lexical resources nor syntactic information. We present a comparison of our approach with previous state-of-the-art models for learning bilingual word representations from comparable data that rely on the framework of multilingual probabilistic topic modeling (MuPTM), as well as with distributional local context-counting models. We demonstrate the utility of the induced BWEs in two semantic tasks: (1) bilingual lexicon extraction, (2) suggesting word translations in context for polysemous words. Our simple yet effective BWE-based models significantly outperform the MuPTM-based and context-counting representation models from comparable data as well as prior BWE-based models, and acquire the best reported results on both tasks for all three tested language pairs.
国家哲学社会科学文献中心版权所有