Article Information

  • Title: A Comparison of Methods for Identifying the Translation of Words in a Comparable Corpus: Recipes and Limits
  • Authors: Laurent Jakubina; Philippe Langlais
  • Journal: Computación y Sistemas
  • Print ISSN: 1405-5546
  • Publication year: 2016
  • Volume: 20
  • Issue: 3
  • Pages: 449-458
  • Language: English
  • Publisher: Instituto Politécnico Nacional
  • Abstract: Identifying translations in comparable corpora is a challenge that has long attracted researchers. It is useful in several applications, including Machine Translation and Cross-lingual Information Retrieval. In this study we compare three state-of-the-art approaches for this task: the so-called context-based projection method, the projection of monolingual word embeddings, and a method dedicated to identifying translations of rare words. We carefully explore the hyper-parameters of each method and measure their impact on the task of identifying the French translations of English words in Wikipedia. Contrary to standard practice, we designed a test case in which we do not resort to heuristics to pre-select the target vocabulary among which to find translations, thereby pushing each method to its limit. We show that all the approaches we tested have a clear bias toward frequent words. In fact, the best approach we tested could identify the translation of a third of a set of frequent test words, while it could only translate around 10% of rare words.
  • Keywords: Comparable corpora; bilingual lexicon induction; distributional approaches; rare word translation.