Article information

  • Title: FaDA: Fast Document Aligner using Word Embedding
  • Authors: Pintu Lohar; Debasis Ganguly; Haithem Afli
  • Journal: The Prague Bulletin of Mathematical Linguistics
  • Print ISSN: 0032-6585
  • Electronic ISSN: 1804-0462
  • Publication year: 2016
  • Volume: 106
  • Issue: 1
  • Pages: 169-179
  • DOI: 10.1515/pralin-2016-0016
  • Language: English
  • Publisher: Walter de Gruyter GmbH
  • Abstract: FaDA is a free/open-source tool for aligning multilingual documents. It employs a novel cross-lingual information retrieval (CLIR)-based document-alignment algorithm that combines the distances between embedded word vectors with the word overlap between the source-language and target-language documents. In this approach, we first construct a pseudo-query from a source-language document. We then represent the target-language documents and the pseudo-query as word vectors and compute the average similarity between them. This word vector-based similarity measure is then combined with the term overlap-based similarity. Our initial experiments show that a standard Statistical Machine Translation (SMT)-based approach is outperformed by our CLIR-based approach in finding the correct alignment pairs. Subsequent experiments with the word vector-based method show further improvements in the performance of the system.
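
The scoring idea described in the abstract can be illustrated with a minimal sketch. This is not FaDA's actual implementation: the function names, the use of mean pairwise cosine similarity, the query-coverage definition of term overlap, and the linear interpolation weight `alpha` are all assumptions made for illustration. The sketch also assumes that the pseudo-query terms and the target-document terms already share one vocabulary and one embedding table.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_vector_similarity(query_terms, doc_terms, embeddings):
    """Mean pairwise cosine similarity between query and document word vectors.

    Terms missing from the (hypothetical) embedding table are skipped.
    """
    q_vecs = [embeddings[t] for t in query_terms if t in embeddings]
    d_vecs = [embeddings[t] for t in doc_terms if t in embeddings]
    if not q_vecs or not d_vecs:
        return 0.0
    total = sum(cosine(q, d) for q in q_vecs for d in d_vecs)
    return total / (len(q_vecs) * len(d_vecs))


def term_overlap(query_terms, doc_terms):
    """Fraction of distinct query terms that also occur in the document."""
    q, d = set(query_terms), set(doc_terms)
    if not q:
        return 0.0
    return len(q & d) / len(q)


def combined_score(query_terms, doc_terms, embeddings, alpha=0.5):
    """Linear interpolation of the two similarities; alpha is an assumed weight."""
    vec_sim = average_vector_similarity(query_terms, doc_terms, embeddings)
    overlap = term_overlap(query_terms, doc_terms)
    return alpha * vec_sim + (1.0 - alpha) * overlap


def best_alignment(query_terms, candidates, embeddings, alpha=0.5):
    """Return the id of the candidate document with the highest combined score."""
    return max(
        candidates,
        key=lambda doc_id: combined_score(
            query_terms, candidates[doc_id], embeddings, alpha
        ),
    )


if __name__ == "__main__":
    # Toy two-dimensional embeddings, invented for this example only.
    toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
    pseudo_query = ["cat", "dog"]
    target_docs = {"d1": ["cat", "pet"], "d2": ["car"]}
    print(best_alignment(pseudo_query, target_docs, toy_embeddings))
```

A single weight keeps the combination easy to tune: `alpha=1.0` uses only the word-vector similarity and `alpha=0.0` uses only term overlap.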