
Article Information

  • Title: A Scalable Distributed Syntactic, Semantic, and Lexical Language Model
  • Authors: Ming Tan; Wenli Zhou; Lei Zheng
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Publication year: 2012
  • Volume: 38
  • Issue: 3
  • Pages: 631-671
  • DOI: 10.1162/COLI_a_00107
  • Language: English
  • Publisher: MIT Press
  • Abstract: This paper presents an attempt at building a large scale distributed composite language model that is formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens and stored on a supercomputer. The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the Bleu score and “readability” of translations when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.