Article Information

  • Title: Half-Context Language Models
  • Authors: Hinrich Schütze; Michael Walsh
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Publication year: 2011
  • Volume: 37
  • Issue: 4
  • Pages: 843-865
  • DOI: 10.1162/COLI_a_00078
  • Language: English
  • Publisher: MIT Press
  • Abstract: This article investigates the effects of different degrees of contextual granularity on language model performance. It presents a new language model that combines clustering and half-contextualization, a novel representation of contexts. Half-contextualization is based on the half-context hypothesis that states that the distributional characteristics of a word or bigram are best represented by treating its context distribution to the left and right separately and that only directionally relevant distributional information should be used. Clustering is achieved using a new clustering algorithm for class-based language models that compares favorably to the exchange algorithm. When interpolated with a Kneser-Ney model, half-context models are shown to have better perplexity than commonly used interpolated n-gram models and traditional class-based approaches. A novel, fine-grained, context-specific analysis highlights those contexts in which the model performs well and those which are better treated by existing non-class-based models.