
Article Information

  • Title: A Generative Dependency N-gram Language Model: Unsupervised Parameter Estimation and Application
  • Authors: Chenchen Ding; Mikio Yamamoto
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Publication year: 2014
  • Volume: 9
  • Issue: 4
  • Pages: 857-885
  • DOI: 10.11185/imt.9.857
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: We design a language model based on a generative dependency structure for sentences. The parameter of the model is the probability of a dependency N-gram, which is composed of lexical words with four types of extra tags used to model the dependency relation and valence. We further propose an unsupervised expectation-maximization algorithm for parameter estimation, in which all possible dependency structures of a sentence are considered. As the algorithm is language-independent, it can be used on a raw corpus from any language, without any part-of-speech annotation, treebank, or trained parser. We conducted experiments using four languages (English, German, Spanish, and Japanese) to illustrate the applicability and the properties of the proposed approach. We further apply the proposed approach to a Chinese microblog data set to extract and investigate Internet-based, non-standard lexical dependency features of user-generated content.
  • Keywords: N-gram language model; Generative dependency structure; Unsupervised algorithm; Microblog