
Article information

  • Title: Using WFSTs for Efficient EM Learning of Probabilistic CFGs and Their Extensions
  • Authors: Yoshitaka Kameya; Takashi Mori; Taisuke Sato
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Year of publication: 2014
  • Volume: 9
  • Issue: 4
  • Pages: 517-556
  • DOI: 10.11185/imt.9.517
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: Probabilistic context-free grammars (PCFGs) are a widely known class of probabilistic language models. The Inside-Outside (I-O) algorithm is well known as an efficient EM algorithm tailored for PCFGs. Although the algorithm requires only inexpensive linguistic resources, its efficiency remains a problem. This paper presents an efficient method for training PCFG parameters in which the parser is separated from the EM algorithm, assuming that the underlying CFG is given. The new EM algorithm exploits the compactness of the well-formed substring tables (WFSTs) generated by the parser. Our proposal is general in that the input grammar need not be in Chomsky normal form (CNF), and it is equivalent to the I-O algorithm in the CNF case. In addition, we propose a polynomial-time EM algorithm for CFGs with context-sensitive probabilities and report experimental results with the ATR dialogue corpus and a hand-crafted Japanese grammar.
  • Keywords: Probabilistic context-free grammars; EM algorithm; Inside-Outside algorithm; Well-formed substring tables
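
For orientation, the inside pass of the Inside-Outside algorithm, which the abstract takes as its baseline in the CNF case, might be sketched as below. This is only a minimal illustration of the standard chart computation, not the WFST-based method the paper proposes. The function name `inside_probabilities`, the dictionary encoding of the grammar, and the toy grammar are all assumptions made for this example.

```python
from collections import defaultdict


def inside_probabilities(words, lexical, binary):
    """Inside pass for a PCFG in Chomsky normal form (illustrative sketch).

    lexical: dict mapping (A, word) -> probability of the rule A -> word
    binary:  dict mapping (A, B, C) -> probability of the rule A -> B C
    Returns beta, where beta[(i, j, A)] is the probability that
    nonterminal A derives words[i:j].
    """
    n = len(words)
    beta = defaultdict(float)

    # Spans of length 1: apply lexical rules A -> word.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, a)] += p

    # Longer spans: sum over binary rules A -> B C and split points k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = beta.get((i, k, b), 0.0)
                    right = beta.get((k, j, c), 0.0)
                    if left and right:
                        beta[(i, j, a)] += p * left * right
    return beta


# Hypothetical toy grammar: S -> S S (0.4), S -> a (0.6).
toy_lexical = {("S", "a"): 0.6}
toy_binary = {("S", "S", "S"): 0.4}

beta = inside_probabilities(["a", "a", "a"], toy_lexical, toy_binary)
# "a a a" has two parses, each with probability 0.4**2 * 0.6**3,
# so the sentence probability is their sum.
sentence_probability = beta[(0, 3, "S")]
```

The nested loops over spans, split points, and rules make the cubic cost in sentence length explicit. That cost is the kind of efficiency concern the abstract refers to when it proposes separating the parser from the EM computation.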