首页    期刊浏览 2024年11月27日 星期三
登录注册

文章基本信息

  • 标题:Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning
  • 本地全文:下载
  • 作者:Shay B. Cohen ; Noah A. Smith
  • 期刊名称:Computational Linguistics
  • 印刷版ISSN:0891-2017
  • 电子版ISSN:1530-9312
  • 出版年度:2012
  • 卷号:38
  • 期号:3
  • 页码:479-526
  • DOI:10.1162/COLI_a_00092
  • 语种:English
  • 出版社:MIT Press
  • 摘要:Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk.
国家哲学社会科学文献中心版权所有