Basic Article Information

  • Title: Using the Web to Obtain Frequencies for Unseen Bigrams
  • Authors: Frank Keller; Mirella Lapata
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Year: 2003
  • Volume: 29
  • Issue: 3
  • Pages: 459-484
  • DOI: 10.1162/089120103322711604
  • Language: English
  • Publisher: MIT Press
  • Abstract: This article shows that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the Web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between Web frequencies and corpus frequencies; (b) a reliable correlation between Web frequencies and plausibility judgments; (c) a reliable correlation between Web frequencies and frequencies recreated using class-based smoothing; (d) a good performance of Web frequencies in a pseudo disambiguation task.
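
The first evaluation named in the abstract, a correlation between Web frequencies and corpus frequencies, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the choice of Pearson's r on log-transformed counts, and the `offset` parameter used to avoid taking the log of zero are all assumptions made for this illustration, and the counts in the usage lines are invented placeholders, not data from the article.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log_count_correlation(web_counts, corpus_counts, offset=0.0):
    """Correlate log10-transformed counts for the same list of bigrams.

    `offset` is added to every count before the log (an assumption of
    this sketch); it must be positive if any count is zero.
    """
    log_web = [math.log10(c + offset) for c in web_counts]
    log_corpus = [math.log10(c + offset) for c in corpus_counts]
    return pearson(log_web, log_corpus)


# Invented placeholder counts for three hypothetical bigrams.
web = [10, 100, 1000]
corpus = [1, 10, 100]
r = log_count_correlation(web, corpus)
```

Here the placeholder Web counts are exactly ten times the corpus counts, so the log-transformed values differ by a constant and `r` comes out at 1 up to floating-point error. Real counts would be obtained separately, for example from search-engine hit counts and from a parsed corpus.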