Article Information

  • Title: A Figure of Merit for the Evaluation of Web-Corpus Randomness
  • Authors: Massimiliano Ciaramita; Marco Baroni
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2006
  • Volume: 2006
  • Publisher: ACL Anthology
  • Abstract: In this paper, we present an automated, quantitative, knowledge-poor method to evaluate the randomness of a collection of documents (corpus), with respect to a number of biased partitions. The method is based on the comparison of the word frequency distribution of the target corpus to word frequency distributions from corpora built in deliberately biased ways. We apply the method to the task of building a corpus via queries to Google. Our results indicate that this approach can be used, reliably, to discriminate biased and unbiased document collections and to choose the most appropriate query terms.