Article Information

  • Title: An Investigation into the Validity of Some Metrics for Automatically Evaluating Natural Language Generation Systems
  • Authors: Ehud Reiter; Anja Belz
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Online ISSN: 1530-9312
  • Year: 2009
  • Volume: 35
  • Issue: 4
  • Pages: 529-558
  • DOI: 10.1162/coli.2009.35.4.35405
  • Language: English
  • Publisher: MIT Press
  • Abstract: There is growing interest in using automatically computed corpus-based evaluation metrics to evaluate Natural Language Generation (NLG) systems, because these are often considerably cheaper than the human-based evaluations which have traditionally been used in NLG. We review previous work on NLG evaluation and on validation of automatic metrics in NLP, and then present the results of two studies of how well some metrics which are popular in other areas of NLP (notably BLEU and ROUGE) correlate with human judgments in the domain of computer-generated weather forecasts. Our results suggest that, at least in this domain, metrics may provide a useful measure of language quality, although the evidence for this is not as strong as we would ideally like to see; however, they do not provide a useful measure of content quality. We also discuss a number of caveats which must be kept in mind when interpreting this and other validation studies.
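The validation idea described in the abstract (checking how well an automatic metric correlates with human judgments) can be sketched as follows. This is a minimal illustration, assuming one metric score and one mean human rating per NLG system and using Pearson's correlation coefficient. The record above does not say which correlation measure the authors used. The function name `pearson` and all numbers are hypothetical and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-system scores, for illustration only (NOT from the paper):
metric_scores = [0.21, 0.35, 0.28, 0.40]  # e.g. a corpus-level metric score per NLG system
human_ratings = [2.9, 3.8, 3.1, 4.2]      # e.g. mean human quality rating per system

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A high r across only a handful of systems is weak evidence on its own. This echoes the abstract's caution about interpreting validation studies. A rank-based measure such as Spearman's rho is a common alternative when only the ordering of systems matters.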