
Article Information

  • Title: Reliability of Human Evaluation for Text Summarization: Lessons Learned and Challenges Ahead
  • Authors: Neslihan Iskender; Tim Polzehl; Sebastian Möller
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Year: 2021
  • Volume: 2021
  • Pages: 86-96
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: Only a small portion of research papers with human evaluation for text summarization provide information about participant demographics, task design, and experiment protocol. Additionally, many researchers use human evaluation as a gold standard without questioning its reliability or investigating the factors that might affect it. As a result, there is a lack of best practices for reliable human summarization evaluation grounded in empirical evidence. To investigate human evaluation reliability, we conduct a series of human evaluation experiments, provide an overview of participant demographics, task design, and experimental set-up, and compare the results from different experiments. Based on our empirical analysis, we provide guidelines to ensure the reliability of expert and non-expert evaluations, and we identify the factors that might affect the reliability of human evaluation.