
Article Information

  • Title: A Case Study of Efficacy and Challenges in Practical Human-in-Loop Evaluation of NLP Systems Using Checklist
  • Authors: Shaily Bhatt; Rahul Jain; Sandipan Dandapat
  • Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2021
  • Volume: 2021
  • Pages: 120-130
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: Despite state-of-the-art performance, NLP systems can be fragile in real-world situations. This is often due to insufficient understanding of the capabilities and limitations of models and the heavy reliance on standard evaluation benchmarks. Research into non-standard evaluation to mitigate this brittleness is gaining increasing attention. Notably, the behavioral testing principle ‘Checklist’, which decouples testing from implementation, revealed significant failures in state-of-the-art models for multiple tasks. In this paper, we present a case study of using Checklist in a practical scenario. We conduct experiments for evaluating an offensive content detection system and use a data augmentation technique for improving the model using insights from Checklist. We lay out the challenges and open questions based on our observations of using Checklist for human-in-loop evaluation and improvement of NLP systems. Disclaimer: The paper contains examples of content with offensive language. The examples do not represent the views of the authors or their employers towards any person(s), group(s), practice(s), or entity/entities.