
Article Information

  • Title: Data Strategies for Low-Resource Grammatical Error Correction
  • Authors: Simon Flachs; Felix Stahlberg; Shankar Kumar
  • Journal: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Year: 2021
  • Volume: 2021
  • Pages: 117-122
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: Grammatical Error Correction (GEC) is a task that has been extensively investigated for the English language. However, for low-resource languages the best practices for training GEC systems have not yet been systematically determined. We investigate how best to take advantage of existing data sources for improving GEC systems for languages with limited quantities of high-quality training data. We show that methods for generating artificial training data for GEC can benefit from including morphological errors. We also demonstrate that noisy error correction data gathered from Wikipedia revision histories and the language learning website Lang8 are valuable data sources. Finally, we show that GEC systems pre-trained on noisy data sources can be fine-tuned effectively using small amounts of high-quality, human-annotated data.