
Article Information

  • Title: Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling
  • Authors: Muhammad Khalifa; Muhammad Abdul-Mageed; Khaled Shaalan
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Year: 2021
  • Volume: 2021
  • Pages: 769-782
  • DOI:10.18653/v1/2021.eacl-main.65
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: A sufficient amount of annotated data is usually required to fine-tune pre-trained language models for downstream tasks. Unfortunately, attaining labeled data can be costly, especially for multiple language varieties and dialects. We propose to self-train pre-trained language models in zero- and few-shot scenarios to improve performance on data-scarce varieties using only resources from data-rich ones. We demonstrate the utility of our approach in the context of Arabic sequence labeling by using a language model fine-tuned on Modern Standard Arabic (MSA) only to predict named entities (NE) and part-of-speech (POS) tags on several dialectal Arabic (DA) varieties. We show that self-training is indeed powerful, improving zero-shot MSA-to-DA transfer by as much as ~10% F1.
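
The abstract only sketches the self-training idea at a high level. Below is a minimal, hypothetical illustration of confidence-based self-training: a model trained on labeled (MSA-like) data pseudo-labels an unlabeled (DA-like) pool, and only high-confidence predictions are added back as training data for another round. The `self_train` function, the confidence threshold, and the scikit-learn classifier standing in for the fine-tuned pre-trained language model are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative confidence-based self-training loop (not the paper's code).
# A simple scikit-learn classifier stands in for the pre-trained language model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, rounds=3, threshold=0.9):
    X, y = X_labeled.copy(), y_labeled.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X, y)                           # train on the current labeled pool
        probs = model.predict_proba(X_unlabeled)  # pseudo-label the unlabeled pool
        conf = probs.max(axis=1)
        keep = conf >= threshold                  # keep only confident pseudo-labels
        if not keep.any():
            break
        X = np.vstack([X, X_unlabeled[keep]])
        y = np.concatenate([y, probs[keep].argmax(axis=1)])
        X_unlabeled = X_unlabeled[~keep]          # shrink the unlabeled pool
    return model

# Toy usage: random features stand in for labeled MSA and unlabeled DA examples.
rng = np.random.default_rng(0)
X_l, y_l = rng.normal(size=(100, 16)), rng.integers(0, 2, size=100)
X_u = rng.normal(size=(200, 16))
clf = self_train(X_l, y_l, X_u)
```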