
Article Information

  • Title: Challenges in Annotating and Parsing Spoken, Code-switched, Frisian-Dutch Data
  • Authors: Anouck Braggaar; Rob van der Goot
  • Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2021
  • Volume: 2021
  • Pages: 50-58
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: While high performance has been obtained for high-resource languages, performance on low-resource languages lags behind. In this paper we focus on the parsing of the low-resource language Frisian. We use a sample of code-switched, spontaneously spoken data, which proves to be a challenging setup. We propose to train a parser specifically tailored towards the target domain, by selecting instances from multiple treebanks. Specifically, we use Latent Dirichlet Allocation (LDA), with word and character N-grams. We use a deep biaffine parser initialized with mBERT. The best single source treebank (nl_alpino) resulted in an LAS of 54.7, whereas our data selection outperformed the single best transfer treebank and led to 55.6 LAS on the test data. Additional experiments consisted of removing diacritics from our Frisian data, creating more similar training data by cropping sentences, and running our best model using XLM-R. These experiments did not lead to better performance.