Article information

  • Title: Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems
  • Authors: Luis Adrián Cabrera-Diego; Jose G. Moreno; Antoine Doucet
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Year: 2021
  • Volume: 2021
  • Pages: 98-104
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: We present a collection of Named Entity Recognition (NER) systems for six Slavic languages: Bulgarian, Czech, Polish, Slovenian, Russian and Ukrainian. These NER systems were trained using different BERT models and a Frustratingly Easy Domain Adaptation (FEDA). FEDA allows us to create NER systems from multiple datasets without having to worry about whether the tagsets (e.g. Location, Event, Miscellaneous, Time) of the source and target domains match, while increasing the amount of data available for training. Moreover, we improved named entity prediction by marking uppercase words and by predicting masked words. In the 3rd Shared Task on SlavNER, our NER systems reached a strict-match micro F-score of up to 0.908. The results show good generalization, even for named entities with weak regularity, such as book titles, and for entities never seen during training.
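For readers unfamiliar with FEDA: the original Frustratingly Easy Domain Adaptation (Daumé III, 2007) augments each feature vector with one shared copy plus one copy per domain, where only the block belonging to the example's own domain is filled in. A linear model trained on the augmented vectors can then learn both general weights and domain-specific weights. The sketch below shows only that classic feature-augmentation step on plain Python lists. The abstract does not describe how the authors adapted FEDA to BERT models and to mismatched tagsets, so this is an illustration of the general idea, not their implementation; the function name `feda_augment` and the domain names used here are hypothetical.

```python
from typing import List, Sequence


def feda_augment(features: Sequence[float], domain: str, domains: Sequence[str]) -> List[float]:
    """Classic FEDA feature augmentation (illustrative sketch, hypothetical name).

    Returns [shared copy] + one block per domain, in the order given by `domains`.
    Only the block for the example's own `domain` holds the original features;
    every other domain block is filled with zeros.
    """
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain!r}")
    n = len(features)
    # Shared ("general") copy: present for examples from every domain.
    augmented = list(features)
    for d in domains:
        # Domain-specific copy: real values for the example's own domain, zeros otherwise.
        augmented.extend(features if d == domain else [0.0] * n)
    return augmented


if __name__ == "__main__":
    domains = ["news", "web"]  # hypothetical dataset/domain names
    print(feda_augment([1.0, 2.0], "news", domains))
    print(feda_augment([1.0, 2.0], "web", domains))
```

For example, with domains `["news", "web"]`, the vector `[1.0, 2.0]` from a `"news"` example becomes `[1.0, 2.0, 1.0, 2.0, 0.0, 0.0]`, while the same vector from a `"web"` example becomes `[1.0, 2.0, 0.0, 0.0, 1.0, 2.0]`. Weights on the shared block are updated by data from all domains, which is how FEDA lets several datasets contribute to one model.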