期刊名称:Conference on European Chapter of the Association for Computational Linguistics (EACL)
出版年度:2021
卷号:2021
页码:192-202
DOI:10.18653/v1/2021.eacl-srw.25
语种:English
出版社:ACL Anthology
摘要:Type- and token-based embedding architectures are still competing in lexical semantic change detection. The recent success of type-based models in SemEval-2020 Task 1 has raised the question why the success of token-based models on a variety of other NLP tasks does not translate to our field. We investigate the influence of a range of variables on clusterings of BERT vectors and show that its low performance is largely due to orthographic information on the target word, which is encoded even in the higher layers of BERT representations. By reducing the influence of orthography we considerably improve BERT’s performance.