
Basic Article Information

  • Title: Text Document Clustering: Wordnet vs. TF-IDF vs. Word Embeddings
  • Authors: Michał Marcińczuk ; Mateusz Gniewkowski ; Tomasz Walkowiak
  • Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2021
  • Volume: 2021
  • Pages: 207-214
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: In the paper, we deal with the problem of unsupervised text document clustering for the Polish language. Our goal is to compare the modern approaches based on language modeling (doc2vec and BERT) with the classical ones, i.e., TF-IDF and wordnet-based. The experiments are conducted on three datasets containing qualification descriptions. The experiments’ results showed that wordnet-based similarity measures could compete and even outperform modern embedding-based approaches.