首页    期刊浏览 2024年10月01日 星期二
登录注册

文章基本信息

  • 标题:Exploring Linguistically-Lightweight Keyword Extraction Techniques for Indexing News Articles in a Multilingual Set-up
  • 本地全文:下载
  • 作者:Jakub Piskorski ; Nicolas Stefanovitch ; Guillaume Jacquet
  • 期刊名称:Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • 出版年度:2021
  • 卷号:2021
  • 页码:35-44
  • 语种:English
  • 出版社:ACL Anthology
  • 摘要:This paper presents a study of state-of-the-art unsupervised and linguistically unsophisticated keyword extraction algorithms, based on statistic-, graph-, and embedding-based approaches, including, i.a., Total Keyword Frequency, TF-IDF, RAKE, KPMiner, YAKE, KeyBERT, and variants of TextRank-based keyword extraction algorithms. The study was motivated by the need to select the most appropriate technique to extract keywords for indexing news articles in a real-world large-scale news analysis engine. The algorithms were evaluated on a corpus of circa 330 news articles in 7 languages. The overall best F1 scores for all languages on average were obtained using a combination of the recently introduced YAKE algorithm and KPMiner (20.1%, 46.6% and 47.2% for exact, partial and fuzzy matching resp.).
国家哲学社会科学文献中心版权所有