Article Information

  • Title: Pulling Out the Stops: Rethinking Stopword Removal for Topic Models
  • Authors: Alexandra Schofield; Måns Magnusson; David Mimno
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2017
  • Volume: 2017
  • Pages: 432-436
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: It is often assumed that topic models benefit from the use of a manually curated stopword list. Constructing this list is time-consuming and often subject to user judgments about what kinds of words are important to the model and the application. Although stopword removal clearly affects which word types appear as most probable terms in topics, we argue that this improvement is superficial, and that topic inference benefits little from the practice of removing stopwords beyond very frequent terms. Removing corpus-specific stopwords after model inference is more transparent and produces similar results to removing those words prior to inference.