
Basic Article Information

  • Title: Distributed Document and Phrase Co-embeddings for Descriptive Clustering
  • Authors: Motoki Sato ; Austin J. Brockmeier ; Georgios Kontonatsios
  • Journal name: Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2017
  • Volume: 2017
  • Pages: 991-1001
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
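
The label-selection idea in the abstract (choosing the phrase that best represents each cluster in a shared document and phrase space) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the document and phrase vectors have already been learned in one shared space, that cluster assignments are given, and that "best represents" means highest cosine similarity to the cluster centroid. The function names, toy vectors, and phrases are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def label_clusters(doc_vecs, assignments, phrase_vecs):
    """Map each cluster id to the phrase closest to that cluster's centroid.

    Assumption: documents and phrases live in the same embedding space,
    so a phrase vector can be compared directly with a document centroid.
    """
    clusters = {}
    for vec, cluster_id in zip(doc_vecs, assignments):
        clusters.setdefault(cluster_id, []).append(vec)

    labels = {}
    for cluster_id, members in clusters.items():
        center = centroid(members)
        labels[cluster_id] = max(
            phrase_vecs, key=lambda phrase: cosine(phrase_vecs[phrase], center)
        )
    return labels


# Toy, hand-made 2-D embeddings (hypothetical values, not real model output).
doc_vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
assignments = [0, 0, 1, 1]
phrase_vecs = {"protein folding": [1.0, 0.0], "stock market": [0.0, 1.0]}

labels = label_clusters(doc_vecs, assignments, phrase_vecs)
print(labels)
```

In practice the vectors would come from a trained paragraph vector model and the assignments from a clustering step. The abstract specifies neither in detail, so both are treated here as given inputs.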