首页    期刊浏览 2024年10月01日 星期二
登录注册

文章基本信息

  • 标题:Word Sense Clustering and Clusterability
  • 本地全文:下载
  • 作者:Diana McCarthy ; Marianna Apidianaki ; Katrin Erk
  • 期刊名称:Computational Linguistics
  • 印刷版ISSN:0891-2017
  • 电子版ISSN:1530-9312
  • 出版年度:2016
  • 卷号:42
  • 期号:2
  • 页码:245-275
  • DOI:10.1162/COLI_a_00247
  • 语种:English
  • 出版社:MIT Press
  • 摘要:Word sense disambiguation and the related field of automated word sense induction traditionally assume that the occurrences of a lemma can be partitioned into senses. But this seems to be a much easier task for some lemmas than others. Our work builds on recent work that proposes describing word meaning in a graded fashion rather than through a strict partition into senses; in this article we argue that not all lemmas may need the more complex graded analysis, depending on their partitionability. Although there is plenty of evidence from previous studies and from the linguistics literature that there is a spectrum of partitionability of word meanings, this is the first attempt to measure the phenomenon and to couple the machine learning literature on clusterability with word usage data used in computational linguistics. We propose to operationalize partitionability as clusterability, a measure of how easy the occurrences of a lemma are to cluster. We test two ways of measuring clusterability: (1) existing measures from the machine learning literature that aim to measure the goodness of optimal k-means clusterings, and (2) the idea that if a lemma is more clusterable, two clusterings based on two different “views” of the same data points will be more congruent. The two views that we use are two different sets of manually constructed lexical substitutes for the target lemma, on the one hand monolingual paraphrases, and on the other hand translations. We apply automatic clustering to the manual annotations. We use manual annotations because we want the representations of the instances that we cluster to be as informative and “clean” as possible. We show that when we control for polysemy, our measures of clusterability tend to correlate with partitionability, in particular some of the type-(1) clusterability measures, and that these measures outperform a baseline that relies on the amount of overlap in a soft clustering.
国家哲学社会科学文献中心版权所有