Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Year of publication: 2010
Volume: 2010
Publisher: ACL Anthology
Abstract: Existing word similarity measures are not
robust to data sparseness since they rely
only on the point estimation of words’
context profiles obtained from a limited
amount of data. This paper proposes a
Bayesian method for robust distributional
word similarities. The method uses a distribution
of context profiles obtained by
Bayesian estimation and takes the expectation
of a base similarity measure under
that distribution. When the context profiles
are multinomial distributions, the priors
are Dirichlet, and the base measure is
the Bhattacharyya coefficient, we can derive
an analytical form that allows efficient
calculation. For the task of word similarity
estimation using a large amount of Web
data in Japanese, we show that the proposed
measure gives better accuracies than
other well-known similarity measures.
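
The analytical form mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the two words' context profiles are independent, that each posterior is a Dirichlet whose parameters are the observed context counts plus a symmetric prior, and that both count lists are indexed over the same context vocabulary. The names expected_sqrt, expected_bc, and prior are hypothetical. The sketch relies on the standard Dirichlet moment E[p_k^(1/2)] = Γ(α_0) Γ(α_k + 1/2) / (Γ(α_0 + 1/2) Γ(α_k)), where α_0 is the sum of all α_k, so the expectation of the Bhattacharyya coefficient sum_k sqrt(p_k q_k) factorizes into a sum of products of such moments.

```python
import math


def expected_sqrt(alpha_k, alpha_0):
    """E[sqrt(p_k)] for p ~ Dirichlet(alpha), where alpha_0 = sum(alpha).

    Computed in log space with lgamma to avoid overflow for large counts.
    """
    return math.exp(
        math.lgamma(alpha_0) - math.lgamma(alpha_0 + 0.5)
        + math.lgamma(alpha_k + 0.5) - math.lgamma(alpha_k)
    )


def expected_bc(counts_a, counts_b, prior=1.0):
    """Expected Bhattacharyya coefficient under two independent Dirichlet posteriors.

    counts_a, counts_b: context counts for two words, aligned on the same
    context vocabulary (an assumption of this sketch).
    prior: symmetric Dirichlet hyperparameter (hypothetical default, must be > 0).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must share one context vocabulary")
    post_a = [c + prior for c in counts_a]  # posterior Dirichlet parameters
    post_b = [c + prior for c in counts_b]
    a0, b0 = sum(post_a), sum(post_b)
    # Independence lets E[sqrt(p_k * q_k)] split into E[sqrt(p_k)] * E[sqrt(q_k)].
    return sum(
        expected_sqrt(a, a0) * expected_sqrt(b, b0)
        for a, b in zip(post_a, post_b)
    )


if __name__ == "__main__":
    sparse = expected_bc([1, 3], [1, 3])          # few observations
    dense = expected_bc([1000, 3000], [1000, 3000])  # same proportions, more data
    print(sparse, dense)
```

In this sketch, two words with identical count proportions score lower when the counts are small than when they are large, which is one way to see how taking the expectation discounts similarities estimated from sparse data.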