Venue: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Year: 2010
Volume: 2010
Publisher: ACL Anthology
Abstract: Motivated by the recent interest in streaming algorithms for processing large text collections, we revisit the work of Ravichandran et al. (2005) on using the Locality Sensitive Hash (LSH) method of Charikar (2002) to enable fast, approximate comparisons of vector cosine similarity. For the common case in which feature updates are additive over a data stream, we show that LSH signatures can be maintained online, without additional approximation error, and with lower memory requirements than when using the standard offline technique.
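The "no additional approximation error" claim follows from linearity: each signature bit in Charikar's method is the sign of a dot product r·v with a random hyperplane r, and r·(v₁ + v₂) = r·v₁ + r·v₂, so a running sum of projected updates equals the projection of the final vector. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes Gaussian hyperplane entries regenerated on demand from a SHA-256 hash of (feature, bit) instead of a stored random matrix; the names `OnlineLSH`, `hyperplane_entry`, `offline_signature`, `estimated_cosine`, and `NUM_BITS` are hypothetical.

```python
import hashlib
import math
import random
from collections import defaultdict

NUM_BITS = 32  # signature length d (illustrative choice, not from the paper)


def hyperplane_entry(feature: str, bit: int) -> float:
    """Gaussian entry of hyperplane `bit` at `feature`, derived from a hash.

    Assumption of this sketch: entries are regenerated deterministically,
    so no d x |features| random matrix has to be kept in memory.
    """
    digest = hashlib.sha256(f"{feature}|{bit}".encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed).gauss(0.0, 1.0)


class OnlineLSH:
    """Keeps d running projections per item instead of full feature vectors."""

    def __init__(self, num_bits: int = NUM_BITS):
        self.num_bits = num_bits
        self.proj = defaultdict(lambda: [0.0] * self.num_bits)

    def update(self, item: str, feature: str, delta: float) -> None:
        # Additive update v[feature] += delta changes each r_b . v by delta * r_b[feature].
        p = self.proj[item]
        for b in range(self.num_bits):
            p[b] += delta * hyperplane_entry(feature, b)

    def signature(self, item: str) -> tuple:
        # Bit b is 1 when the running projection onto hyperplane b is non-negative.
        return tuple(1 if v >= 0 else 0 for v in self.proj[item])


def offline_signature(vector: dict, num_bits: int = NUM_BITS) -> tuple:
    """Batch signature computed from the final, fully accumulated vector."""
    bits = []
    for b in range(num_bits):
        dot = sum(w * hyperplane_entry(f, b) for f, w in vector.items())
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def estimated_cosine(sig_a: tuple, sig_b: tuple) -> float:
    """Charikar's estimate: P[bit differs] = theta / pi, so cos(pi * hamming / d)."""
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * hamming / len(sig_a))


if __name__ == "__main__":
    lsh = OnlineLSH()
    stream = [("doc1", "cat", 1.0), ("doc1", "dog", 2.0),
              ("doc2", "cat", 1.0), ("doc1", "cat", 2.0)]
    for item, feature, delta in stream:
        lsh.update(item, feature, delta)
    print(lsh.signature("doc1") == offline_signature({"cat": 3.0, "dog": 2.0}))
    print(estimated_cosine(lsh.signature("doc1"), lsh.signature("doc2")))
```

Because the online and offline paths compute the same dot products (up to floating-point summation order), the online signature matches the batch one, while only `NUM_BITS` floats per item are retained during streaming.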