Publication venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2012
Volume: 2012
Publisher: ACL Anthology
Abstract: The named entity disambiguation task is to resolve the many-to-many correspondence between ambiguous names and the unique real-world entities they refer to. This task can be modeled as a classification problem, provided that positive and negative examples are available for learning binary classifiers. High-quality sense-annotated data, however, are hard to obtain in streaming environments, since the training corpus would have to be constantly updated to accommodate the fresh data arriving on the stream. On the other hand, a few positive examples plus large amounts of unlabeled data may be easily acquired. Producing binary classifiers directly from this data, however, leads to poor disambiguation performance. Thus, we propose to enhance the quality of the classifiers using finer-grained variations of the well-known Expectation-Maximization (EM) algorithm. We conducted a systematic evaluation using Twitter streaming data, and the results show that our classifiers are extremely effective, providing improvements ranging from 1% to 20% when compared to the current state-of-the-art biased SVMs, while being more than 120 times faster.
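The abstract does not spell out the finer-grained EM variants, so the following is only a rough illustration of the general idea it builds on: the classic EM loop for learning from positive and unlabeled data. Labeled positives keep weight 1, unlabeled tweets start out as negatives, and the E-step (soft-labeling the unlabeled tweets) and the M-step (refitting the classifier on those soft labels) alternate. The multinomial naive Bayes model, the Laplace smoothing parameter `alpha`, the fixed iteration count, the hypothetical function names `train_nb`, `posterior_pos` and `pu_em`, and the toy "apple" tweets are all assumptions made for this sketch; it is not the authors' implementation and does not reproduce their variants or the biased-SVM baseline.

```python
import math
from collections import Counter


def train_nb(docs, weights, vocab, alpha=1.0):
    """M-step (assumed model: multinomial naive Bayes with Laplace smoothing).

    weights[i] is the current probability that docs[i] is a positive example.
    Returns (prior of the positive class, per-class token log-likelihoods).
    """
    prior_pos = sum(weights) / len(weights)
    prior_pos = min(max(prior_pos, 1e-6), 1.0 - 1e-6)  # keep log() finite
    counts = {1: Counter(), 0: Counter()}
    for doc, w in zip(docs, weights):
        for tok in doc:
            counts[1][tok] += w          # soft count for the positive class
            counts[0][tok] += 1.0 - w    # remaining mass goes to the negative class
    log_lik = {}
    for c in (0, 1):
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[c][t] + alpha) / total) for t in vocab}
    return prior_pos, log_lik


def posterior_pos(doc, prior_pos, log_lik):
    """E-step: P(positive | doc) under the current model; unseen tokens are ignored."""
    s1 = math.log(prior_pos) + sum(log_lik[1][t] for t in doc if t in log_lik[1])
    s0 = math.log(1.0 - prior_pos) + sum(log_lik[0][t] for t in doc if t in log_lik[0])
    m = max(s0, s1)  # subtract the max before exp() for numerical stability
    return math.exp(s1 - m) / (math.exp(s0 - m) + math.exp(s1 - m))


def pu_em(positive, unlabeled, iterations=10):
    """Hypothetical PU-learning EM loop: positives fixed at 1, unlabeled re-estimated."""
    docs = positive + unlabeled
    vocab = {t for d in docs for t in d}
    # Initialization: treat every unlabeled document as a negative example.
    weights = [1.0] * len(positive) + [0.0] * len(unlabeled)
    for _ in range(iterations):
        prior, log_lik = train_nb(docs, weights, vocab)               # M-step
        weights = [1.0] * len(positive) + [
            posterior_pos(d, prior, log_lik) for d in unlabeled       # E-step
        ]
    return train_nb(docs, weights, vocab)


# Toy, made-up data: tweets mentioning the ambiguous name "apple".
# Positive class = the company; unlabeled tweets mix company and fruit senses.
positive = [["apple", "iphone", "launch"], ["apple", "ipad", "stock"]]
unlabeled = [["apple", "iphone", "store"], ["apple", "pie", "recipe"],
             ["apple", "juice", "fresh"], ["apple", "stock", "earnings"]]

model = pu_em(positive, unlabeled)
p_company = posterior_pos(["iphone", "launch"], *model)
p_fruit = posterior_pos(["pie", "juice"], *model)
print(f"company-like tweet: {p_company:.3f}, fruit-like tweet: {p_fruit:.3f}")
```

Naive Bayes is used here only because it accepts soft (fractional) labels in a few lines of code; any probabilistic classifier that can be refit on weighted examples would fit the same E-step/M-step loop.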