Journal name: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2011
Volume: 2011
Publisher: ACL Anthology
Abstract: The challenges of Named Entity Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN-based classifier conducts pre-labeling to collect global coarse evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning, together with the gazetteers, alleviates the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.
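
The KNN pre-labeling and semi-supervised steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the CRF stage and the gazetteers are omitted entirely, and the feature scheme (`context_features`), the cosine similarity, the confidence threshold, and all function names are assumptions made for illustration only.

```python
from collections import Counter


def context_features(tokens, i, window=1):
    """Hypothetical features: the word itself plus its neighbours in a small window."""
    feats = Counter({"w=" + tokens[i].lower(): 1})
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats["c=" + tokens[j].lower()] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(v * b[f] for f, v in a.items() if f in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_prelabel(features, examples, k=3):
    """Majority vote over the k most similar labeled words.

    Returns (label, confidence), where confidence is the winning vote share.
    """
    nearest = sorted(examples, key=lambda ex: cosine(features, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def self_train(labeled, unlabeled, k=3, threshold=0.9):
    """Pre-label unlabeled tweets and fold confident ones back into the example pool.

    labeled:   list of (tokens, labels) pairs
    unlabeled: list of token lists
    In the framework the abstract describes, a CRF would relabel each tweet
    after pre-labeling; that step is left out of this sketch.
    """
    examples = [(context_features(toks, i), lab)
                for toks, labs in labeled
                for i, lab in enumerate(labs)]
    results = []
    for toks in unlabeled:
        preds = [knn_prelabel(context_features(toks, i), examples, k)
                 for i in range(len(toks))]
        labels = [label for label, _ in preds]
        results.append(labels)
        # Assumed rule: reuse a tweet only if every word was labeled confidently.
        if preds and min(conf for _, conf in preds) >= threshold:
            examples.extend((context_features(toks, i), lab)
                            for i, lab in enumerate(labels))
    return results, examples
```

A call such as `self_train(labeled, unlabeled, k=1)` returns the pre-labels for each unlabeled tweet together with the enlarged example pool. The all-words-confident rule is one simple way to limit error propagation during self-training; the abstract does not state which selection criterion the authors use.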