Article Information

  • Title: Named Entity Recognition Using Appropriate Unlabeled Data, Post-processing and Voting
  • Authors: A. Ekbal; S. Bandyopadhyay
  • Journal: Informatica
  • Print ISSN: 1514-8327
  • Electronic ISSN: 1854-3871
  • Publication year: 2010
  • Volume: 34
  • Issue: 1
  • Publisher: The Slovene Society Informatika, Ljubljana
  • Abstract: This paper reports how the appropriate unlabeled data, post-processing and voting can be effective to improve the performance of a Named Entity Recognition (NER) system. The proposed method is based on a combination of the following classifiers: Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM). The training set consists of approximately 272K wordforms. The proposed method is tested with Bengali. A semi-supervised learning technique has been developed that uses the unlabeled data during training of the system. We have shown that simply relying upon the use of large corpora during training for performance improvement is not in itself sufficient. We describe the measures to automatically select effective documents and sentences from the unlabeled data. In addition, we have used a number of techniques to post-process the output of each of the models in order to improve the performance. Finally, we have applied weighted voting approach to combine the models. Experimental results show the effectiveness of the proposed approach with the overall average recall, precision, and f-score values of 93.79%, 91.34%, and 92.55%, respectively, which shows an improvement of 19.4% in f-score over the least performing baseline ME based sy
  • Keywords: named entity recognition; maximum entropy; conditional random field; support vector machine; weighted voting; Bengali
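
The abstract mentions combining the ME, CRF and SVM models by weighted voting but does not say how the weights are chosen or how ties are resolved. The following is only a minimal sketch of token-level weighted voting under assumed conditions: the function name `weighted_vote`, the tag labels, the example weights, and the alphabetical tie-break are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tag sequences from several models by weighted voting.

    predictions: dict mapping model name -> list of tags (one tag per token)
    weights:     dict mapping model name -> non-negative weight (assumed given)
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must tag the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        # Sum the weights of the models that voted for each candidate tag.
        scores = defaultdict(float)
        for model, tags in predictions.items():
            scores[tags[i]] += weights[model]
        # Highest total weight wins; ties go to the alphabetically first tag
        # (an arbitrary but deterministic choice made for this sketch).
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


# Hypothetical outputs of three taggers on a three-token sentence.
predictions = {
    "ME": ["B-PER", "O", "O"],
    "CRF": ["B-PER", "B-LOC", "O"],
    "SVM": ["O", "B-LOC", "O"],
}
# Hypothetical weights, e.g. each model's score on held-out data.
weights = {"ME": 0.75, "CRF": 0.90, "SVM": 0.85}

combined = weighted_vote(predictions, weights)
print(combined)
```

In this sketch a tag needs the largest summed weight rather than a simple majority, so a single strong model can outvote two weaker ones if the weights are set that way.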