
Article Information

  • Title: Improving the Performance of a Tagger Generator in an Information Extraction Application
  • Authors: José A. Troyano; Fernando Enríquez; Fermín Cruz
  • Journal: Journal of Universal Computer Science
  • Print ISSN: 0948-6968
  • Year: 2007
  • Volume: 13
  • Issue: 9
  • Pages: 1287-1299
  • Publisher: Graz University of Technology and Know-Center
  • Abstract: In this paper we present an experience in the extraction of named entities from Spanish texts using stacking. Named Entity Extraction (NEE) is a subtask of Information Extraction that involves identifying the groups of words that make up the name of an entity and classifying these names into a set of predefined categories. Our approach is corpus-based: we use a re-trainable tagger generator to obtain a named entity extractor from a set of tagged examples. The main contribution of our work is that we obtain the systems needed in a stacking scheme without using any additional training material or tagger generators. Instead, we generate the variability needed in stacking by applying corpus transformations to the original training corpus. Once we have several versions of the training corpus, we generate several extractors and combine them by means of a machine learning algorithm. Experiments show that the combination of corpus transformation and stacking improves the performance of the tagger generator in this kind of natural language processing application. The best of our experiments achieves an improvement of more than six percentage points over the predefined baseline.
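
The scheme described in the abstract (one tagger generator, several transformed copies of the same corpus, and a learned combiner) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `train_tagger` (a most-frequent-tag lookup), the three transformations, the memorizing meta-level table, and the tiny token-level data set. A real extractor would tag whole sequences rather than isolated tokens.

```python
from collections import Counter, defaultdict

def train_tagger(corpus, default="O"):
    """Toy stand-in for a tagger generator: most frequent tag per token."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word][tag] += 1
    table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda word: table.get(word, default)

# Hypothetical corpus transformations; each yields a different view of the
# same training data, which is the source of variability for stacking.
def identity(word):
    return word

def lowercase(word):
    return word.lower()

def shape(word):
    # Coarse word shape: "Xx" if capitalized, "x" otherwise.
    return "Xx" if word[:1].isupper() else "x"

TRANSFORMS = [identity, lowercase, shape]

def train_stack(base_data, meta_data):
    """Train one tagger per transformed corpus, then a meta-level combiner."""
    taggers = [
        (t, train_tagger([(t(w), tag) for w, tag in base_data]))
        for t in TRANSFORMS
    ]

    def base_predictions(word):
        return tuple(tagger(t(word)) for t, tagger in taggers)

    # Meta-level learner (assumption): remember the most frequent gold tag
    # for each combination of base predictions seen on held-out data.
    votes = defaultdict(Counter)
    for word, tag in meta_data:
        votes[base_predictions(word)][tag] += 1
    meta = {k: c.most_common(1)[0][0] for k, c in votes.items()}

    def predict(word):
        preds = base_predictions(word)
        # Fall back to a majority vote for unseen combinations.
        return meta.get(preds, Counter(preds).most_common(1)[0][0])

    return predict

# Made-up token-level data: LOC marks a location name, O anything else.
BASE_DATA = [("Madrid", "LOC"), ("en", "O"), ("Sevilla", "LOC"),
             ("vive", "O"), ("la", "O")]
META_DATA = [("Madrid", "LOC"), ("en", "O"), ("Granada", "LOC"), ("la", "O")]

stacked = train_stack(BASE_DATA, META_DATA)
single = train_tagger(BASE_DATA)
print(single("Toledo"), stacked("Toledo"))
```

In this toy setting the single tagger has never seen "Toledo" and falls back to its default tag, while the stacked combiner has learned from the held-out data that a lone vote from the shape-based tagger is a reliable signal for a location.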