
Article information

  • Title: Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text
  • Authors: Alexandre Pinto; Hugo Gonçalo Oliveira; Ana Oliveira Alves
  • Journal: OASIcs : OpenAccess Series in Informatics
  • Electronic ISSN: 2190-6807
  • Publication year: 2016
  • Volume: 51
  • Pages: 1-16
  • DOI: 10.4230/OASIcs.SLATE.2016.3
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • Abstract: Nowadays, there are many toolkits available for performing common natural language processing tasks, which enable the development of more powerful applications without having to start from scratch. In fact, for English, there is no need to develop tools such as tokenizers, part-of-speech (POS) taggers, chunkers or named entity recognizers (NER). The current challenge is to select which one to use, out of the range of available tools. This choice may depend on several aspects, including the kind and source of text, where the level, formal or informal, may influence the performance of such tools. In this paper, we assess a range of natural language processing toolkits with their default configuration, while performing a set of standard tasks (e.g. tokenization, POS tagging, chunking and NER), in popular datasets that cover newspaper and social network text. The obtained results are analyzed and, while we could not decide on a single toolkit, this exercise was very helpful to narrow our choice.
  • Keywords: Natural language processing; toolkits; formal text; social media; benchmark