Article Information

  • Title: Improving NLTK for Processing Portuguese
  • Authors: João Ferreira; Hugo Gonçalo Oliveira; Ricardo Rodrigues
  • Journal: OASIcs : OpenAccess Series in Informatics
  • Electronic ISSN: 2190-6807
  • Publication year: 2019
  • Volume: 74
  • Pages: 1-9
  • DOI: 10.4230/OASIcs.SLATE.2019.18
  • Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • Abstract: Python has a growing community of users, especially in the AI and ML fields. Yet, Computational Processing of Portuguese in this programming language is limited, in both available tools and results. This paper describes NLPyPort, an NLP pipeline in Python, primarily based on NLTK, and focused on Portuguese. It is mostly assembled from pre-existent resources or their adaptations, but improves over the performance of existing alternatives in Python, namely in the tasks of tokenization, PoS tagging, lemmatization and NER.
  • Keywords: NLP; Tokenization; PoS tagging; Lemmatization; Named Entity Recognition