Article Information

  • Title: MegaLite: A New Spanish Literature Corpus for NLP Tasks
  • Authors: Luis-Gil Moreno-Jiménez ; Juan-Manuel Torres-Moreno
  • Journal: Computer Science & Information Technology
  • Electronic ISSN: 2231-5403
  • Publication year: 2021
  • Volume: 11
  • Issue: 1
  • Language: English
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: In this work we introduce MegaLite, a Spanish literary corpus well adapted to Natural Language Processing (NLP), Computational Creativity (CC), text generation, and other studies. We address the creation of this corpus of literary documents to evaluate or design algorithms for automatic text generation, classification, stylometry and rhetorical analysis, and sentiment detection, among other tasks. We built the corpus manually in order to avoid genre classification errors. It comprises nearly 5,200 works in the genres of narrative, poetry, and plays. Some statistics and applications of the MegaLite corpus are presented and discussed. The MegaLite corpus will be available to the community as a free resource in several suitable formats.
  • Keywords: Emotion Corpus; Spanish Literary Corpus; Learning algorithms; Linguistic resources