Article Information

  • Title: Quality of Word Vectors and its Impact on Named Entity Recognition in Czech
  • Authors: František Dařena; Martin Süss
  • Journal: European Journal of Business Science and Technology
  • Print ISSN: 2336-6494
  • Publication year: 2020
  • Volume: 6
  • Issue: 2
  • Pages: 154-169
  • DOI: 10.11118/ejobsat.2020.010
  • Language: English
  • Publisher: Mendel University, Brno
  • Abstract: Named Entity Recognition (NER) focuses on finding named entities in text and classifying them into one of the entity types. Modern state-of-the-art NER approaches avoid using handcrafted features and rely on feature-inferring neural network systems based on word embeddings. The paper analyzes the impact of different aspects related to word embeddings on the process and results of the named entity recognition task in Czech, which has not been investigated so far. Various aspects of word vectors preparation were experimentally examined to draw useful conclusions. The suitable settings in different steps were determined, including the used corpus, number of word vectors dimensions, used text preprocessing techniques, context window size, number of training epochs, and word vectors inferring algorithms and their specific parameters. The paper demonstrates that focusing on the process of word vectors preparation can bring a significant improvement for NER in Czech even without using additional language independent and dependent resources.
  • Keywords: Named Entity Recognition; word embeddings; word vectors training; natural language processing; Czech language