Article Information

  • Title: Neural Language Models for Nineteenth-Century English
  • Authors: Kasra Hosseini; Kaspar Beelen; Giovanni Colavizza
  • Journal: Journal of Open Humanities Data
  • Electronic ISSN: 2059-481X
  • Publication year: 2021
  • Volume: 7
  • DOI: 10.5334/johd.48
  • Language: English
  • Publisher: Ubiquity Press
  • Abstract: We present four types of neural language models trained on a large historical dataset of books in English, published between 1760 and 1900, and comprised of ≈5.1 billion tokens. The language model architectures include word type embeddings (word2vec and fastText) and contextualized models (BERT and Flair). For each architecture, we trained a model instance using the whole dataset. Additionally, we trained separate instances on text published before 1850 for the type embeddings, and four instances considering different time slices for BERT. Our models have already been used in various downstream tasks where they consistently improved performance. In this paper, we describe how the models have been created and outline their reuse potential.
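
The abstract describes training one model instance on the whole dataset and separate instances on time-restricted subsets. The sketch below illustrates that kind of split by publication year, and it is not the authors' pipeline. The `Book` record, the `SLICES` table, and both function names are hypothetical; only the 1760 to 1900 range and the pre-1850 cut come from the abstract, and treating the range ends as inclusive is an assumption. The four BERT time slices are not specified in this record, so they are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """Hypothetical minimal record for one digitized book."""
    title: str
    year: int
    text: str


# Assumed slice boundaries (inclusive). Only the overall 1760-1900 range and
# the "before 1850" cut are stated in the abstract; the names are made up.
SLICES = {
    "whole_dataset": (1760, 1900),
    "before_1850": (1760, 1849),
}


def partition_by_period(books, slices=SLICES):
    """Group books into every slice whose inclusive year range contains them."""
    result = {name: [] for name in slices}
    for book in books:
        for name, (start, end) in slices.items():
            if start <= book.year <= end:
                result[name].append(book)
    return result


def count_tokens(books):
    """Whitespace token count, a stand-in for whatever tokenizer was used."""
    return sum(len(book.text.split()) for book in books)


if __name__ == "__main__":
    # Placeholder books, not real corpus data.
    sample = [
        Book("Book A", 1790, "a short placeholder text"),
        Book("Book B", 1875, "another placeholder"),
    ]
    for name, subset in partition_by_period(sample).items():
        print(name, len(subset), count_tokens(subset))
```

Each resulting subset would then be passed to a separate training run, so that a model trained on the pre-1850 slice never sees later text.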