
Article information

  • Title: Semi-automating the reading programme for a historical dictionary project
  • Authors: Tim van Niekerk; Johannes Schäfer; Ulrich Heid
  • Journal: Lexikos
  • Print ISSN: 1684-4904
  • Electronic ISSN: 2224-0039
  • Year of publication: 2018
  • Volume: 28
  • Pages: 343-360
  • Publisher: Bureau of the WAT
  • Abstract:

    This paper describes the resources and software procedures used or developed in a major enabling step towards the revision of the scholarly reference work A Dictionary of South African English on Historical Principles (DSAE, Silva et al. 1996), namely the semi-automatic generation of a digitally-sourced lexical database on which new and updated dictionary entries will be based; as well as the addition, in parallel, of a new corpus of South African English (SAE) to the project. Drawing on online data sources and an extensive list of known SAE word forms, we have developed a software toolchain to gather, encode, annotate and collate textual sources, producing: (i) a 3.1-billion part-of-speech-annotated corpus of South African English; (ii) a lexical database of illustrative quotations for over 20,000 known SAE word forms, available for selection at the entry-revision stage; and (iii) a list of potential new variant spellings and headword inclusion candidates. These steps replace, where recent electronic sources are concerned, the mechanical aspects of quotation gathering, normally undertaken manually through a reading programme requiring years of teamwork to acquire sufficient coverage (cf. Hicks 2010).

  • Keywords: corpora; dictionary workflows; historical lexicography; language varieties; lexical databases; reading programmes; South African English