Article information

  • Title: Building a Corpus-based Historical Portuguese Dictionary: Challenges and Opportunities
  • Authors: Arnaldo Candido Junior; Sandra Maria Aluísio
  • Journal: Traitement Automatique des Langues
  • Print ISSN: 1248-9433
  • Electronic ISSN: 1965-0906
  • Year of publication: 2009
  • Volume: 50
  • Issue: 2
  • Publisher: ATALA - Assoc Traitement Automatique Langues
  • Abstract: Historical corpora are important resources for different areas. Philology, Human Language Technology, Literary Studies, History, and Lexicography are some that benefit from them. However, compiling historical corpora is different from compiling contemporary corpora. Corpus designers have to deal with several characteristics inherent in historical texts, such as: absence of a spelling standard, pervasive use of abbreviations plus their spelling variations, lack of space between words, irregular use of hyphenation, and non-standard typographical symbols. This paper addresses the challenges posed in processing the corpus designed for the Historical Dictionary of Brazilian Portuguese (HDBP) project, which is composed of texts from the sixteenth through the beginning of the nineteenth century, and the solutions found to support the compilation of a Historical Portuguese dictionary based on this corpus.