Article information

Title: Using the OED quotations database as a corpus – a linguistic appraisal
Author: Sebastian Hoffmann
Journal: ICAME Journal
Print ISSN: 0801-5775
Online ISSN: 1502-5462
Year of publication: 2004
Volume: 2004
Issue: 28
Pages: 17–30
Publisher: School of Computing
Abstract: Over the past decades, the number of historical corpora available has steadily grown. Perhaps the best-known and most widely used is the Helsinki Corpus. (See Kytö 1996 [1991] for a description of the corpus and Rissanen et al. 1993 for a range of possible applications.) Other historical corpora include ARCHER (A Representative Corpus of Historical English Registers), the Corpus of Early English Correspondence (CEEC), the Innsbruck Computer Archive of Machine-Readable English Texts (ICAMET), the Lampeter Corpus of Early Modern English Tracts, and the Zurich English Newspaper Corpus (ZEN), to name just a few (cf. Biber et al. 1994; Fries 1994; Schmied 1994; Keränen 1998; Markus 1999a). However, given their relatively small size, these historical corpora are unfortunately only of limited value for the study of less frequent features of the English language. The Helsinki Corpus, for instance, spans almost a thousand years (ca. 750 to 1700) but contains only 1.57 million words. Even for the period of Late Modern English, suitable corpus data is not in great abundance. For example, although ARCHER covers a smaller time-span from 1650 to 1990 and offers detailed categorization by register, its overall size of less than two million words still results in many of the same limitations as the Helsinki Corpus.1