Abstract: This paper explains the rationale for a new corpus being assembled at Lancaster
University to complement the existing Brown ‘family’ of corpora: that is,
English language corpora modelled on the original Brown University corpus,
such as LOB, Frown, FLOB and Wellington. The purpose of the new corpus,
called Lancaster1931, is to extend the chronological span of these corpora into
the first half of the twentieth century, and so to afford researchers a stronger
empirical basis for examining recent changes in grammatical usage in English.
We discuss some methodological issues encountered in extending the Brown
model to earlier historical periods. We also outline some developments under
way to permit more rigorous computer-assisted analyses within and across these
corpora, namely (i) encoding of all the corpora with XML, (ii) adoption of a
common grammatical tagset, known as ‘C8’, and (iii) implementation of a
semantic annotation scheme.