Abstract: This paper profiles significant differences in syntactic distribution and in
word class frequencies between two treebanks of spoken and written German: the TüBa-D/S,
a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z, a treebank of
articles published in the German daily newspaper ‘die tageszeitung’ (taz).
The same profiling approach can be applied more generally to distinguish and classify
language corpora of different genres.