Abstract: Morphological annotation is essential, highly useful and very common linguistic information in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus was designed with several goals in mind: the tags are compact and easily human-readable, without sacrificing their informational content. The tags consist of ASCII letters, digits and a few other characters. In general, tags vary in the number of symbols they contain, but the order of those symbols is obligatory. Each category or specific feature is assigned a particular character, which can be shared among several parts of speech. The tagset is highly functional and pragmatic, although some allowances had to be made to accommodate the traditional analysis of Slovak morphology and part-of-speech categories.
Keywords: Slovak language; corpus; tagset; morphology; part of speech; grammatical categories