Abstract: The paper introduces the ORTOFON corpus of spontaneous spoken Czech and the DIALEKT corpus of Czech dialects, describing their design principles and the practical solutions adopted during data collection.
Keywords: dialectology; lemmatization; spoken corpus; tagging; transcription