Abstract: This paper presents the design, development, and contents of the Lithuanian continuous speech corpus LRN 0.1 (Lithuanian Radio News, prototype version 0.1). The corpus contains 17 hours 23 minutes of recordings from radio broadcast news read by 31 speakers. The recorded material is segmented into sentence-length recordings, which are divided into training, development, and evaluation sets. The speech recordings are accompanied by word-level transcriptions and an automatically generated word-to-phone lexicon. The corpus is designed for constructing and evaluating speaker-independent continuous speech recognition systems, and may also be used for linguistic research.