Article Information

Title: Morpho-Syntactic Descriptions in MULTEXT-East — the Case of Serbian
Authors: Cvetana Krstev; Duško Vitas; Tomaž Erjavec et al.
Journal: Informatica
Print ISSN: 1514-8327
Electronic ISSN: 1854-3871
Publication year: 2004
Volume: 28
Issue: 4
Pages: 431-436
Publisher: The Slovene Society Informatika, Ljubljana
Abstract: MULTEXT-East is a multilingual dataset for language engineering research and development. This standardised and linked set of resources covers a large number of mainly Central and Eastern European languages and includes the EAGLES-based morphosyntactic specifications, defining the features that describe word-level syntactic annotations; medium-scale morphosyntactic lexica; and annotated parallel, comparable, and speech corpora. The most important component is the linguistically annotated corpus consisting of Orwell's novel "1984" in the English original and translations. MULTEXT-East has already seen several editions, with the latest one being Version 3, where the most important additions are the Serbian language resources, including the structurally annotated "1984", the morphosyntactic specifications, the morphosyntactic lexicon and the linguistically annotated "1984". The complete dataset, unique in terms of languages and the wealth of encoding, is extensively documented and freely available for research purposes.
Keywords: natural language processing; language resources; Serbian language; multilinguality