
Basic article information

  • Title: Morpho-Syntactic Descriptions in MULTEXT-East — the Case of Serbian
  • Authors: Cvetana Krstev; Duško Vitas; Tomaž Erjavec
  • Journal: Informatica
  • Print ISSN: 1514-8327
  • Electronic ISSN: 1854-3871
  • Publication year: 2004
  • Volume: 28
  • Issue: 4
  • Pages: 431-436
  • Publisher: The Slovene Society Informatika, Ljubljana
  • Abstract: MULTEXT-East is a multilingual dataset for language engineering research and development. This standardised and linked set of resources covers a large number of mainly Central and Eastern European languages. It includes the EAGLES-based morphosyntactic specifications, which define the features that describe word-level syntactic annotations; medium-scale morphosyntactic lexica; and annotated parallel, comparable, and speech corpora. The most important component is the linguistically annotated corpus consisting of Orwell's novel "1984" in the English original and in translations. MULTEXT-East has already seen several editions, the latest being Version 3, whose most important addition is the set of Serbian language resources: the structurally annotated "1984", the morphosyntactic specifications, the morphosyntactic lexicon, and the linguistically annotated "1984". The complete dataset, unique in terms of its languages and the wealth of its encoding, is extensively documented and freely available for research purposes.
  • Keywords: natural language processing; language resources; Serbian language; multilinguality