
Article information

  • Title: Automatic Detection of Language and Annotation Model Information in CoNLL Corpora
  • Authors: Frank Abromeit; Christian Chiarcos
  • Journal title: OASIcs: OpenAccess Series in Informatics
  • Electronic ISSN: 2190-6807
  • Year: 2019
  • Volume: 70
  • Pages: 23:1-23:9
  • DOI: 10.4230/OASIcs.LDK.2019.23
  • Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • Abstract: We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) they apply, to provide a web interface for the curation and evaluation of this metadata by domain experts, and to publish it as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de facto standard for annotated corpora popularized by the CoNLL Shared Tasks. By extension, any other format for which a converter to CoNLL and/or TSV exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from Universal Dependencies v2.3.
  • Keywords: LLOD; CoNLL; OLiA
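The abstract describes detecting which annotation scheme (tagset) a column of a CoNLL/TSV corpus uses. One simple way to approach this, sketched here under stated assumptions, is to count the values found in a column and score each candidate tagset by the share of tokens whose tag it contains. The tag inventories below are abbreviated, illustrative stand-ins, and the function names `column_values` and `best_tagset` are hypothetical. The record's keywords mention OLiA, but this sketch does not use OLiA, does not cover language detection, and is not AnnoHub's actual implementation.

```python
from collections import Counter

# Illustrative tag inventories (assumption: abbreviated stand-ins, not the
# annotation models AnnoHub actually uses). UD UPOS is listed in full;
# the Penn Treebank entry is only a small subset.
TAGSETS = {
    "UD UPOS": {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN",
                "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM",
                "VERB", "X"},
    "Penn Treebank (subset)": {"CC", "CD", "DT", "IN", "JJ", "NN", "NNS",
                               "NNP", "PRP", "RB", "VB", "VBD", "VBZ", "."},
}


def column_values(lines, col):
    """Count the values of one TSV column (0-based index).

    Comment lines (starting with '#') and blank sentence separators are
    skipped. Multiword-token and empty-node lines are NOT filtered here.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if col < len(fields):
            counts[fields[col]] += 1
    return counts


def best_tagset(counts, tagsets=TAGSETS):
    """Return (name, coverage) for the tagset covering the most tokens.

    Coverage is the fraction of tokens whose tag is in the tagset.
    Ties are broken by tagset name; (None, 0.0) is returned for no data.
    """
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    scored = []
    for name, tags in tagsets.items():
        covered = sum(n for tag, n in counts.items() if tag in tags)
        scored.append((covered / total, name))
    coverage, name = max(scored)
    return name, coverage


# Usage: a one-sentence CoNLL-U sample (columns ID FORM LEMMA UPOS XPOS ...).
sample = (
    "# text = The cat sleeps .\n"
    "1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tNOUN\tNN\t_\t3\tnsubj\t_\t_\n"
    "3\tsleeps\tsleep\tVERB\tVBZ\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
)
lines = sample.splitlines()
print(best_tagset(column_values(lines, 3)))  # ('UD UPOS', 1.0)
print(best_tagset(column_values(lines, 4)))  # ('Penn Treebank (subset)', 1.0)
```

Scoring by token coverage rather than by the number of distinct tags keeps frequent tags from being outweighed by rare ones; a real system would also need to handle tags shared between tagsets and columns that hold features or dependency labels rather than part-of-speech tags.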