Article information

Title: Automatic Detection of Language and Annotation Model Information in CoNLL Corpora
Authors: Frank Abromeit; Christian Chiarcos
Journal: OASIcs: OpenAccess Series in Informatics
Electronic ISSN: 2190-6807
Publication year: 2019
Volume: 70
Pages: 23:1-23:9
DOI: 10.4230/OASIcs.LDK.2019.23
Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
Abstract: We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) they apply, to provide a web interface for their curation and evaluation by domain experts, and to publish them as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de facto standard for annotated corpora popularized by the CoNLL Shared Tasks. By extension, other formats for which a converter to CoNLL and/or TSV exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from Universal Dependencies v2.3.
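The abstract does not spell out how a tagset is recognized in a TSV column. As a minimal sketch, assuming a simple coverage heuristic chosen here for illustration (not necessarily AnnoHub's actual method), one can read one column of a CoNLL-style file and pick the candidate tag inventory that covers the largest share of its tokens. The names `column_values`, `best_tagset`, and `PTB_SAMPLE` are hypothetical; `UPOS` is the 17-tag Universal Dependencies part-of-speech inventory, while `PTB_SAMPLE` is only a small illustrative subset of Penn Treebank tags.

```python
from collections import Counter

# The 17 Universal Dependencies UPOS tags.
UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
        "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}

# Hypothetical candidate: a small subset of Penn Treebank tags, for illustration only.
PTB_SAMPLE = {"NN", "NNS", "NNP", "VB", "VBD", "VBZ", "JJ", "RB", "IN",
              "DT", "CC", "PRP", ".", ","}


def column_values(conll_text, col):
    """Count the values of a 0-based column over the token lines of a CoNLL/TSV text."""
    values = Counter()
    for line in conll_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        fields = line.split("\t")
        if len(fields) > col:
            values[fields[col]] += 1
    return values


def best_tagset(values, tagsets):
    """Return (name, coverage) for the tagset covering the largest share of tokens."""
    total = sum(values.values())
    scores = {}
    for name, tags in tagsets.items():
        hits = sum(n for tag, n in values.items() if tag in tags)
        scores[name] = hits / total if total else 0.0
    name = max(scores, key=scores.get)
    return name, scores[name]


# A tiny CoNLL-U fragment: column 3 holds UPOS tags, column 4 holds XPOS tags.
SAMPLE = (
    "# text = The dog barks.\n"
    "1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_\n"
    "2\tdog\tdog\tNOUN\tNN\t_\t3\tnsubj\t_\t_\n"
    "3\tbarks\tbark\tVERB\tVBZ\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
)

CANDIDATES = {"UPOS": UPOS, "PTB_SAMPLE": PTB_SAMPLE}

if __name__ == "__main__":
    print(best_tagset(column_values(SAMPLE, 3), CANDIDATES))  # ('UPOS', 1.0)
    print(best_tagset(column_values(SAMPLE, 4), CANDIDATES))  # ('PTB_SAMPLE', 1.0)
```

Coverage alone cannot separate tagsets that share many tags, so a real system would likely need larger reference inventories (for example, ones drawn from OLiA annotation models) and a tie-breaking strategy.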
Keywords: LLOD; CoNLL; OLiA