首页    期刊浏览 2024年11月30日 星期六
登录注册

文章基本信息

  • 标题:Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar
  • 本地全文:下载
  • 作者:Christian Chiarcos ; Christian F{"a}th
  • 期刊名称:OASIcs : OpenAccess Series in Informatics
  • 电子版ISSN:2190-6807
  • 出版年度:2019
  • 卷号:70
  • 页码:9:1-9:11
  • DOI:10.4230/OASIcs.LDK.2019.9
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:This paper describes the application of annotation engineering techniques for the construction of a corpus for Role and Reference Grammar (RRG). RRG is a semantics-oriented formalism for natural language syntax popular in comparative linguistics and linguistic typology, and predominantly applied for the description of non-European languages which are less-resourced in terms of natural language processing. Because of its cross-linguistic applicability and its conjoint treatment of syntax and semantics, RRG also represents a promising framework for research challenges within natural language processing. At the moment, however, these have not been explored as no RRG corpus data is publicly available. While RRG annotations cannot be easily derived from any single treebank in existence, we suggest that they can be reliably inferred from the intersection of syntactic and semantic annotations as represented by, for example, the Universal Dependencies (UD) and PropBank (PB), and we demonstrate this for the English Web Treebank, a 250,000 token corpus of various genres of English internet text. The resulting corpus is a gold corpus for future experiments in natural language processing in the sense that it is built on existing annotations which have been created manually. A technical challenge in this context is to align UD and PB annotations, to integrate them in a coherent manner, and to distribute and to combine their information on RRG constituent and operator projections. For this purpose, we describe a framework for flexible and scalable annotation engineering based on flexible, unconstrained graph transformations of sentence graphs by means of SPARQL Update.
  • 关键词:Role and Reference Grammar; NLP; Corpus; Semantic Web; LLOD; Syntax; Semantics
国家哲学社会科学文献中心版权所有