Article Information

  • Title: Semi-Semantic Annotation: A guideline for the URDU.KON-TB treebank POS annotation
  • Author: Qaiser ABBAS
  • Journal: Acta Linguistica Asiatica
  • Print ISSN: 2232-3317
  • Year of publication: 2016
  • Volume: 6
  • Issue: 2
  • Pages: 97-134
  • Language: English
  • Publisher: Znanstvena založba Filozofske fakulte / Ljubljana University Press, Faculty of Arts
  • Abstract: This work elaborates the semi-semantic part-of-speech annotation guidelines for the URDU.KON-TB treebank, an annotated corpus. A hierarchical annotation scheme was designed to label parts of speech and was then applied to the corpus. The raw corpus was collected from the Urdu Wikipedia and the Jang newspaper and then annotated with the proposed semi-semantic part-of-speech labels. The corpus contains text on local and international news, social stories, sports, culture, finance, religion, traveling, etc. This exercise contributed a part-of-speech annotation to the URDU.KON-TB treebank. Twenty-two main part-of-speech categories are divided into subcategories, which capture the morphological and semantic information encoded in them. This article mainly reports the annotation guidelines; however, it also briefly describes the development of the URDU.KON-TB treebank, which includes the raw corpus collection, the design and application of the annotation scheme, and finally its statistical evaluation and results. The guidelines presented will be useful to the linguistic community for annotating sentences not only in the national language, Urdu, but also in other indigenous languages such as Punjabi, Sindhi, Pashto, etc.
  • Keywords: semi-semantic part of speech; rich information; deep learning; parsing aid; linguistically motivated annotation; humanistic annotation; polsemantična besedna vrsta; številne informacije; globoko učenje; pomoč pri razvrščanju; jezikoslovno utemeljeno označevanje; humanistično označevanje