Article Information

  • Title: Some Challenges of Automated Annotation in A Multilingual Scenario
  • Authors: Arindam Roy; Sunita Sarkar; B. S. Purkayastha
  • Journal: International Journal of Innovative Research in Science, Engineering and Technology
  • Print ISSN: 2347-6710
  • Electronic ISSN: 2319-8753
  • Year: 2014
  • Volume: 3
  • Issue: 12
  • Page: 18230
  • DOI: 10.15680/IJIRSET.2014.0312066
  • Publisher: S&S Publications
  • Abstract: A key ingredient of today’s NLP scenario is annotation and this paper discusses challenges involved in one of the toughest annotation tasks which is sense marking. A large amount of data needs to be sense marked accurately by human annotators in order to train the machine to understand the spoken languages. The sense marked corpus for various languages facilitate the task of Word Sense Disambiguation (WSD) which is required for translation. For accurately sense marking voluminous data, a standard and definitive lexicon is required. In the work reported here, the corpus is taken from the newspaper domain and tourism domain. The Princeton WordNet (Version 2.1) is used as the sense repertoire for English text while the Hindi and Nepali WordNets have been used for Hindi and Nepali texts respectively. The corpus was independently tagged by different annotators and it was found that the agreement level on word sense disambiguation was about 85% across the three languages, i.e., English, Hindi and Nepali. Different senses of a particular word in WordNet are quite specific, yet there have been cases when the senses provided had limitations and posed challenges to the human sense markers.
  • Keywords: Sense-marking; Synset; WordNet; Word sense disambiguation; Expansion approach