Article Information

  • Title: Named Entity Recognition for Kannada using Gazetteers list with Conditional Random Fields
  • Authors: Pallavi, K. P. ; Sobha, L. ; Ramya, M. M.
  • Journal: Journal of Computer Science
  • Print ISSN: 1549-3636
  • Year of publication: 2018
  • Volume: 14
  • Issue: 5
  • Pages: 645-653
  • DOI: 10.3844/jcssp.2018.645.653
  • Publisher: Science Publications
  • Abstract: Named Entities (NEs) that occur in sentences are essential for building Natural Language Processing (NLP) applications for Information Extraction (IE) from large corpora. However, generating a large corpus is challenging for resource-poor languages such as Kannada, and no annotated corpus is available online. Two challenges arise in annotating NEs with pre-defined classes: NEs are morphologically joined with other words, and spelling variations are frequent in Kannada words. Sentence structure also varies with the morphology, parts of speech (POS) and chunking of a language, and these parameters differ from one language to another. To address these challenges, a novel application system is proposed to identify NEs in Kannada using a large corpus of 73,676 tokens. The Named Entity Recognition (NER) system consists of a robust POS tagger and a Noun Phrase (NP) chunker developed for generic data. Five gazetteer lists were created from many orthographic patterns for each word. Context information, such as the previous two words, the next two words, word morphology and gazetteer lists, was added to the feature lists. A unigram-bigram template was designed and incorporated into Conditional Random Fields (CRFs) to generate conditional feature functions. The proposed system achieved F-measures of 86.85% and 71.01% on gold test data and newspaper data, respectively.
  • Keywords: Named Entities; Natural Language Processing; Noun Phrase Chunker; Conditional Random Fields
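
The context features named in the abstract (previous two words, next two words, word morphology and gazetteer membership) can be illustrated with a small sketch. This is not the authors' implementation: the gazetteer contents, the feature names, the `<PAD>` marker and the three-character suffix used as a crude stand-in for word morphology are all assumptions made for illustration, and the POS and chunk features are left out. The sample tokens are in English only to keep the example readable.

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteers; the five lists built in the paper are not reproduced here.
GAZETTEERS: Dict[str, Set[str]] = {
    "person": {"pallavi"},
    "location": {"bengaluru", "mysuru"},
}


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build features for tokens[i] from a +/-2 word window and gazetteer lookups."""
    feats = {"w0": tokens[i]}
    # Previous two and next two words; positions outside the sentence are padded.
    for offset in (-2, -1, 1, 2):
        j = i + offset
        feats[f"w{offset:+d}"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    # Assumption: the last three characters stand in for real morphological analysis.
    feats["suffix3"] = tokens[i][-3:]
    # One membership flag per gazetteer list (case-insensitive lookup).
    for name, entries in GAZETTEERS.items():
        feats[f"gaz_{name}"] = str(tokens[i].lower() in entries)
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["Pallavi", "lives", "in", "Bengaluru"]):
        print(feats)
```

Per-token dictionaries of this kind are the sort of input a CRF toolkit expands into unigram feature functions (one feature paired with the current label) and bigram feature functions (paired with the previous and current labels), which is the role the unigram-bigram template plays in the abstract.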