
Article information

  • Title: RULE-BASED ANNOTATION OF LITHUANIAN TEXT CORPORA
  • Authors: Jurgita Kapočiūtė; Gailius Raškinis
  • Journal: European Integration Studies
  • Print ISSN: 2335-8831
  • Year: 2015
  • Volume: 34
  • Issue: 3
  • DOI: 10.5755/j01.itc.34.3.12012
  • Language: English
  • Publisher: Kaunas University of Technology
  • Abstract: In this paper we present an algorithm that automatically recognizes and annotates person and place names, contractions, acronyms, foreign-language phrases, dates, and sentence boundaries in Lithuanian texts. The algorithm is based on a set of manually developed template-matching rules and a few specialized lexicons. It performs annotation by making several passes over the text and can operate in automatic or semi-automatic mode. In semi-automatic mode, the user may intervene in cases where the automatic decision is uncertain; the user's feedback is stored in the lexicons. The rules and lexicons were developed after careful examination of a text corpus of 600 thousand words. The algorithm was evaluated on a separate corpus of 400 thousand words and achieved ~93% annotation accuracy.
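
The abstract describes a multi-pass, rule-and-lexicon annotator with an optional semi-automatic mode. The following is a minimal Python sketch of that general idea only, not the authors' implementation: the two passes, the regular expressions, the `ask` callback, and all function names are assumptions made for illustration, and the lexicon here is a plain dictionary keyed by exact word form (no handling of Lithuanian inflection).

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int, str]               # (start, end, label)
Ask = Optional[Callable[[str], Optional[str]]]

# Illustrative templates (assumptions, not the paper's rules).
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
CAPITALIZED_RE = re.compile(r"\b[A-ZĄČĘĖĮŠŲŪŽ][a-ząčęėįšųūž]+\b")


def date_pass(text: str, lexicon: Dict[str, str], ask: Ask) -> List[Span]:
    """Pass 1: label ISO-style dates by template matching."""
    return [(m.start(), m.end(), "date") for m in DATE_RE.finditer(text)]


def name_pass(text: str, lexicon: Dict[str, str], ask: Ask) -> List[Span]:
    """Pass 2: label capitalized words found in the lexicon.

    In semi-automatic mode (ask is not None), unknown words are shown to
    the user and any answer is stored in the lexicon for later runs.
    """
    spans: List[Span] = []
    for m in CAPITALIZED_RE.finditer(text):
        word = m.group()
        label = lexicon.get(word)
        if label is None and ask is not None:
            label = ask(word)             # user may decline by returning None
            if label is not None:
                lexicon[word] = label     # remember the user's feedback
        if label is not None:
            spans.append((m.start(), m.end(), label))
    return spans


def annotate(text: str, lexicon: Dict[str, str], ask: Ask = None) -> List[Span]:
    """Run every pass over the text and return spans sorted by position."""
    spans: List[Span] = []
    for rule_pass in (date_pass, name_pass):
        spans.extend(rule_pass(text, lexicon, ask))
    return sorted(spans)
```

Calling `annotate(text, lexicon)` runs in automatic mode; passing a callback as `ask` switches to the semi-automatic mode, and any labels the callback returns persist in `lexicon`, so a later automatic run reuses them.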