
Article information

  • Title: Towards Facts Extraction from Texts in the Polish Language
  • Authors: Tomasz Boiński; Adam Brzeski
  • Journal: International Journal of Innovative Research in Computer and Communication Engineering
  • Print ISSN: 2320-9798
  • Online ISSN: 2320-9801
  • Year: 2014
  • Volume: 2
  • Issue: 8
  • Publisher: S&S Publications
  • Abstract: The Polish language differs from English in many ways. It has more complicated conjugation and declension, which makes automatic extraction of facts from texts difficult. In this paper we present the basic differences between the two languages. The paper presents an algorithm for extracting facts from articles in the Polish Wikipedia. The algorithm is based on 7 proposed fact schemes that are searched for in the analyzed text. The analysis includes morphosyntactic tagging, named entity extraction and relation identification. The results obtained for an exemplary Wikipedia text are presented. We identify the free word formation principle as the main difficulty in analyzing Polish texts. At the same time, the conducted experiment confirmed the satisfactory performance of the tagging and analysis tools for the Polish language.
  • Keywords: natural language processing; text analysis; knowledge extraction; unstructured information; tagging; named-entity recognition
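The abstract lists three analysis stages: morphosyntactic tagging, named entity extraction and relation identification. It also says that facts are found by matching proposed schemes. This record does not describe the seven schemes themselves, so the Python sketch below is only an illustration under stated assumptions.

Every name in it is hypothetical: the `Token` and `Fact` types, the `match_birthplace` scheme, its trigger lemma and the `PERSON`/`PLACE` labels. The tagged sentences are written by hand and do not come from a real tagger. The sketch shows why matching on lemmas rather than surface forms, and ignoring token order within a sentence, helps with Polish inflection and its relatively free word order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    orth: str                 # surface form as it appears in the text
    lemma: str                # base form, as a morphosyntactic tagger would assign
    ne: Optional[str] = None  # hypothetical named-entity label, e.g. "PERSON" or "PLACE"


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str


def entities(sentence: List[Token], label: str) -> List[str]:
    """Merge runs of consecutive tokens tagged `label` into names built from lemmas."""
    names: List[str] = []
    current: List[str] = []
    for tok in sentence:
        if tok.ne == label:
            current.append(tok.lemma)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


def match_birthplace(sentence: List[Token]) -> List[Fact]:
    """Hypothetical scheme: a PERSON, a PLACE and the trigger lemma 'urodzić'
    ('to be born') in the same sentence, regardless of word order."""
    if not any(tok.lemma == "urodzić" for tok in sentence):
        return []
    people = entities(sentence, "PERSON")
    places = entities(sentence, "PLACE")
    if people and places:
        # Simplification: pair the first person with the first place.
        return [Fact(people[0], "born_in", places[0])]
    return []


# Hand-written tagger output (not produced by any real tool):
# "Mikołaj Kopernik urodził się w Toruniu."
PERSON_FIRST = [
    Token("Mikołaj", "Mikołaj", "PERSON"), Token("Kopernik", "Kopernik", "PERSON"),
    Token("urodził", "urodzić"), Token("się", "się"), Token("w", "w"),
    Token("Toruniu", "Toruń", "PLACE"), Token(".", "."),
]
# "W Toruniu urodził się Mikołaj Kopernik." (same fact, different word order)
PLACE_FIRST = [
    Token("W", "w"), Token("Toruniu", "Toruń", "PLACE"), Token("urodził", "urodzić"),
    Token("się", "się"), Token("Mikołaj", "Mikołaj", "PERSON"),
    Token("Kopernik", "Kopernik", "PERSON"), Token(".", "."),
]

if __name__ == "__main__":
    print(match_birthplace(PERSON_FIRST))
    print(match_birthplace(PLACE_FIRST))
```

Both sentences yield the same single fact. This holds even though the place appears only in the inflected form "Toruniu" and the two sentences order their constituents differently.

A real extractor would need more than this sketch:
  • sentence segmentation,
  • correct lemmatization of multi-word names (joining single-token lemmas breaks for names with inflected adjectives),
  • a better rule than "first person, first place" when a sentence mentions several entities.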