
Article Information

  • Title: Similar Text Fragments Extraction for Identifying Common Wikipedia Communities
  • Authors: Svitlana Petrasova; Nina Khairova; Włodzimierz Lewoniewski; Orken Mamyrbayev
  • Journal: Data
  • Print ISSN: 2306-5729
  • Year: 2018
  • Volume: 3
  • Issue: 4
  • Pages: 66-74
  • DOI: 10.3390/data3040066
  • Publisher: MDPI Publishing
  • Abstract: Extracting similar text fragments from weakly formalized data is a task in natural language processing and intelligent data analysis, and is used to automatically identify connected knowledge fields. To search for such common communities in Wikipedia, we propose using, as an additional stage, a logical-algebraic model for extracting similar collocations. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of high-frequency synonymous collocations can serve as an indication of key common up-to-date Wikipedia communities.
  • Keywords: information extraction; short text fragment similarity; Wikipedia communities; NLP