
Article Information

  • Title: A Keyphrase Graph-Based Method for Document Similarity Measurement
  • Authors: ThanhThuong T. Huynh; TruongAn PhamNguyen; Nhon V. Do
  • Journal: Engineering Letters
  • Print ISSN: 1816-093X
  • Online ISSN: 1816-0948
  • Year: 2022
  • Volume: 30
  • Issue: 2
  • Pages: 692-710
  • Language: English
  • Publisher: Newswood Ltd
  • Abstract: Measuring the similarity between texts is an essential task in a wide variety of applications. Contemporary approaches to this task rely heavily on statistical and lexical information to represent text. They thus produce opaque, hard-to-interpret models that can be difficult to adapt to some applications and can hamper the user experience. To make the document representation more interpretable, we propose a graph-based semantic model that integrates semantic information among keyphrases as well as the structural information of the text. Large knowledge bases (e.g., DBpedia, Wikipedia) make available fine-grained information about concepts, entities, and their semantic relations, resulting in a knowledge-rich interpretation. The relevance between two documents can then be evaluated by calculating the semantic similarity between the two keyphrase graphs that represent them. On a traditional dataset, the final result comes close in performance to specialized black-box methods particularly tuned to this task.
  • Keywords: Document representation; Graph-based document model; Keyphrase extraction; Document similarity; Graph matching
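
The abstract describes comparing two documents by scoring the similarity of the keyphrase graphs that represent them. The record does not give the authors' actual formula, so the following is only a minimal sketch of the general idea under stated assumptions: each graph is a set of weighted keyphrase nodes plus weighted labelled relations, and similarity is a blend of weighted Jaccard overlap on nodes and on edges. The names `KeyphraseGraph`, `weighted_overlap`, `graph_similarity`, and the mixing parameter `alpha` are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeyphraseGraph:
    """Hypothetical document model: weighted keyphrases and labelled relations."""
    # keyphrase -> importance weight (assumed to be non-negative)
    nodes: dict[str, float] = field(default_factory=dict)
    # (source keyphrase, relation label, target keyphrase) -> weight
    edges: dict[tuple[str, str, str], float] = field(default_factory=dict)


def weighted_overlap(a: dict, b: dict) -> float:
    """Weighted Jaccard: sum of per-key minima divided by sum of per-key maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def graph_similarity(g1: KeyphraseGraph, g2: KeyphraseGraph, alpha: float = 0.5) -> float:
    """Blend node overlap and edge overlap; alpha is an assumed mixing weight."""
    node_score = weighted_overlap(g1.nodes, g2.nodes)
    edge_score = weighted_overlap(g1.edges, g2.edges)
    return alpha * node_score + (1 - alpha) * edge_score


doc_a = KeyphraseGraph(
    nodes={"graph matching": 1.0, "document similarity": 1.0},
    edges={("graph matching", "used_for", "document similarity"): 1.0},
)
doc_b = KeyphraseGraph(
    nodes={"document similarity": 1.0, "keyphrase extraction": 1.0},
)

score = graph_similarity(doc_a, doc_b)
print(round(score, 4))
```

This sketch matches keyphrases by exact string equality only. A knowledge-base-backed method such as the one the abstract outlines would presumably also credit semantically related but non-identical keyphrases, which this example does not attempt.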