Article Information

Title: A Keyphrase Graph-Based Method for Document Similarity Measurement
Authors: ThanhThuong T. Huynh; TruongAn PhamNguyen; Nhon V. Do; et al.
Journal: Engineering Letters
Print ISSN: 1816-093X
Online ISSN: 1816-0948
Year: 2022
Volume: 30
Issue: 2
Pages: 692-710
Language: English
Publisher: Newswood Ltd
Abstract: Measuring similarity between texts is an essential task in a wide variety of applications. Contemporary approaches to this task rely heavily on statistical and lexical information to represent text. They therefore produce opaque, hard-to-interpret models that can be difficult to adapt to some applications and that hamper the user experience. To make the document representation more interpretable, we propose a graph-based semantic model that integrates richer semantic information about keyphrases and their relations, as well as the structural information of the text. The use of large knowledge bases (e.g., DBpedia, Wikipedia) provides fine-grained information about concepts, entities, and their semantic relations, resulting in a knowledge-rich interpretation. The relevance between two documents can then be evaluated by calculating the semantic similarity between the two keyphrase graphs that represent them. On a traditional dataset, the final results come close to the performance of specialized black-box methods tuned specifically for this task.
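The abstract does not spell out how two keyphrase graphs are compared, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each document is represented as a set of keyphrase nodes plus labeled relation edges (such as relations one might look up in a knowledge base like DBpedia). As a simple stand-in for the paper's semantic graph matching, it scores similarity as a weighted mix of node-set and edge-set Jaccard overlap. The names `KeyphraseGraph`, `graph_similarity`, and `alpha`, the `relatedTo` relation, and the sample keyphrases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyphraseGraph:
    """Hypothetical keyphrase graph: nodes are keyphrases, edges are
    (source, relation, target) triples between those keyphrases."""
    nodes: frozenset
    edges: frozenset = frozenset()

    def __post_init__(self) -> None:
        # Reject edges whose endpoints are not keyphrases of this graph.
        for src, _relation, dst in self.edges:
            if src not in self.nodes or dst not in self.nodes:
                raise ValueError(f"edge endpoint missing from nodes: {src!r} -> {dst!r}")


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard index |A & B| / |A | B|; taken as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def graph_similarity(g1: KeyphraseGraph, g2: KeyphraseGraph, alpha: float = 0.5) -> float:
    """Stand-in similarity: alpha weights node overlap, (1 - alpha) weights edge overlap."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * jaccard(g1.nodes, g2.nodes) + (1 - alpha) * jaccard(g1.edges, g2.edges)


# Illustrative usage with made-up keyphrases and a hypothetical "relatedTo" relation.
doc_a = KeyphraseGraph(
    nodes=frozenset({"graph matching", "keyphrase extraction", "DBpedia"}),
    edges=frozenset({
        ("graph matching", "relatedTo", "DBpedia"),
        ("keyphrase extraction", "relatedTo", "graph matching"),
    }),
)
doc_b = KeyphraseGraph(
    nodes=frozenset({"graph matching", "document similarity", "DBpedia"}),
    edges=frozenset({
        ("graph matching", "relatedTo", "DBpedia"),
        ("document similarity", "relatedTo", "graph matching"),
    }),
)

if __name__ == "__main__":
    # Node overlap 2/4 = 0.5, edge overlap 1/3, so 0.5 * 0.5 + 0.5 * (1/3) = 5/12.
    print(round(graph_similarity(doc_a, doc_b), 3))
```

Exact set overlap treats "graph matching" and a near-synonym as unrelated. A semantic measure such as the one the abstract describes would instead need some notion of relatedness between distinct keyphrases and relations, which this sketch deliberately leaves out.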
Keywords: Document representation; Graph-based document model; Keyphrase extraction; Document similarity; Graph matching