Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Year: 2019
Volume: 7
Issue: 2
Pages: 1054-1060
DOI: 10.15680/IJIRCCE.2019.0702092
Publisher: S&S Publications
Abstract: This paper presents a method for measuring the semantic similarity between concepts in Knowledge
Graphs (KGs) such as WordNet and DBpedia. Previous work on semantic similarity methods has focused either on
the structure of the semantic network between concepts (e.g., path length and depth) or only on the Information
Content (IC) of concepts. We propose a semantic similarity method, namely wpath, that combines these two approaches
by using IC to weight the shortest path length between concepts. Conventional corpus-based IC is computed from the
distribution of concepts over a textual corpus, which requires preparing a domain corpus containing annotated
concepts and has a high computational cost. Because instances have already been extracted from textual corpora and
annotated with concepts in KGs, graph-based IC is proposed to compute IC from the distribution of concepts over instances.
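
The idea of weighting path length by IC can be illustrated with a small sketch. The abstract does not state the formula, so the form used here, sim = 1 / (1 + length * k^IC(lcs)) with k in (0, 1], is an assumption, as is computing graph-based IC as the negative log of the share of instances subsumed by a concept. The toy taxonomy, the instance counts, and all function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy: child concept -> parent concept.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "bird": "animal",
    "animal": "entity",
    "car": "entity",
}

# Hypothetical counts of instances annotated directly with each concept.
INSTANCES = {"dog": 40, "cat": 30, "mammal": 5, "bird": 20,
             "animal": 5, "car": 100, "entity": 0}
TOTAL = sum(INSTANCES.values())


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumed_count(concept):
    """Instances of the concept plus instances of all its descendants."""
    return sum(n for c, n in INSTANCES.items() if concept in ancestors(c))


def graph_ic(concept):
    """Assumed graph-based IC: -log of the share of instances under the concept.

    Every concept in the toy data subsumes at least one instance,
    so the logarithm is always defined here.
    """
    return -math.log(subsumed_count(concept) / TOTAL)


def lcs_and_length(c1, c2):
    """Least common subsumer and shortest path length through it."""
    chain1, chain2 = ancestors(c1), ancestors(c2)
    for steps1, candidate in enumerate(chain1):
        if candidate in chain2:
            return candidate, steps1 + chain2.index(candidate)
    raise ValueError("concepts share no ancestor")


def wpath(c1, c2, k=0.8):
    """Assumed wpath form: path length weighted by k ** IC(lcs)."""
    lcs, length = lcs_and_length(c1, c2)
    return 1.0 / (1.0 + length * k ** graph_ic(lcs))
```

With this weighting, two pairs at the same path distance score differently when their common subsumer is more or less specific: a pair whose only shared ancestor is the root (IC of 0) falls back to plain path similarity, while a pair under a specific ancestor has its path length discounted.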
Measuring the similarity between documents is an important operation in the text processing field. This project
proposes a new similarity measure. Discovering hyponym relations among domain-specific terms is a fundamental task
in taxonomy learning and knowledge acquisition. However, the great diversity of domain corpora and the lack
of labeled training sets make this task very challenging for conventional methods that are based on text content. In
this study, the hyperlink structure of Wikipedia article pages was found to contain recurring network motifs that indicate
the probability of a hyperlink being a hyponym hyperlink. Hence, a novel hyponym relation extraction approach based
on the network motifs of Wikipedia hyperlinks is proposed. This approach automatically constructs motif-based
features from the hyperlink structure of a domain; every hyperlink is mapped to a 13-dimensional feature vector based
on the 13 types of three-node motifs.
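
A minimal sketch of how a hyperlink might be mapped to a 13-dimensional motif vector follows. The abstract does not describe the exact feature construction, so this version makes an assumption: for a hyperlink u -> v, it counts, for every third page w, which of the 13 connected three-node directed patterns the pages {u, v, w} form. The function names and the toy link graph are hypothetical.

```python
from itertools import permutations, product

# All possible directed edges among three nodes labeled 0, 1, 2.
PAIRS = [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]


def canonical(edges):
    """Canonical form of a 3-node digraph: smallest sorted edge tuple over all relabelings."""
    return min(
        tuple(sorted((p[a], p[b]) for a, b in edges))
        for p in permutations(range(3))
    )


def weakly_connected(edges):
    """Three nodes are weakly connected iff at least two distinct node pairs are linked."""
    return len({frozenset(e) for e in edges}) >= 2


def motif_classes():
    """Enumerate the isomorphism classes of connected three-node digraphs."""
    classes = set()
    for mask in product([0, 1], repeat=len(PAIRS)):
        edges = [e for e, bit in zip(PAIRS, mask) if bit]
        if weakly_connected(edges):
            classes.add(canonical(edges))
    return sorted(classes)


MOTIFS = motif_classes()
MOTIF_INDEX = {motif: i for i, motif in enumerate(MOTIFS)}


def edge_features(graph, u, v):
    """Count vector over motif classes for hyperlink u -> v.

    graph maps each page to the set of pages it links to.
    """
    vec = [0] * len(MOTIFS)
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    for w in nodes - {u, v}:
        trio = [u, v, w]
        edges = [(i, j) for i, j in PAIRS if trio[j] in graph.get(trio[i], ())]
        if weakly_connected(edges):  # skip pages unrelated to both u and v
            vec[MOTIF_INDEX[canonical(edges)]] += 1
    return vec


# Hypothetical toy link graph: A links to B and C, B links to C.
toy = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
features = edge_features(toy, "A", "B")
```

Enumerating the classes rather than hard-coding them keeps the sketch self-checking: the enumeration yields exactly 13 patterns. A vector like `features` could then be passed to any classifier; the choice of classifier is outside what the abstract specifies.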