Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2016
Volume: 84
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: A heterogeneous information network contains multiple types of objects and multiple types of links. Compared with a homogeneous information network, which contains objects of only one type, a heterogeneous information network carries richer semantic information. Heterogeneous information networks are common in daily life; social networks are one example. Similarity search in heterogeneous information networks can mine more precise knowledge. However, real social networks such as Sina Microblog and Facebook hold huge amounts of data, which significantly increases the difficulty of similarity search. Many existing methods can only measure similarity between objects of the same type; moreover, limited memory restricts the amount of data they can measure, so they cannot be applied in practice to real relation networks. In this paper, we propose a novel measure, called AvgSim, which measures the similarity between the objects at the two ends of any search path in a heterogeneous information network. In addition, we implement AvgSim with parallel computing so that it can handle massive data and be applied to real networks. Experiments on real datasets verify the effectiveness and efficiency of the algorithm.
Keywords: Heterogeneous Information Network; Similarity Search; Random Walk; MapReduce