Journal: Indian Journal of Innovations and Developments
Print ISSN: 2277-5382
Electronic ISSN: 2277-5390
Year: 2016
Volume: 5
Issue: 5
Pages: 1-6
Language: English
Publisher: Indian Society for Education and Environment
Abstract: Objectives: To analyze various similarity join techniques for improving the data mining process. Findings: A similarity join evaluates the similarity between pairs of objects. Many applications, such as data cleaning, data integration, near-duplicate detection, and data mining processes in general, can benefit extensively from the similarity join measure. A similarity join can be performed between objects, strings, nodes, and so on. It finds all pairs of objects whose similarity is not smaller than a given similarity threshold. Different techniques and approaches are used to compute the similarity join between objects in a homogeneous information network. This paper provides detailed information about the different similarity join techniques. Results: The paper compares various similarity join techniques across several parameters to show that the path-based similarity join is better than the other techniques. Application/Improvements: The findings of this work show that the path-based similarity join gives better results than the other approaches.
Keywords: Similarity Join; Data Cleaning; Data Integration; Near Duplicate Detection
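
The abstract defines a similarity join as finding all pairs of objects whose similarity is not smaller than a threshold. A minimal Python sketch of that definition follows. It assumes Jaccard similarity over whitespace-separated tokens and a brute-force comparison of every pair. The record does not specify either choice, and this is not the path-based method the paper favors. The names `jaccard`, `similarity_join`, and `records` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        # Two empty sets are treated as identical (an assumption).
        return 1.0
    return len(a & b) / len(union)


def similarity_join(records, threshold):
    """Return every pair of record keys whose similarity is >= threshold.

    `records` maps a key to a text string. Each text is lower-cased and
    split on whitespace into a token set (an illustrative choice only).
    """
    token_sets = {key: set(text.lower().split()) for key, text in records.items()}
    keys = sorted(token_sets)
    pairs = []
    # Brute force: compare each unordered pair of keys exactly once.
    for k1, k2 in combinations(keys, 2):
        sim = jaccard(token_sets[k1], token_sets[k2])
        if sim >= threshold:  # "not smaller than the threshold"
            pairs.append((k1, k2, sim))
    return pairs


records = {
    "a": "data cleaning and integration",
    "b": "data integration and cleaning",
    "c": "graph path similarity",
}
print(similarity_join(records, 0.5))
```

Comparing all pairs costs time quadratic in the number of records, so this sketch only illustrates the definition and would not scale to large inputs.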