Article Information

Title: Clustering For Similarity Search And Privacy Guaranteed Publishing Of Hi-Dimensional Data
Authors: Ashwini.R ; K.Praveen ; R.V.Krishnaiah et al.
Journal: International Journal of Computer Trends and Technology
Electronic ISSN: 2231-2803
Publication year: 2013
Volume: 4
Issue: 10-4
Publisher: Seventh Sense Research Group
Abstract: Data mining discovers the know-how required for decision making. High-dimensional data is frequently used in the real world, so it is essential for data mining techniques, and clustering algorithms in particular, to work on high-dimensional data. In this paper we explore similarity search mechanisms for high-dimensional data. Existing indexing techniques have certain drawbacks: they do not consider dependencies, and for this reason their performance is suboptimal. Clustering requires finding correlations between different dimensions. Pruning, the process of removing unnecessary data, is part of these techniques. Bounding hyperspheres and bounding rectangles are the main techniques used for pruning, but they are not efficient in Nearest Neighbor (NN) search. In this paper we propose a novel algorithm to overcome this problem. Our technique, known as cluster-adaptive bounding, makes use of a cluster-based index. Our algorithm also features spatial filtering to reduce computational and storage overhead. Similarity measures such as Euclidean and Mahalanobis distance can also be used with our approach. We also built an application as a proof of concept. The empirical results show that the proposed approach is effective for performing NN search on high-dimensional data.
Keywords: Data mining; high-dimensional data; similarity measures; indexing
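
Illustrative note: the abstract refers to cluster-based indexing with bounding-hypersphere pruning for NN search. The sketch below shows only that baseline idea, not the paper's cluster-adaptive bounding, whose details are not given in this record. It assumes Euclidean distance and precomputed centroids; the names `euclidean`, `build_clusters`, and `nn_search` are hypothetical and chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_clusters(points, centroids):
    """Assign each point to its nearest centroid and track each cluster's
    bounding-hypersphere radius (largest member-to-centroid distance)."""
    clusters = [{"centroid": c, "members": [], "radius": 0.0} for c in centroids]
    for p in points:
        nearest = min(clusters, key=lambda cl: euclidean(p, cl["centroid"]))
        nearest["members"].append(p)
        nearest["radius"] = max(nearest["radius"], euclidean(p, nearest["centroid"]))
    return clusters


def nn_search(query, clusters):
    """Exact nearest-neighbor search with hypersphere pruning.

    By the triangle inequality, every point in a cluster is at least
    max(0, d(query, centroid) - radius) away from the query, so a cluster
    whose lower bound is not below the best distance found so far can be
    skipped. Returns (best_point, best_distance, clusters_scanned).
    """
    bounds = sorted(
        (max(0.0, euclidean(query, cl["centroid"]) - cl["radius"]), i)
        for i, cl in enumerate(clusters)
    )
    best_point, best_dist, scanned = None, math.inf, 0
    for lower_bound, i in bounds:
        if lower_bound >= best_dist:
            break  # bounds are sorted, so all remaining clusters are pruned too
        scanned += 1
        for p in clusters[i]["members"]:
            d = euclidean(query, p)
            if d < best_dist:
                best_point, best_dist = p, d
    return best_point, best_dist, scanned
```

Visiting clusters in order of increasing lower bound lets the search stop at the first cluster that cannot contain a closer point. In high dimensions these spherical bounds tend to become loose, which is the inefficiency the abstract attributes to bounding hyperspheres and rectangles.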