Journal title: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2015
Volume: 6
Issue: 3
Pages: 2326-2329
Publisher: TechScience Publications
Abstract: High-dimensional data now arises in many areas, such as social networking sites, biomedical data, and sports. Many data sets are represented with hundreds or thousands of dimensions. As the number of dimensions grows, the "curse of dimensionality" prevents traditional outlier detection methods from working efficiently. Increasing the dimensionality of data objects makes it difficult to find points that do not fit into any group (cluster); such points are called outliers. Outlier detection has important applications in fraud detection, network robustness analysis, error elimination in scientific data, sports data analysis, and intrusion detection. Most of these applications are high-dimensional domains in which the data can contain hundreds of dimensions. Spam can be link-based or content-based. This paper proposes ensemble subspace clustering, a paradigm for spam outlier detection in high-dimensional data sets. The proposed method divides the original high-dimensional data set into subspace clusters using a subspace clustering algorithm. An improved k-means algorithm is then used to find an outlier cluster, which is further merged with other clusters according to a consensus function. An outlier cluster that does not merge with any other subspace cluster is reported as a final outlier.
Keywords: Outlier; high-dimensional data; subspace; ensemble; clustering
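The abstract does not specify the subspace clustering algorithm, the "improved" k-means variant, or the consensus function, so the following is only a minimal Python sketch of the general ensemble-subspace idea under stated assumptions, not the paper's method. Assumptions: random axis-parallel feature subsets stand in for the subspace clustering step, plain Lloyd's k-means stands in for the improved k-means, a cluster counts as an outlier cluster when it holds at most `small_frac` of the points, and the consensus function is a simple vote (a point is a final outlier if it falls in an outlier cluster in more than `vote_frac` of the subspaces). The names `kmeans`, `ensemble_subspace_outliers`, `small_frac`, and `vote_frac` are hypothetical.

```python
import random
from typing import List, Sequence

Point = List[float]


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (a stand-in for the paper's improved k-means).

    Returns one cluster label per input point.
    """
    rng = random.Random(seed)
    # Initialise centers with k distinct input points.
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ensemble_subspace_outliers(
    data: Sequence[Point],
    n_subspaces: int = 5,
    subspace_dim: int = 2,
    k: int = 2,
    small_frac: float = 0.15,  # hypothetical size threshold for an "outlier cluster"
    vote_frac: float = 0.5,    # hypothetical majority-vote consensus threshold
    seed: int = 0,
) -> List[int]:
    """Return indices of points flagged as outliers by a vote across subspaces."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    votes = [0] * n
    for s in range(n_subspaces):
        # Assumed subspace step: cluster on a random subset of the features.
        dims = rng.sample(range(d), subspace_dim)
        projected = [[row[j] for j in dims] for row in data]
        labels = kmeans(projected, k, seed=seed + s)
        sizes = {c: labels.count(c) for c in set(labels)}
        # Every member of a small cluster gets one outlier vote for this subspace.
        for i, c in enumerate(labels):
            if sizes[c] <= small_frac * n:
                votes[i] += 1
    # Consensus: keep points flagged in more than vote_frac of the subspaces.
    return [i for i in range(n) if votes[i] > vote_frac * n_subspaces]


# Ten inliers in [0, 0.9]^3 plus one far-away point at index 10.
inliers = [[0.1 * i, 0.1 * ((3 * i) % 10), 0.1 * ((7 * i) % 10)] for i in range(10)]
data = inliers + [[50.0, 50.0, 50.0]]
print(ensemble_subspace_outliers(data))  # [10]
```

The vote-based consensus is a deliberately simple choice for illustration; the paper instead merges outlier clusters with other clusters through its own consensus function, which this sketch does not attempt to reproduce.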