Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Electronic ISSN: 2319-8753
Year: 2016
Volume: 5
Issue: 12
Page: 21324
DOI: 10.15680/IJIRSET.2016.0512088
Publisher: S&S Publications
Abstract: Clustering categorizes data into groups, or clusters, such that the data in the same cluster are more similar to each other than to those in different clusters. Cluster ensembles have been applied to the problem of clustering categorical data, but it is observed that these techniques generate the final data partition from incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown, and this degrades the quality of the clustering result. To improve clustering quality, a link-based approach refines the conventional matrix by discovering unknown entries through the similarity between clusters in an ensemble, and an efficient link-based algorithm is proposed for the underlying similarity assessment. This paper proposes the C-Rank link-based algorithm to improve clustering quality and to rank clusters in weighted networks. C-Rank consists of three major phases: (1) identification of candidate clusters; (2) ranking of the candidates by integrated cohesion; and (3) elimination of non-maximal clusters. Finally, the clustering result is obtained by applying a graph partitioning technique to a weighted bipartite graph formulated from the refined matrix.
Keywords: Clustering; Data mining; Categorical data; Cluster ensemble; Link-based similarity; Refined matrix; C-Rank link-based cluster.
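
Illustrative sketch: the abstract's idea of filling unknown entries of the ensemble-information matrix through similarity between clusters can be shown with a small Python example. This is not the paper's C-Rank algorithm, and it is not taken from the paper. It assumes a simple connected-triple style similarity: two clusters are treated as similar when they both overlap with the same third cluster. The function names, the Jaccard edge weights, and the `decay` parameter are all hypothetical choices made for illustration.

```python
from itertools import combinations


def build_clusters(ensemble):
    """Map each (clustering index, label) pair to its set of member point indices."""
    clusters = {}
    for t, labels in enumerate(ensemble):
        for i, lab in enumerate(labels):
            clusters.setdefault((t, lab), set()).add(i)
    return clusters


def jaccard(a, b):
    """Overlap of two non-empty member sets, used here as an edge weight."""
    return len(a & b) / len(a | b)


def cluster_similarity(clusters, decay=0.9):
    """Score each pair of clusters by the neighbours they share (assumed measure)."""
    keys = list(clusters)
    w = {(a, b): jaccard(clusters[a], clusters[b])
         for a in keys for b in keys if a != b}
    triple = {}
    for a, b in combinations(keys, 2):
        # A third cluster k links a and b as strongly as its weaker edge.
        score = sum(min(w[(a, k)], w[(b, k)])
                    for k in keys if k != a and k != b)
        triple[(a, b)] = triple[(b, a)] = score
    top = max(triple.values(), default=0.0)
    # Normalise to [0, decay] so inferred entries stay below true memberships.
    return {pair: (decay * s / top if top > 0 else 0.0)
            for pair, s in triple.items()}


def refined_matrix(ensemble, decay=0.9):
    """Return (column keys, rows): 1.0 for membership, a similarity otherwise."""
    clusters = build_clusters(ensemble)
    sim = cluster_similarity(clusters, decay)
    keys = sorted(clusters)
    rows = []
    for i in range(len(ensemble[0])):
        row = []
        for key in keys:
            own = (key[0], ensemble[key[0]][i])  # cluster of point i in that clustering
            row.append(1.0 if own == key else sim[(own, key)])
        rows.append(row)
    return keys, rows


if __name__ == "__main__":
    # Two hypothetical base clusterings of four data points.
    keys, rows = refined_matrix([[0, 0, 1, 1], [0, 1, 1, 1]])
    print(keys)
    for row in rows:
        print(row)
```

Each row of the result corresponds to a data point and each column to a cluster, so the matrix can be read as the weighted bipartite graph between points and clusters that the abstract mentions. The partitioning step itself is not shown.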