Journal: International Journal on Computer Science and Engineering
Print ISSN: 2229-5631
Electronic ISSN: 0975-3397
Year of publication: 2011
Volume: 3
Issue: 6
Pages: 2544-2553
Publisher: Engg Journals Publications
Abstract: A cluster is a collection of data objects that are similar to one another and dissimilar to the objects in other clusters. Data clustering is studied in statistics, machine learning, and data mining. The dissimilarity between two clusters is commonly defined as the distance between their centroids or the distance between their two closest (or farthest) data points. These measures are vulnerable to outliers, and removing outliers is itself a difficult task. A new similarity measure, cohesion, is used to measure inter-cluster distance. Based on the cohesion measure, a new two-phase clustering algorithm is designed, called the "Cohesion Based Self Merging Algorithm (CSM)". It combines partitional clustering (K-Harmonic Means) with hierarchical clustering (CURE). In the first phase, CSM partitions the input dataset into several small sub-clusters; in the second phase, it repeatedly merges the sub-clusters in a hierarchical manner according to the cohesion similarity measure. The run-time behavior of these algorithms is analyzed, and an existing method that combines k-means and hierarchical algorithms for robust and efficient data clustering is compared with the cohesion-based self-merging algorithm.
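
The three conventional inter-cluster dissimilarity measures named in the abstract (centroid distance, closest-pair distance, farthest-pair distance) can be illustrated with a minimal Python sketch. The function names, the toy clusters, and the outlier point are assumptions made for illustration only; they do not come from the paper, and the sketch does not implement the cohesion measure or CSM, which the abstract does not define.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_distance(a, b):
    """Distance between the centroids of clusters a and b."""
    return dist(centroid(a), centroid(b))


def closest_pair_distance(a, b):
    """Distance between the two closest points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))


def farthest_pair_distance(a, b):
    """Distance between the two farthest points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))


# Hypothetical toy data: two well-separated square clusters.
cluster_a = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
cluster_b = [(10.0, 0.0), (10.0, 2.0), (12.0, 0.0), (12.0, 2.0)]

# Hypothetical outlier assigned to cluster_a but lying close to cluster_b.
cluster_a_noisy = cluster_a + [(8.0, 1.0)]

for label, a in [("clean", cluster_a), ("with outlier", cluster_a_noisy)]:
    print(
        f"{label}: centroid={centroid_distance(a, cluster_b):.3f} "
        f"closest={closest_pair_distance(a, cluster_b):.3f} "
        f"farthest={farthest_pair_distance(a, cluster_b):.3f}"
    )
```

In this toy setup a single outlier reduces the closest-pair distance from 8 to about 2.24 and shifts the centroid distance from 10 to 8.6, which is the kind of sensitivity to outliers that the abstract attributes to these measures.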