Journal: International Journal of Grid and Distributed Computing
Print ISSN: 2005-4262
Year: 2016
Volume: 9
Issue: 11
Pages: 119-132
Publisher: SERSC
Abstract: K-Means is a widely used partition-based clustering algorithm known for its simplicity and speed. It organizes an input dataset into a predefined number of clusters. K-Means has a major limitation: the number of clusters, K, needs to be specified in advance as an input. Specifying K in advance can be difficult in the absence of thorough domain knowledge, or for a new and unknown dataset. This requirement can lead to “forced” clustering of the data, so that a proper classification does not emerge. In this paper, a new algorithm based on K-Means is developed. This algorithm has advanced features of intelligent data analysis and automatic generation of an appropriate number of clusters. The clusters generated by the new algorithm are compared against results obtained with the original K-Means and various other well-known clustering algorithms. This comparative analysis is done using sets of real data.
Keywords: Clustering; K-Means; Automatic generation of clusters