Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2014
Volume: 5
Issue: 3
Pages: 3998-4003
Publisher: TechScience Publications
Abstract: The number of variables, or attributes, in a data set strongly affects how that data clusters. These attributes directly affect the dissimilarity or distance measures, and thereby the accuracy of the clustering results. Dimensionality reduction techniques can therefore improve clustering. Because clustering is an unsupervised machine learning technique, validating the results obtained by applying a clustering algorithm to a particular data set is a significant problem. This paper formulates a new model for data clustering that combines feature extraction, a data clustering algorithm, and one or more clustering validity indices. The clustering algorithm used is agglomerative hierarchical clustering. The feature reduction techniques used are PCA, CMDS, ISOMAP, and HLLE. The clustering validity indices used are the Silhouette index, the Dunn index, the Davies-Bouldin index, and the Calinski-Harabasz index.
Keywords: Agglomerative Hierarchical Clustering; PCA; CMDS; ISOMAP; HLLE; Silhouette Index; Dunn Index; Davies-Bouldin Index; Calinski-Harabasz Index
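
The three-stage model described in the abstract (feature extraction, then agglomerative hierarchical clustering, then a validity index) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes PCA as the only reduction step (standing in for CMDS, ISOMAP, and HLLE), average linkage (the abstract does not name a linkage criterion), synthetic two-blob data, and only the Silhouette and Dunn indices. All function names (`pca`, `pairwise`, `agglomerative`, `silhouette`, `dunn`) are hypothetical helpers written for this example.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def pairwise(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def agglomerative(D, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage)."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge the closest pair
    labels = np.empty(len(D), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

def silhouette(D, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    ids = np.unique(labels)
    scores = []
    for i in range(len(D)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] == 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def dunn(D, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter."""
    ids = np.unique(labels)
    diam = max(D[np.ix_(labels == c, labels == c)].max() for c in ids)
    sep = min(D[np.ix_(labels == a, labels == b)].min()
              for i, a in enumerate(ids) for b in ids[i + 1:])
    return float(sep / diam)

# Synthetic data (an assumption): two well-separated blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 5)),
               rng.normal(5.0, 0.5, size=(20, 5))])

Z = pca(X, 2)                 # 1. feature extraction
D = pairwise(Z)
labels = agglomerative(D, 2)  # 2. clustering
sil = silhouette(D, labels)   # 3. validation
dn = dunn(D, labels)
print(f"silhouette={sil:.3f}  dunn={dn:.3f}")
```

Higher values of both indices indicate more compact, better-separated clusters, so repeating the run with a different reduction technique or target dimension and comparing the scores is one way to choose among configurations. In practice a library such as scikit-learn would likely replace these hand-written helpers, since it provides PCA, Isomap, Hessian LLE, agglomerative clustering, and the Silhouette, Davies-Bouldin, and Calinski-Harabasz scores.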