Journal: Canadian Journal on Artificial Intelligence, Machine Learning and Pattern Recognition
Year: 2010
Volume: 1
Issue: 3
Pages: 26-41
Publisher: AM Publishers Corporation Canada
Abstract: Analysis of the voluminous data generated over the years by businesses, genome projects, and other sources reveals important findings that advance research in the respective fields. Clustering is one of the most important methods for mining patterns hidden in large repositories. However, creating stable clusters is still an unsolved problem. This paper proposes an integrated approach that first creates clusters by applying a new clustering algorithm and then measures the quality of those clusters to validate them. The proposed method takes into account the stability of clusters on a partition basis by calculating a stability factor, a real value in the range [0, 1]. The higher the stability factor, the more stable the cluster and the better its quality. Initially, clusters are categorized as stable or unstable based on a threshold set to the highest stability factor. As a next step, the data objects of the unstable clusters are analyzed in order to place them in the most appropriate clusters. Stable clusters are thus obtained without losing any potentially relevant patterns. The performance of the method is demonstrated using a gene expression data set and benchmark data sets.
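The two-stage idea in the abstract (split clusters by a stability threshold, then re-place the objects of unstable clusters) can be illustrated with a minimal Python sketch. It is not the paper's algorithm. The abstract does not say how the stability factor is computed or how the "most appropriate" cluster is chosen, so the sketch takes the stability factors as given inputs and assumes nearest-centroid Euclidean distance for reassignment. The function names and the toy data are hypothetical.

```python
from math import dist


def split_by_stability(stability, threshold):
    """Split cluster ids into (stable, unstable) lists using the threshold."""
    stable = [c for c, s in stability.items() if s >= threshold]
    unstable = [c for c, s in stability.items() if s < threshold]
    return stable, unstable


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def reassign_unstable(clusters, stability, threshold):
    """Move each object of an unstable cluster into a stable cluster.

    Assumption (not from the abstract): the most appropriate cluster is the
    stable cluster whose original centroid is nearest in Euclidean distance.
    """
    stable, unstable = split_by_stability(stability, threshold)
    if not stable:
        raise ValueError("no cluster meets the threshold")
    # Centroids are fixed before any object is moved.
    centers = {c: centroid(clusters[c]) for c in stable}
    result = {c: list(clusters[c]) for c in stable}
    for c in unstable:
        for point in clusters[c]:
            nearest = min(stable, key=lambda s: dist(point, centers[s]))
            result[nearest].append(point)
    return result


# Toy data: stability factors in [0, 1] are made-up inputs.
clusters = {
    "A": [(0.0, 0.0), (0.0, 1.0)],
    "B": [(10.0, 10.0), (10.0, 11.0)],
    "C": [(1.0, 1.0), (9.0, 9.0)],
}
stability = {"A": 1.0, "B": 1.0, "C": 0.4}

# As in the abstract, the threshold is set to the highest stability factor.
threshold = max(stability.values())
result = reassign_unstable(clusters, stability, threshold)
print(result)
```

In this toy run, cluster C falls below the threshold, so its two objects are distributed to A and B and no data object is discarded, which mirrors the abstract's claim that potentially relevant patterns are not lost.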