Journal: International Journal of Computer Information Systems and Industrial Management Applications
Print ISSN: 2150-7988
Online ISSN: 2150-7988
Publication year: 2014
Issue: 6
Pages: 494-504
Publisher: Machine Intelligence Research Labs (MIR Labs)
Abstract: Validating a given clustering result is a very challenging task in the real world. For this purpose, several cluster validity indices have been developed in the literature. Cluster validity indices fall into two main categories: external and internal. External cluster validity indices rely on some available supervised information, whereas internal validity indices use the intrinsic structure of the data. In this paper, a new external cluster validity index, MMI, and its normalized version, NMMI, have been implemented based on the Max-Min distance among data points and on prior information about the structure of the data. A new probabilistic approach has been implemented to find the correct correspondence between the true and the obtained clustering. Several possible probabilistic approaches have been considered, and their problems have been addressed. The genetic K-means clustering algorithm (GAK-means) and the single linkage clustering technique are used as the underlying partitioning techniques, with the number of clusters varied over a range. The MMI and NMMI indices are then used to determine the appropriate number of clusters. Results of the proposed index in identifying the true partitioning are shown for six artificial and two real-life data sets. The performance of MMI (in its two versions, MMI old and MMI new) and of its normalized version NMMI is compared with that of existing external cluster validity indices: F-measure, purity, normalized mutual information (NMI), Rand index (RI), and adjusted Rand index (ARI). The proposed MMI index works well for both two-class and multi-class data sets.
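For reference, two of the existing external indices named above, purity and the Rand index (RI), have standard textbook definitions. Both compare an obtained clustering against the true labels. The following minimal sketch implements those standard definitions only. It is not the MMI/NMMI index proposed in the paper, whose formula is not given in this abstract. The function names `purity` and `rand_index` are hypothetical helpers chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, pred_labels):
    """Standard purity: fraction of points that belong to the majority
    true class of the cluster they were assigned to."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, []).append(t)
    # For each obtained cluster, count the members of its most common true class.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(true_labels)


def rand_index(true_labels, pred_labels):
    """Standard Rand index: fraction of point pairs on which the two
    partitions agree (both same-cluster or both different-cluster)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = 0
    for i, j in pairs:
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        agree += same_true == same_pred
    return agree / len(pairs)


# Hypothetical toy example: one point of class 0 is placed in the wrong cluster.
true = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(purity(true, pred))      # 5/6
print(rand_index(true, pred))  # 10/15
```

Purity rewards clusters that are dominated by one class but ignores how many clusters are produced. The Rand index counts pairwise agreements over all point pairs. These differences are one reason such indices can disagree when used to choose the number of clusters.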