Basic Article Information

Title: Performance Evaluations of κ-Approximate Modal Haplotype Type Algorithms for Clustering Categorical Data
Authors: Ali Seman; Azizian Mohd Sapawi; Mohd Zaki Salleh et al.
Journal: Research Journal of Information Technology
Print ISSN: 1815-7432
Electronic ISSN: 2151-7959
Publication year: 2015
Volume: 7
Issue: 2
Pages: 112-120
DOI: 10.3923/rjit.2015.112.120
Publisher: Academic Journals Inc., USA
Abstract: The effectiveness of κ-Approximate Modal Haplotype (κ-AMH)-type algorithms for clustering Y-short tandem repeats (Y-STR) of categorical data has been demonstrated previously. However, the newly introduced κ-AMH-type algorithms, namely the new κ-AMH I (Nκ-AMH I), the new κ-AMH II (Nκ-AMH II) and the new κ-AMH III (Nκ-AMH III), are derived from the same κ-AMH optimization and fuzzy procedures but include two new methods: a new initial center selection and a new dominant weighting method. This study evaluates and presents the performance of κ-AMH-type algorithms for clustering five categorical data sets, namely soybean, zoo, hepatitis, voting and breast. The performance criteria include accuracy, precision and recall analyses. Overall, κ-AMH-type algorithms perform well when clustering all of the categorical data sets mentioned above. Specifically, the Nκ-AMH I algorithm exhibits the best performance when clustering the five categorical data sets; it obtained the highest combined mean accuracy score (0.9130), compared to those of κ-AMH (0.8971), Nκ-AMH II (0.8885) and Nκ-AMH III (0.9011). This high score is associated with the newly introduced initial center selection, combined with the original dominant weighting method. These results present a new and significant benchmark, indicating that κ-AMH-type algorithms can be generalized for any categorical data.