Journal: IIASS: Innovative Issues and Approaches in Social Sciences
Print ISSN: 1855-0541
Publication year: 2015
DOI: 10.12959/issn.1855-0541.IIASS-2015-no1-art11
Publisher: CEOs Ltd.
Abstract: The broad variety of agglomerative hierarchical clustering methods raises the problem of how to choose the most appropriate method for the given data. It is well known that some methods outperform others when the analysed data have a specific structure. In the presented study we observed the behaviour of the centroid method, the median method (Gower's median method), and the average method (unweighted pair-group method with arithmetic mean, UPGMA; average linkage between groups). We compared them with the most commonly used methods of hierarchical clustering: the minimum (single linkage clustering), the maximum (complete linkage clustering), the Ward, and the McQuitty (groups method average; weighted pair-group method using arithmetic averages, WPGMA) methods. We applied these methods to spherical, ellipsoid, umbrella-like, “core-and-sphere”, ring-like and intertwined three-dimensional data structures. To generate the data and run the analysis, we used the R statistical software. The results show that all seven methods succeed in finding compact, ball-shaped or ellipsoid structures when these are sufficiently separated. Conversely, all methods except the minimum perform poorly on non-homogeneous, irregular and elongated structures. A circular double helix structure is especially challenging; it is correctly revealed only by the minimum method. We can also confirm previously published results of other simulation studies, which usually favour the average method (besides the Ward method) when the data are assumed to be fairly compact and well separated.
Keywords: hierarchical clustering; agglomerative methods; divisive methods; simulated data
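
Illustrative sketch (not part of the original record): the kind of comparison described in the abstract can be outlined in a few lines. This is not the authors' R code. It is a minimal Python sketch that assumes the standard Lance-Williams update formula for the seven methods and uses two toy data sets (two well-separated blobs and two concentric rings) as stand-ins for the paper's spherical and ring-like structures. All function names, sizes and noise levels are hypothetical choices made for illustration.

```python
import math
import random

# Methods conventionally defined on squared Euclidean distances.
SQUARED = {"centroid", "median", "ward"}
METHODS = ["single", "complete", "average", "mcquitty",
           "centroid", "median", "ward"]


def lw_coeffs(method, ni, nj, nk):
    """Lance-Williams coefficients (alpha_i, alpha_j, beta, gamma)."""
    n = ni + nj
    if method == "single":      # minimum method
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":    # maximum method
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":     # UPGMA
        return ni / n, nj / n, 0.0, 0.0
    if method == "mcquitty":    # WPGMA
        return 0.5, 0.5, 0.0, 0.0
    if method == "centroid":
        return ni / n, nj / n, -ni * nj / n ** 2, 0.0
    if method == "median":      # Gower's median method
        return 0.5, 0.5, -0.25, 0.0
    if method == "ward":
        t = n + nk
        return (ni + nk) / t, (nj + nk) / t, -nk / t, 0.0
    raise ValueError(method)


def agglomerate(points, method, k):
    """Merge the closest pair of clusters until k remain; return labels."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            e = math.dist(points[i], points[j])
            d[i][j] = d[j][i] = e * e if method in SQUARED else e
    members = {i: [i] for i in range(n)}
    while len(members) > k:
        active = sorted(members)
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda p: d[p[0]][p[1]])
        ni, nj = len(members[i]), len(members[j])
        for m in active:
            if m == i or m == j:
                continue
            ai, aj, beta, gamma = lw_coeffs(method, ni, nj, len(members[m]))
            # The merged cluster reuses index i; index j is retired below.
            d[i][m] = d[m][i] = (ai * d[i][m] + aj * d[j][m] + beta * d[i][j]
                                 + gamma * abs(d[i][m] - d[j][m]))
        members[i] += members.pop(j)
    labels = [0] * n
    for lab, idx in enumerate(members.values()):
        for p in idx:
            labels[p] = lab
    return labels


def same_partition(labels, truth):
    """True if labels and truth describe the same grouping (up to renaming)."""
    return len(set(zip(labels, truth))) == len(set(truth)) == len(set(labels))


def make_blobs(rng, n=30):
    """Two compact, well-separated 3-D blobs (toy 'spherical' structure)."""
    pts = [[c + rng.gauss(0, 0.3) for _ in range(3)]
           for c in (0.0, 10.0) for _ in range(n)]
    return pts, [0] * n + [1] * n


def make_rings(rng, n=30):
    """Two concentric rings in 3-D (toy 'ring-like' structure)."""
    pts = []
    for radius in (1.0, 4.0):
        for t in range(n):
            a = 2 * math.pi * t / n
            pts.append([radius * math.cos(a) + rng.gauss(0, 0.02),
                        radius * math.sin(a) + rng.gauss(0, 0.02),
                        rng.gauss(0, 0.02)])
    return pts, [0] * n + [1] * n


rng = random.Random(0)
datasets = {"blobs": make_blobs(rng), "rings": make_rings(rng)}
results = {}
for name, (pts, truth) in datasets.items():
    for method in METHODS:
        results[(name, method)] = same_partition(
            agglomerate(pts, method, 2), truth)
        print(name, method, results[(name, method)])
```

On toy data like this, one would expect every method to recover the two blobs and the minimum (single linkage) method to recover the two rings, in line with the abstract; how the other methods behave on the rings depends on the geometry and is left for the reader to inspect in the printed output.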