Journal: Electronic Journal of Applied Statistical Analysis
Electronic ISSN: 2070-5948
Publication year: 2016
Volume: 9
Issue: 1
Pages: 134-153
Language: English
Publisher: University of Salento
Abstract: Clustering of variables is one possible approach for reducing the dimensionality of a dataset. However, all the variables are usually assigned to one of the clusters, even the scattered variables associated with atypical or noise information. The presence of this type of information could obscure the interpretation of the latent variables associated with the clusters, or even give rise to artificial clusters. We propose two strategies to address this problem. The first is a "K+1" strategy, which consists of introducing an additional group of variables, called the "noise cluster" for simplicity. The second is based on the definition of sparse latent variables. Both strategies result in refined clusters for the identification of more relevant latent variables.
Keywords: dimensionality reduction; clustering of variables; noise cluster; sparse latent variables