
Article Information

  • Title: Regularized k-means clustering of high-dimensional data and its asymptotic consistency
  • Authors: Wei Sun; Junhui Wang; Yixin Fang
  • Journal: Electronic Journal of Statistics
  • Print ISSN: 1935-7524
  • Year of publication: 2012
  • Volume: 6
  • Pages: 148-167
  • DOI: 10.1214/12-EJS668
  • Language: English
  • Publisher: Institute of Mathematical Statistics
  • Abstract: K-means clustering is a widely used tool for cluster analysis due to its conceptual simplicity and computational efficiency. However, its performance can be distorted when clustering high-dimensional data where the number of variables becomes relatively large and many of them may contain no information about the clustering structure. This article proposes a high-dimensional cluster analysis method via regularized k-means clustering, which can simultaneously cluster similar observations and eliminate redundant variables. The key idea is to formulate the k-means clustering in a form of regularization, with an adaptive group lasso penalty term on cluster centers. In order to optimally balance the trade-off between the clustering model fitting and sparsity, a selection criterion based on clustering stability is developed. The asymptotic estimation and selection consistency of the regularized k-means clustering with diverging dimension is established. The effectiveness of the regularized k-means clustering is also demonstrated through a variety of numerical experiments as well as applications to two gene microarray examples. The regularized clustering framework can also be extended to the general model-based clustering.
  • Keywords: K-means; diverging dimension; lasso; selection consistency; variable selection; stability.
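
The abstract's key idea, k-means with an adaptive group lasso penalty on the cluster centers, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the objective is the within-cluster sum of squares plus `lam * sum_j w_j * ||C[:, j]||_2`, where `C[:, j]` collects the j-th coordinate of all K centers and the adaptive weight `w_j` is the inverse norm of that column in an unpenalized k-means fit. The function names, the farthest-point initialization, the proximal-gradient center update, and the value of `lam` are all assumptions made for illustration; the stability-based selection of the tuning parameter described in the abstract is not shown.

```python
import numpy as np

def group_soft_threshold(Z, thresh):
    # Shrink each column of Z toward zero; columns with norm <= thresh become exactly 0.
    norms = np.linalg.norm(Z, axis=0)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return Z * scale

def assign(X, C):
    # Label each observation with the index of its nearest center.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def cluster_means(X, labels, C_prev):
    # Per-cluster means and sizes; an empty cluster keeps its previous center.
    K = C_prev.shape[0]
    M = C_prev.copy()
    counts = np.bincount(labels, minlength=K).astype(float)
    for k in range(K):
        if counts[k] > 0:
            M[k] = X[labels == k].mean(axis=0)
    return M, counts

def farthest_point_init(X, K):
    # Deterministic initialization (an assumption of this sketch, not from the paper).
    idx = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(K - 1):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].copy()

def regularized_kmeans(X, K, lam, n_iter=20, n_prox=100):
    X = X - X.mean(axis=0)  # center so uninformative variables have centers near 0
    # Stage 1: plain k-means, used only to build the adaptive weights.
    C = farthest_point_init(X, K)
    for _ in range(n_iter):
        labels = assign(X, C)
        C, _ = cluster_means(X, labels, C)
    w = 1.0 / (np.linalg.norm(C, axis=0) + 1e-8)
    # Stage 2: alternate assignments with a penalized center update.
    for _ in range(n_iter):
        labels = assign(X, C)
        M, counts = cluster_means(X, labels, C)
        step = 1.0 / (2.0 * counts.max())  # 1 / Lipschitz constant of the smooth part
        for _ in range(n_prox):
            grad = 2.0 * counts[:, None] * (C - M)
            C = group_soft_threshold(C - step * grad, step * lam * w)
    return assign(X, C), C

# Synthetic data: 3 clusters, 2 informative variables, 8 pure-noise variables.
rng = np.random.default_rng(0)
true_labels = np.repeat(np.arange(3), 50)
means = np.zeros((3, 10))
means[:, :2] = [[-6.0, -6.0], [6.0, -6.0], [0.0, 6.0]]
X = means[true_labels] + rng.normal(size=(150, 10))

labels, centers = regularized_kmeans(X, K=3, lam=50.0)
selected = np.flatnonzero(np.linalg.norm(centers, axis=0) > 0)
print("selected variables:", selected)
```

A variable whose column of centers is shrunk exactly to zero no longer affects the cluster assignments, which is how a penalty of this kind clusters observations and discards redundant variables at the same time.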