Article Information

  • Title: kamila: Clustering Mixed-Type Data in R and Hadoop
  • Authors: Alexander H. Foss; Marianthi Markatou
  • Journal: Journal of Statistical Software
  • Print ISSN: 1548-7660
  • Electronic ISSN: 1548-7660
  • Year: 2018
  • Volume: 83
  • Issue: 13
  • Pages: 1-44
  • DOI: 10.18637/jss.v083.i13
  • Language: English
  • Publisher: University of California, Los Angeles
  • Abstract: In this paper we discuss the challenge of equitably combining continuous (quantitative) and categorical (qualitative) variables for the purpose of cluster analysis. Existing techniques require strong parametric assumptions, or difficult-to-specify tuning parameters. We describe the kamila package, which includes a weighted k-means approach to clustering mixed-type data, a method for estimating weights for mixed-type data (Modha-Spangler weighting), and an additional semiparametric method recently proposed in the literature (KAMILA). We include a discussion of strategies for estimating the number of clusters in the data, and describe the implementation of one such method in the current R package. Background and usage of these clustering methods are presented. We then show how the KAMILA algorithm can be adapted to a map-reduce framework, and implement the resulting algorithm using Hadoop for clustering very large mixed-type data sets.