首页    期刊浏览 2024年11月27日 星期三
登录注册

文章基本信息

  • 标题:Noisy, Greedy and Not so Greedy k-Means++
  • 本地全文:下载
  • 作者:Anup Bhattacharya ; Jan Eube ; Heiko R{"o}glin
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2020
  • 卷号:173
  • 页码:18:1-18:21
  • DOI:10.4230/LIPIcs.ESA.2020.18
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:The k-means++ algorithm due to Arthur and Vassilvitskii [David Arthur and Sergei Vassilvitskii, 2007] has become the most popular seeding method for Lloyd’s algorithm. It samples the first center uniformly at random from the data set and the other k-1 centers iteratively according to D²-sampling, i.e., the probability that a data point becomes the next center is proportional to its squared distance to the closest center chosen so far. k-means++ is known to achieve an approximation factor of ð'ª(log k) in expectation. Already in the original paper on k-means++, Arthur and Vassilvitskii suggested a variation called greedy k-means++ algorithm in which in each iteration multiple possible centers are sampled according to D²-sampling and only the one that decreases the objective the most is chosen as a center for that iteration. It is stated as an open question whether this also leads to an ð'ª(log k)-approximation (or even better). We show that this is not the case by presenting a family of instances on which greedy k-means++ yields only an Ω(ð"â<.log k)-approximation in expectation where ð" is the number of possible centers that are sampled in each iteration. Inspired by the negative results, we study a variation of greedy k-means++ which we call noisy k-means++ algorithm. In this variation only one center is sampled in every iteration but not exactly by D²-sampling. Instead in each iteration an adversary is allowed to change the probabilities arising from D²-sampling individually for each point by a factor between 1-εâ, and 1+εâ,, for parameters εâ, â^^ [0,1) and εâ,, ≥ 0. We prove that noisy k-means++ computes an ð'ª(log² k)-approximation in expectation. We use the analysis of noisy k-means++ to design a moderately greedy k-means++ algorithm.
  • 关键词:k-means++; greedy; adaptive sampling
国家哲学社会科学文献中心版权所有