
Basic article information

  • Title: Distributed K-means based-on Soft Constraints
  • Authors: Y.C. Yu ; J.D. Wang ; G.S. Zheng
  • Journal: Journal of Software Engineering
  • Print ISSN: 1819-4311
  • Electronic ISSN: 2152-0941
  • Publication year: 2011
  • Volume: 5
  • Issue: 4
  • Pages: 116-126
  • DOI: 10.3923/jse.2011.116.126
  • Publisher: Academic Journals Inc., USA
  • Abstract: Pairwise constraints can effectively improve clustering results, but noisy constraints seriously degrade clustering performance. To improve distributed clustering with constraints, this paper presents a distributed k-means method based on soft constraints that handles constraint violations effectively. Because distributed clustering is limited by communication cost, data privacy, and similar concerns, the proposed method uses only positive constraints, expressed as chunklets. To simplify the treatment of constrained data points, the mean of each chunklet serves as its representative point. The positive constraints within a chunklet are then approximated by pairwise positive constraints between each data point in the chunklet and that mean, so the cluster label of the mean is taken as the label estimate for the chunklet's data points. Based on this approximation, a new partition cost measure is defined to deal with constraint violations. For unconstrained data points, the within-cluster sum of squared distances is minimized; for constrained data points, the sum of the distances between the data points and their corresponding centroids plus the cost of constraint violations is minimized. Experimental results showed that the proposed method reduces the computational complexity of handling constraint violations and achieves higher clustering accuracy than hard-constrained distributed clustering.
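
The assignment step described in the abstract can be illustrated with a minimal single-site sketch. It is not the authors' formulation: the abstract does not give the exact partition cost, so the additive penalty, the function name `soft_constrained_assign`, and the `penalty` weight are all assumptions made for illustration, and the distributed communication between sites is left out entirely. Each chunklet's mean is assigned to its nearest centroid to get a label estimate, and a constrained point then pays an extra cost for choosing any other cluster.

```python
import numpy as np


def soft_constrained_assign(X, centroids, chunklets, penalty=1.0):
    """Assign points to clusters with a soft penalty for chunklet violations.

    X: (n, d) data array; centroids: (k, d) array.
    chunklets: list of index lists, each a group of positively constrained points.
    penalty: hypothetical violation weight (an assumption, not from the paper).
    """
    # Squared Euclidean distance from every point to every centroid: shape (n, k).
    cost = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

    for idx in chunklets:
        idx = np.asarray(idx)
        # The chunklet mean is the representative point of the chunklet.
        mean = X[idx].mean(axis=0)
        # Label estimate: the centroid nearest to the chunklet mean.
        est = int(((centroids - mean) ** 2).sum(axis=1).argmin())
        # Choosing any cluster other than the estimate costs an extra `penalty`.
        extra = np.full(centroids.shape[0], penalty)
        extra[est] = 0.0
        cost[idx] += extra

    # Unconstrained points keep the plain squared-distance cost.
    return cost.argmin(axis=1)


X = np.array([[0.0, 0.0], [0.2, 0.0], [1.2, 0.0], [2.0, 0.0]])
centroids = np.array([[0.0, 0.0], [2.0, 0.0]])
labels = soft_constrained_assign(X, centroids, chunklets=[[0, 1, 2]], penalty=1.0)
```

Because the constraint is soft, a small enough `penalty` lets a point break away from its chunklet when it lies much closer to another centroid, which is how a noisy constraint can be overridden instead of being enforced.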