Article Information

  • Title: An Improved K-means Algorithm based on Mapreduce and Grid
  • Authors: Li Ma ; Lei Gu ; Bo Li
  • Journal: International Journal of Grid and Distributed Computing
  • Print ISSN: 2005-4262
  • Publication year: 2015
  • Volume: 8
  • Issue: 1
  • Pages: 189-200
  • DOI: 10.14257/ijgdc.2015.8.1.18
  • Publisher: SERSC
  • Abstract: With the traditional K-means clustering algorithm it is difficult to choose the number of clusters K in advance, and the initial cluster centers are selected at random, which makes the clustering results very unstable. The algorithm is also susceptible to noise points. To solve these problems, the traditional K-means algorithm is improved. The improved method divides the data space into grids of equal size, assigns each data point to the corresponding grid according to its attribute values, and counts the number of data points in each grid. The M (M > K) grids containing the most data points are selected, and the central point of each is calculated. These M central points are used as input data, and the value of K is determined from the clustering results. Among the M points, the K points farthest from each other are found and used as the initial cluster centers of the K-means algorithm; the point with the maximum value among the M (presumably the central point of the most populated grid) must be among the K. If the number of data points in a grid is below a threshold, those points are treated as noise and removed. To let the improved algorithm handle large data sets, it is parallelized and combined with the MapReduce framework. Theoretical analysis and experimental results show that, compared with the traditional K-means algorithm, the improved algorithm gives higher-quality results, needs fewer iterations, and is more stable. The parallelized algorithm processes data very efficiently and has good scalability and speedup.
  • Keywords: Cluster analysis; K-means; Grid; DBSCAN; MapReduce
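
The grid-based initialization described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function and parameter names (grid_cell, grid_initial_centers, cell_size, noise_threshold) are hypothetical, the "central point" of a grid is taken to be the mean of the points in it, the K farthest points are approximated by a greedy farthest-point selection seeded with the most populated grid, and K is passed in rather than derived from a preliminary clustering of the M points.

```python
import math
from collections import defaultdict


def grid_cell(point, cell_size):
    """Map a point to the integer index of the grid cell that contains it."""
    return tuple(math.floor(x / cell_size) for x in point)


def grid_initial_centers(points, cell_size, m, k, noise_threshold):
    """Pick k initial K-means centers from the m most populated grid cells.

    All parameter names are illustrative assumptions, not the paper's notation.
    """
    # Assign every point to its grid cell.
    cells = defaultdict(list)
    for p in points:
        cells[grid_cell(p, cell_size)].append(p)

    # Cells holding fewer points than the threshold are treated as noise.
    dense = [pts for pts in cells.values() if len(pts) >= noise_threshold]

    # Keep the m most populated cells and compute the mean point of each.
    dense.sort(key=len, reverse=True)
    top = dense[:m]
    if len(top) < k:
        raise ValueError("fewer than k dense cells; adjust threshold or cell size")
    centers = [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in top]

    # Greedy farthest-point selection, seeded with the densest cell's center
    # so that the most populated grid is always represented.
    chosen = [centers[0]]
    while len(chosen) < k:
        remaining = [c for c in centers if c not in chosen]
        chosen.append(
            max(remaining, key=lambda c: min(math.dist(c, s) for s in chosen))
        )
    return chosen
```

The returned points could then be passed as the starting centers to any standard K-means implementation. The per-grid counting step is a natural fit for a map phase that emits (grid index, point) pairs and a reduce phase that aggregates them, though how the paper structures its MapReduce jobs is not stated in the abstract.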