
Article Information

  • Title: Finding the Number of Clusters in Data and Better Initial Centers for K-means Algorithm
  • Author: Ahmed Fahim
  • Journal: International Journal of Intelligent Systems and Applications
  • Print ISSN: 2074-904X
  • Electronic ISSN: 2074-9058
  • Publication year: 2020
  • Volume: 12
  • Issue: 6
  • Pages: 1-20
  • DOI: 10.5815/ijisa.2020.06.01
  • Publisher: MECS Publisher
  • Abstract: K-means is the best-known algorithm for data clustering in data mining. Its main advantages are its simplicity, its fast convergence to a local minimum, and its linear time complexity. Its most important open problems are the selection of initial centers and the need to fix the number of clusters in advance. This paper proposes a solution to both problems together by adding a preprocessing step that yields the expected number of clusters in the data and better initial centers. Many studies address each of these problems separately, but according to the author none addresses both together. The preprocessing step requires O(n log n) time, where n is the size of the dataset. It produces an initial partitioning of the data without the number of clusters being specified in advance, then computes the means of the initial clusters. K-means is then applied to the original data, using the information from the preprocessing step, to obtain the final clusters. The proposed method is tested on many benchmark datasets, and the experimental results show its efficiency.
  • Keywords: Data clustering; k in k-means; initial centers in k-means; clustering algorithms
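
The pipeline outlined in the abstract (initial partitioning, then cluster means, then k-means seeded with those means) can be illustrated with a minimal sketch. The abstract does not describe the paper's actual O(n log n) preprocessing step, so the `initial_partition` function below is a hypothetical stand-in (simple leader clustering with an assumed `radius` parameter) and is not the author's method; all function names here are illustrative assumptions.

```python
import math


def initial_partition(points, radius):
    """Stand-in preprocessing (NOT the paper's method): leader clustering.

    A point joins the first group whose leader lies within `radius`;
    otherwise it starts a new group. The number of groups is therefore
    discovered from the data rather than fixed in advance.
    """
    leaders, groups = [], []
    for p in points:
        for i, q in enumerate(leaders):
            if math.dist(p, q) <= radius:
                groups[i].append(p)
                break
        else:
            leaders.append(p)
            groups.append([p])
    return groups


def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(group[0])
    return tuple(sum(p[d] for p in group) / len(group) for d in range(dim))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the supplied centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster happens to become empty.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


def cluster(points, radius):
    """Preprocess to get k and seed centers, then run k-means."""
    groups = initial_partition(points, radius)
    seeds = [mean(g) for g in groups]  # means of the initial clusters
    return kmeans(points, seeds)
```

Calling `cluster(points, radius)` returns the final centers and a label per point; the number of clusters equals the number of groups found by the stand-in preprocessing, so its quality depends entirely on the assumed `radius`.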