
Article Information

  • Title: Clustering Data Streams Based On Shared Density between Micro-Clusters
  • Authors: Karishma Nadhe; P. M. Chawan
  • Journal: International Journal of Innovative Research in Science, Engineering and Technology
  • Print ISSN: 2347-6710
  • Online ISSN: 2319-8753
  • Year: 2017
  • Volume: 6
  • Issue: 6
  • Page: 12259
  • DOI: 10.15680/IJIRSET.2017.0606296
  • Publisher: S&S Publications
  • Abstract: As more and more applications deliver streaming data, clustering data streams has become a crucial method for data and knowledge engineering. A typical approach is to summarize the data stream in real time with an online process into a large number of so-called micro-clusters. Micro-clusters represent local density estimates by aggregating the information of many data points in a defined area. On demand, a (modified) traditional clustering algorithm is used in a second, offline step to recluster the micro-clusters into larger final clusters. For reclustering, the centers of the micro-clusters are used as pseudo points, with the density estimates used as their weights. However, information about the density in the area between micro-clusters is not preserved in the online process, and reclustering is based on possibly inaccurate assumptions about the distribution of data within and between micro-clusters (e.g., uniform or Gaussian). This paper describes DBSTREAM, the first micro-cluster-based online clustering component that explicitly captures the density between micro-clusters via a shared density graph. The density information in this graph is then exploited for reclustering based on the actual density between adjacent micro-clusters. We discuss the space and time complexity of maintaining the shared density graph. Experiments on a wide range of artificial and real data sets show that using shared density improves clustering quality over other popular data stream clustering methods, which require the creation of a larger number of smaller micro-clusters to achieve comparable results.
  • Keywords: Data mining; data stream clustering; density-based clustering; ensemble clustering.
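The abstract describes two phases: an online step that summarizes the stream into micro-clusters while recording how much density two micro-clusters share, and an offline step that reclusters micro-clusters using that shared density. The sketch below illustrates the idea under simplifying assumptions; it is not the authors' implementation. The class name `SharedDensityStream`, the fixed radius `radius`, and the threshold `alpha` are hypothetical. The connectivity rule (shared density divided by the two micro-clusters' average weight, compared against `alpha`) is also an assumption. Fading of old data, movement of micro-cluster centers, and cleanup of weak entries are omitted.

```python
import math
from collections import defaultdict
from itertools import combinations


class SharedDensityStream:
    """Hypothetical, simplified sketch of shared-density micro-clustering.

    Assumption: no fading, no center movement, no cleanup of weak entries.
    """

    def __init__(self, radius, alpha):
        self.radius = radius              # assumed fixed micro-cluster radius
        self.alpha = alpha                # assumed connectivity threshold
        self.centers = []                 # micro-cluster centers (tuples)
        self.weights = []                 # local density estimate per micro-cluster
        self.shared = defaultdict(float)  # shared density graph: (i, j) -> weight, i < j

    def update(self, point):
        # Online step: find every micro-cluster whose radius covers the point.
        hits = [i for i, c in enumerate(self.centers)
                if math.dist(c, point) <= self.radius]
        if not hits:
            # No micro-cluster covers the point, so start a new one there.
            self.centers.append(tuple(point))
            self.weights.append(1.0)
            return
        for i in hits:
            self.weights[i] += 1.0
        # A point covered by several micro-clusters lies in their overlap,
        # so it adds to the shared density of every covering pair.
        for i, j in combinations(hits, 2):
            self.shared[(i, j)] += 1.0

    def recluster(self):
        # Offline step: link micro-clusters whose shared density is large
        # relative to their average weight; final clusters are the
        # connected components, found here with union-find.
        parent = list(range(len(self.centers)))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for (i, j), s in self.shared.items():
            if s / ((self.weights[i] + self.weights[j]) / 2) > self.alpha:
                parent[find(i)] = find(j)
        return [find(i) for i in range(len(self.centers))]


if __name__ == "__main__":
    stream = SharedDensityStream(radius=1.0, alpha=0.1)
    for p in [(0.0, 0.0), (1.5, 0.0), (0.75, 0.0), (0.75, 0.0), (10.0, 10.0)]:
        stream.update(p)
    print(stream.recluster())  # → [1, 1, 2]
```

In the usage example, the two points at (0.75, 0.0) fall inside both of the first two micro-clusters, so those micro-clusters accumulate shared density and are merged into one final cluster, while the isolated micro-cluster at (10, 10) stays separate. Union-find is one simple way to extract connected components incrementally. Any graph traversal would give the same grouping.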