Basic article information

Title: Clustering Data Streams Based On Shared Density between Micro-Clusters
Authors: Karishma Nadhe; P. M. Chawan
Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Electronic ISSN: 2319-8753
Publication year: 2017
Volume: 6
Issue: 6
Page: 12259
DOI: 10.15680/IJIRSET.2017.0606296
Publisher: S&S Publications
Abstract: As more and more applications deliver streaming data, clustering data streams has become a crucial method for data and knowledge engineering. A typical approach is to summarize the data stream in real time with an online process into a large number of so-called micro-clusters. Micro-clusters represent local density estimates by aggregating the information of many data points in a defined area. On request, a (modified) traditional clustering algorithm is used in a second offline step to recluster the micro-clusters into larger final clusters. For reclustering, the centers of the micro-clusters are used as pseudo points with the density estimates used as their weights. However, information about density in the area between micro-clusters is not preserved in the online process, and reclustering is based on possibly inaccurate assumptions about the distribution of data within and between micro-clusters (e.g., uniform or Gaussian). This paper describes DBSTREAM, the first micro-cluster-based online clustering component that explicitly captures the density between micro-clusters via a shared density graph. The density information in this graph is then exploited for reclustering based on actual density between adjacent micro-clusters. We discuss the space and time complexity of maintaining the shared density graph. Experiments on a wide range of artificial and real data sets highlight that using shared density improves clustering quality over other popular data stream clustering methods, which require the creation of a larger number of smaller micro-clusters to achieve comparable results.
Keywords: Data mining; data stream clustering; density-based clustering; ensemble clustering.