Abstract: Evolving data streams must be clustered within a limited time and
with reasonable quality. The
existing micro-cluster-based methods do not consider the distribution of
data points inside each micro-cluster. We propose LeaDen-Stream (Leader Density-based
clustering algorithm over evolving data Stream), a density-based
clustering algorithm using leader clustering. The algorithm follows a
two-phase clustering scheme. The online phase selects appropriate mini-micro or
micro-cluster leaders based on the distribution of data points in the
micro-clusters. The leader centers are then sent to the offline phase to form the final
clusters. By carefully choosing between the two kinds of leaders,
LeaDen-Stream decreases the time complexity of clustering while maintaining
cluster quality. A pruning strategy also separates real data from
noise by introducing dense and sparse mini-micro and micro-cluster leaders. Our
performance study over a number of real and synthetic data sets demonstrates
the effectiveness and efficiency of our method.
Keywords: Evolving Data Streams; Density-Based Clustering; Micro Cluster; Mini-Micro Cluster