Abstract: To address the problem of clustering distributed data streams, the DB-DDSC (Density-Based Distributed Data Stream Clustering) algorithm is proposed. The algorithm consists of two stages. First, the concept of a circular-point, based on representative points, is introduced, and an iterative algorithm is designed to find the density-connected circular-points, from which the local model is generated at each remote site. Second, an algorithm is designed to generate global clusters by combining the local models at the coordinator site. DB-DDSC can find clusters of different shapes in a distributed data stream environment, and its test-update algorithm avoids frequent sending of data and so reduces the amount of data transmitted. Experiments show that DB-DDSC is feasible and scalable.
Keywords: data streams; data mining; clustering; distributed data stream