Journal: Oriental Journal of Computer Science and Technology
Print ISSN: 0974-6471
Year of publication: 2011
Volume: 4
Issue: 1
Pages: 29-39
Publisher: Oriental Scientific Publishing Company
Abstract: Clustering has become an increasingly important task in modern application domains such as marketing and purchasing assistance, multimedia, and molecular biology. The goal of clustering is to decompose or partition a data set into groups such that both the intra-group similarity and the inter-group dissimilarity are maximized. In many applications, the data to be clustered is far larger than what can be processed at a single site. Further, the data to be clustered could be inherently distributed. The increasing demand to scale up to these massive data sets, which are inherently distributed over networks with limited bandwidth and computational resources, has led to methods for parallel and distributed data clustering. In this thesis, we present CIODD, a cohesive framework for cluster identification and outlier detection for distributed data. The core idea is to generate independent local models and combine the local models at a central server to obtain global clusters. A feedback loop is then provided from the central site to the local sites to complete and refine the global clusters obtained. Our experimental results show the efficiency and accuracy of the CIODD approach.