Journal: International Journal of Hybrid Information Technology
Print ISSN: 1738-9968
Publication year: 2015
Volume: 8
Issue: 2
Pages: 301-310
DOI: 10.14257/ijhit.2015.8.2.28
Publisher: SERSC
Abstract: The SnIClustering algorithm is proposed to handle the large number of intermediate values produced during MapReduce processing. It first selects a small set of representative data by cluster sampling, then filters the data according to its distribution characteristics, retaining only the useful data. This sharply reduces the intermediate values of MapReduce, which saves time and eases the network load. Finally, the retained data and the samples are clustered. Experimental results show that SnIClustering is suitable for large-scale data: it processes such data in a short time while maintaining good clustering quality.
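
The sample, filter, then cluster pipeline described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the abstract does not specify the sampling scheme, the filtering rule, or the final clustering method, so everything below is an assumption made for illustration. Simple random sampling stands in for cluster sampling, the filter keeps only points farther than a hypothetical `radius` from every pilot centroid, and plain k-means is used for both clustering passes. The function names `kmeans` and `sample_filter_cluster` are likewise hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group;
        # an empty group keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def sample_filter_cluster(points, k, sample_size, radius, seed=0):
    """Return (centroids, number of points actually clustered)."""
    rng = random.Random(seed)

    # Step 1 (assumption): simple random sample as the representative data.
    sample_idx = set(rng.sample(range(len(points)), sample_size))
    sample = [points[i] for i in sorted(sample_idx)]
    pilot = kmeans(sample, k, seed=seed)

    # Step 2 (assumption): a point close to a pilot centroid is treated as
    # already represented by the sample and dropped; only points farther
    # than `radius` from every pilot centroid are retained.
    kept = [
        p
        for i, p in enumerate(points)
        if i not in sample_idx and min(dist(p, c) for c in pilot) > radius
    ]

    # Step 3: cluster the retained points together with the sample.
    reduced = sample + kept
    return kmeans(reduced, k, seed=seed), len(reduced)
```

In a MapReduce setting, step 2 would correspond to the map side emitting only the retained points, which is where the reduction in intermediate values would come from; the sketch runs all three steps in one process purely to show the data flow.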