Journal: International Journal of Advanced Research In Computer Science and Software Engineering
Print ISSN: 2277-6451
Electronic ISSN: 2277-128X
Publication year: 2013
Volume: 3
Issue: 6
Publisher: S.S. Mishra
Abstract: K-means is a well-known clustering algorithm in the field of data mining. It is simple to implement, and its speed allows it to run on large data sets. However, it also has a drawback. Advances in data collection techniques are generating enormous amounts of data, leaving scientists with the challenging task of processing them, and the performance of k-means becomes insufficient at that scale. To solve this problem, this paper proposes a method in which k-means is implemented on the OpenCL heterogeneous computing platform with the help of the Hadoop MapReduce framework. MapReduce is a framework for distributed programming pioneered by Google. It is built around user-specified Map and Reduce functions that process inputs in the form of key/value pairs. Along with the MapReduce paradigm, Hadoop also implements HDFS, a distributed file system. GPU computing with many-core graphics processors plays an important role today in the advancement of modern, highly concurrent processors, and its ability to accelerate computation is being explored in several scientific fields. OpenCL is a heterogeneous computing platform and one of the most widely used for GPU computing. In this paper we present the acceleration of a widely used data clustering algorithm, k-means, implemented using the Hadoop MapReduce framework, in the context of heterogeneous computing devices such as CPUs and GPUs.
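
To illustrate how k-means can be phrased as Map and Reduce functions over key/value pairs, the following is a minimal in-process sketch in Python. It is not the paper's implementation: it uses neither Hadoop nor OpenCL, and the names `kmeans_map`, `kmeans_reduce`, and `kmeans_iteration` are hypothetical, chosen only for illustration. It assumes that the map step emits the index of the nearest centroid as the key and the point as the value, and that the reduce step averages the points sharing a key.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans_map(point, centroids):
    """Map: emit (index of nearest centroid, point) for one input point."""
    nearest = min(range(len(centroids)),
                  key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce: average all points that were assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(data, centroids):
    """Run one map/shuffle/reduce round in-process (no Hadoop involved)."""
    groups = defaultdict(list)  # shuffle: group emitted values by key
    for point in data:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]
```

For example, calling `kmeans_iteration` on the points `(0.0, 0.0)`, `(0.0, 2.0)`, `(10.0, 10.0)`, `(10.0, 12.0)` with initial centroids `(1.0, 1.0)` and `(9.0, 9.0)` moves the centroids to `(0.0, 1.0)` and `(10.0, 11.0)`. In a distributed setting the map calls would run in parallel across input splits, and the distance computation inside the map step is the part one would plausibly offload to a GPU.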