Article information

Title: Improved Map Reduce K Mean Clustering Algorithm for Hadoop Architecture
Authors: Shweta Mishra; Vivek Badhe
Journal: International Journal of Engineering and Computer Science
Print ISSN: 2319-7242
Publication year: 2016
Volume: 5
Issue: 7
Pages: 17144-17147
DOI: 10.18535/ijecs/v5i7.06
Publisher: IJECS
Abstract: A cluster is a gathering of information individuals having comparable qualities. The procedure of setting up a connection or getting data from crude information by performing a few operations on the information set, like grouping, is known as information mining. Information gathered in reasonable situations is usually totally arbitrary and unstructured. Consequently, there is dependably a requirement for examination of unstructured information sets to determine important data. This is the place unsupervised calculations come into picture to prepare unstructured or even semi organized information sets by resultant. K-Means Clustering is one such method used to give a structure to unstructured information so that significant data can be separated. This paper discusses the implementation of the K-Means Clustering Algorithm over a distributed environment using Apache Hadoop. The key to the implementation of the K-Means Algorithm is the design of the Mapper and Reducer routines, which is discussed in the later part of the paper. The steps involved in the execution of the K-Means Algorithm are also described, based on a small-scale implementation of the K-Means Clustering Algorithm on an experimental setup, to serve as a guide for practical implementations.
Keywords: K-Means Clustering; MapReduce; Hadoop; Data Mining; Distributed Computing
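
Illustrative sketch: the abstract names the Mapper and Reducer routines as the key design step but does not reproduce them in this record. The following is a minimal single-machine sketch of how one K-Means iteration is commonly split into a map step and a reduce step. It is not the authors' implementation and does not use Hadoop; the function names mapper, reducer, and kmeans_iteration are hypothetical, and an in-memory dictionary stands in for the shuffle phase.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def mapper(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reducer(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce pass and return the updated centroids."""
    groups = defaultdict(list)  # assumption: stands in for the shuffle phase
    for p in points:
        key, value = mapper(p, centroids)
        groups[key].append(value)
    # A centroid with no assigned points is left unchanged.
    return [
        reducer(groups[i]) if groups[i] else tuple(centroids[i])
        for i in range(len(centroids))
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    start = [(0.0, 0.0), (10.0, 10.0)]
    print(kmeans_iteration(data, start))
```

In a distributed run, the iteration would typically be repeated, feeding each pass's centroids into the next, until the centroids stop moving by more than a chosen tolerance.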