Journal: International Journal of Distributed and Parallel Systems
Print ISSN: 2229-3957
Electronic ISSN: 0976-9757
Publication year: 2012
Volume: 3
Issue: 6
DOI: 10.5121/ijdps.2012.3605
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: The MapReduce framework is an effective technique for processing and analysing large amounts of data. It is a programming model used to rapidly process vast amounts of data in parallel and distributed mode on a large cluster of machines. Hadoop is an open-source implementation of MapReduce for writing and running MapReduce applications. The question is which computing environment improves the performance of MapReduce when processing large amounts of data. A stand-alone implementation and a cloud computing implementation are used in the experiment to evaluate whether running a MapReduce system in cloud computing mode performs better than running it in stand-alone mode with respect to processing speed, response time, and cost efficiency. The comparison uses datasets of different sizes to show how MapReduce processes large datasets in both modes. The finding is that running a MapReduce program to process and analyse large datasets in a cloud computing environment is more efficient than running it in stand-alone mode.
Keywords: MapReduce; Hadoop; Cloud Computing; Data Processing; Parallel and Distributed Processing