
Article Information

  • Title: Efficient Processing and recouping of data using combiners in Map Reduce framework
  • Authors: V HEMANTH KUMAR ; M PURNA CHANDRA RAO ; CH NARAYANARAO
  • Journal: Indian Journal of Computer Science and Engineering
  • Print ISSN: 2231-3850
  • Electronic ISSN: 0976-5166
  • Year: 2017
  • Volume: 8
  • Issue: 6
  • Pages: 674-678
  • Publisher: Engg Journals Publications
  • Abstract: Consider any data structure, an array for instance, whose size is declared either statically or dynamically. This is not a generic solution for large text files, since it requires huge memory allocations for the data structure. As the data size increases, processing the data becomes time-consuming. Existing solutions such as lists and even heaps process large text files effectively only up to a certain boundary (depending on the RAM constraint). For truly huge volumes of data, a single-node solution will not work: the data has to be spread across a cluster (stored on disk). Hadoop addresses these big-data problems using the MapReduce technique, in which processing is done in parallel. MapReduce is a functional programming model with two functions, map and reduce, that performs distributed parallel processing. To make retrieval faster, we introduce the concept of implementing combiners between the mapper and the reducer: a combiner function runs on the mapper's output after the map phase. The data combined by the combiners is sent to the shuffle-and-sort stage, and from there to the reduce function to obtain the final output. The time taken to retrieve data after MapReduce processing without combiners is greater than with combiners; we use computation time and data transfer time as the constraints supporting this statement. This paper presents an effective approach for processing big data using combiners, which can also be considered map-side reducers or mini reducers.
  • Keywords: Map Reduce; Cluster; HDFS; Yarn; Combiners; Hadoop
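The pipeline the abstract describes (map → combine → shuffle/sort → reduce) can be sketched with a minimal single-process simulation. This is not the Hadoop API; it is an illustrative word-count example, assuming each mapper processes one input split and the combiner runs a local aggregation on that split's output before the (simulated) network shuffle. The point it demonstrates is the paper's claim: the combiner shrinks the number of intermediate pairs that must be transferred.

```python
from collections import defaultdict

def mapper(text):
    # Map phase: emit a (word, 1) pair for every word in this input split.
    return [(w, 1) for w in text.split()]

def combiner(pairs):
    # "Mini reducer" / map-side reducer: sum counts locally on the
    # mapper's node before anything crosses the network.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def shuffle(all_pairs):
    # Shuffle-and-sort: group all values by key across mappers.
    groups = defaultdict(list)
    for key, value in all_pairs:
        groups[key].append(value)
    return groups

def reducer(groups):
    # Reduce phase: produce the final count per word.
    return {key: sum(values) for key, values in groups.items()}

# Two input splits, each handled by its own mapper.
splits = ["map reduce map", "reduce reduce map"]
mapped = [mapper(s) for s in splits]

# Pairs shuffled WITHOUT a combiner: one pair per word occurrence.
pairs_without_combiner = sum(len(m) for m in mapped)   # 6 pairs

# Pairs shuffled WITH a combiner: one pair per distinct word per split.
combined = [combiner(m) for m in mapped]
pairs_with_combiner = sum(len(c) for c in combined)    # 4 pairs

result = reducer(shuffle([p for c in combined for p in c]))
print(pairs_without_combiner, pairs_with_combiner, result)
# 6 4 {'map': 3, 'reduce': 3}
```

Because the combiner applies the same associative, commutative operation as the reducer (here, summation), the final result is identical either way; only the data-transfer cost changes, which is the basis of the paper's timing comparison.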