Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Year: 2018
Volume: 6
Issue: 1
Page: 302
DOI: 10.15680/IJIRCCE.2017.0601051
Publisher: S&S Publications
Abstract: Hadoop parallelizes job execution with map and reduce tasks. Shuffle, the all-to-all input data fetch phase in a reduce task, can significantly affect job performance. The delay in job completion can be attributed to the coupling of the shuffle phase and reduce tasks, which fails to address data distribution skew among reduce tasks and makes task scheduling inefficient. In this work, shuffle is decoupled from reduce tasks and converted into a platform service provided by Hadoop. The resulting system, iShuffle, is a user-transparent shuffle service that proactively pushes map output data to nodes via a novel shuffle-on-write operation and flexibly schedules reduce tasks with workload balance in mind.
Keywords: MapReduce; iShuffle; Data distribution skew; Task scheduling.
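The abstract names two mechanisms without specifying their algorithms. As a rough illustration only, the sketch below models them: records are routed to a partition buffer at the moment a map task writes them (a push model in the spirit of shuffle-on-write), and partitions are then placed on nodes so that skewed partition sizes do not overload one node. The names `partition_of`, `shuffle_on_write`, `assign_partitions`, and `node_loads`, the CRC32 partitioner, and the greedy largest-partition-first placement heuristic are all hypothetical choices made for this example, not iShuffle's actual design.

```python
import heapq
import zlib
from collections import defaultdict


def partition_of(key: str, num_partitions: int) -> int:
    # Hypothetical deterministic hash partitioner (stand-in for Hadoop's hash partitioning).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_on_write(map_outputs, num_partitions):
    """Toy push model: each record goes to its partition buffer as soon as it
    is written, instead of reducers pulling whole map outputs later."""
    buffers = defaultdict(list)
    for records in map_outputs:  # one list of (key, value) pairs per map task
        for key, value in records:
            buffers[partition_of(key, num_partitions)].append((key, value))
    return dict(buffers)


def assign_partitions(partition_sizes, nodes):
    """Greedy heuristic (assumption): place the largest remaining partition
    on the currently least-loaded node, using a min-heap of node loads."""
    heap = [(0, node) for node in nodes]
    heapq.heapify(heap)
    assignment = {node: [] for node in nodes}
    # Largest partitions first; ties broken by partition id for determinism.
    for pid, size in sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[node].append(pid)
        heapq.heappush(heap, (load + size, node))
    return assignment


def node_loads(assignment, partition_sizes):
    # Total data volume each node must reduce under a given assignment.
    return {node: sum(partition_sizes[p] for p in pids) for node, pids in assignment.items()}


if __name__ == "__main__":
    sizes = {0: 10, 1: 7, 2: 5, 3: 4, 4: 2}  # made-up skewed partition sizes
    balanced = assign_partitions(sizes, ["n1", "n2"])
    print(node_loads(balanced, sizes))  # {'n1': 14, 'n2': 14}
    naive = {"n1": [0, 2, 4], "n2": [1, 3]}  # round-robin by partition id
    print(node_loads(naive, sizes))  # {'n1': 17, 'n2': 11}
```

With these made-up sizes, round-robin placement leaves one node with 17 units of work and the other with 11, while the greedy placement evens both to 14. In a real push-based shuffle the per-partition buffer would live on the node chosen for that partition, which is why placement and pushing interact.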