Article Information

  • Title: Improving MapReduce Performance by Data Prefetching in Heterogeneous or Shared Environments
  • Authors: Tao Gu; Chuang Zuo; Qun Liao
  • Journal: International Journal of Grid and Distributed Computing
  • Print ISSN: 2005-4262
  • Publication year: 2013
  • Volume: 6
  • Issue: 5
  • Pages: 71-82
  • Publisher: SERSC
  • Abstract: MapReduce is an effective programming model for large-scale data-intensive computing applications. Hadoop, an open-source implementation of MapReduce, has been widely used. The communication overhead of transmitting big data sets greatly affects the performance of Hadoop. To exploit data locality, Hadoop preferentially schedules tasks to nodes near the data locations to decrease data transmission overhead, which works well in homogeneous and dedicated MapReduce environments. However, due to practical considerations of cost and resource utilization, it is common to maintain heterogeneous clusters or to share resources among multiple users. Unfortunately, it is difficult to take advantage of data locality in these heterogeneous or shared environments. To improve the performance of MapReduce in such environments, this paper proposes a data prefetching mechanism that fetches data to the corresponding compute nodes in advance. Theoretical analysis shows that the proposed mechanism effectively reduces data transmission overhead. The mechanism is implemented and evaluated on Hadoop-1.0.4. Experimental results on real applications show that the data prefetching mechanism can reduce data transmission time by up to 94%.
  • Keywords: MapReduce; Hadoop; Data Prefetching; Data Transmission