Article Information

  • Title: Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs
  • Authors: Kyong-Ha Lee; Woo Lam Kang; Young-Kyoon Suh
  • Journal: Scientific Programming
  • Print ISSN: 1058-9244
  • Publication year: 2018
  • Volume: 2018
  • DOI: 10.1155/2018/2682085
  • Publisher: Hindawi Publishing Corporation
  • Abstract: Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to make them customized to Hadoop, the issue of inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis by introducing our efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with indexing capability to save a huge amount of I/O while processing not only selection predicates but also star-join queries that are often used in many analysis tasks.