
Article Information

  • Title: Managing Skew in Hadoop
  • Authors: YongChul Kwon; Kai Ren; Magdalena Balazinska
  • Journal: Bulletin of the Technical Committee on Data Engineering
  • Year: 2013
  • Volume: 36
  • Issue: 1
  • Publisher: IEEE Computer Society
  • Abstract: Challenges in Big Data analytics stem not only from volume but also from variety: extreme diversity both in data types (e.g., text, images, and graphs) and in operations beyond relational algebra (e.g., machine learning, natural language processing, image processing, and graph analysis). As a result, any competitive Big Data system must support some form of parallel user-defined operations (UDOs) that can capture complex data processing tasks over complex data types without changing the core of the parallel data processing engine. Hadoop and other popular systems have been shown to provide a convenient programming model for implementing parallel UDOs, but the "black-box" nature of UDOs complicates the automatic load balancing required to achieve parallel scalability. In this paper, we present an overview of some of our recent work that tackles the problem of load imbalance (a.k.a. skew) in parallel UDO evaluation. We first discuss the prevalence of skew in today's applications and clusters. We then discuss our experience with static and dynamic methods for mitigating it.
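To make "load imbalance (a.k.a. skew)" concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' method. It hash-partitions records by key across a few hypothetical reducers, using zlib.crc32 as a deterministic stand-in for Hadoop-style hash partitioning, and reports skew as max load divided by mean load. For comparison, it applies a generic static mitigation, the greedy "largest group first to the least-loaded partition" (LPT) heuristic. The workload (one hot key plus many cold keys) and all function names are hypothetical.

```python
import heapq
import zlib
from collections import Counter


def hash_partition_loads(keys, num_partitions):
    """Records per partition when each key goes to crc32(key) % num_partitions
    (assumed deterministic stand-in for Hadoop-style hash partitioning)."""
    loads = [0] * num_partitions
    for key in keys:
        loads[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return loads


def greedy_partition_loads(keys, num_partitions):
    """Generic static mitigation (LPT greedy heuristic): assign whole key
    groups, largest first, to the currently least-loaded partition."""
    counts = Counter(keys)
    heap = [(0, p) for p in range(num_partitions)]  # (current load, partition id)
    heapq.heapify(heap)
    loads = [0] * num_partitions
    for key, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        loads[p] = load + n
        heapq.heappush(heap, (loads[p], p))
    return loads


def skew_ratio(loads):
    """Max partition load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    # Hypothetical workload: one hot key with 400 records plus 600 distinct cold keys.
    keys = ["hot"] * 400 + [f"k{i}" for i in range(600)]
    for name, fn in [("hash", hash_partition_loads), ("greedy", greedy_partition_loads)]:
        loads = fn(keys, 4)
        print(name, loads, round(skew_ratio(loads), 2))
```

With this workload the greedy assignment yields loads of [400, 200, 200, 200], a skew ratio of 1.6. Hash partitioning can do no better, because the hot key's 400 records always land together. The sketch shows why key-level assignment alone cannot remove skew when a single key group dominates and the UDO must see all of that key's values on one node.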