
Basic Article Information

  • Title: Recurring Job Optimization for Massively Distributed Query Processing
  • Authors: Nicolas Bruno; Sapna Jain; Jingren Zhou
  • Journal: Bulletin of the Technical Committee on Data Engineering
  • Year: 2013
  • Volume: 36
  • Issue: 1
  • Publisher: IEEE Computer Society
  • Abstract: Companies providing cloud-scale data services have increasing needs to store and analyze massive datasets. For cost and performance reasons, processing is typically done on large clusters of tens of thousands of commodity machines. Developers use high-level scripting languages that simplify understanding various system trade-offs, but introduce new challenges for query optimization. One key optimization challenge is missing accurate data statistics, typically due to massive data volumes and their distributed nature, complex computation logic, and frequent usage of user-defined functions. In this paper we describe a technique to optimize a class of jobs that are recurring over time in a cloud-scale computation environment. By leveraging information gathered during previous executions we are able to obtain accurate statistics for new instances of recurring jobs, resulting in better execution plans. Experiments on a large-scale production system show that our techniques significantly improve cluster utilization