Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2011
Volume: 34
Issue: 01
Publisher: IEEE Computer Society
Abstract: Many practically important problems involve processing very large data sets, such as web-scale data
mining and indexing. An efficient way to handle such problems is to use data-intensive distributed
programming paradigms such as MapReduce and Dryad, which allow programmers to easily parallelize
the processing of large data sets where parallelism arises naturally by operating on different parts of
the data. Such data-intensive computing infrastructures are now deployed at scales where the resource
costs, especially the energy costs of operating these infrastructures, have become a significant concern.
Many opportunities exist for optimizing the energy costs of data-intensive computing, and this paper
addresses one of them. We dynamically right-size the resource allocations to the parallelized tasks such
that the effective hardware configuration matches the requirements of each task. This allows our system
to amortize the idle power usage of the servers across a larger amount of workload, increasing energy
efficiency as well as throughput. This paper describes why such dynamic resource allocation is useful
and presents the key techniques used in our solution.
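
The amortization argument in the abstract can be illustrated with a small calculation. The sketch below is not the paper's model: the function name, the linear power model (a fixed idle draw plus a per-task active draw), and all numbers are hypothetical, and it assumes co-located tasks do not slow each other down. It only shows why packing more right-sized tasks onto one server lowers the energy charged to each task.

```python
def energy_per_task(idle_watts, active_watts_per_task, concurrent_tasks, task_seconds):
    """Joules attributed to each task on one server (hypothetical linear power model)."""
    # Idle power is paid once per server regardless of how many tasks run on it.
    total_watts = idle_watts + active_watts_per_task * concurrent_tasks
    # Assumption: every task still finishes in task_seconds when co-located.
    return total_watts * task_seconds / concurrent_tasks


if __name__ == "__main__":
    # Illustrative numbers only: 100 W idle, 20 W per running task, 60 s tasks.
    for n in (1, 2, 4, 8):
        print(n, energy_per_task(100.0, 20.0, n, 60.0))
```

Under these assumed numbers the per-task energy falls as concurrency rises, because the fixed idle cost is shared by more tasks. In practice the gain depends on tasks not contending for the same resource, which is the condition that right-sizing allocations is meant to establish.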