Journal: Bulletin of the Technical Committee on Data Engineering
Year of publication: 2011
Volume: 34
Issue: 04
Pages: 1-8
Publisher: IEEE Computer Society
Abstract: Adaptive techniques can dramatically improve performance and simplify tuning for MapReduce jobs.
However, their implementation often requires global coordination between map tasks, which breaks a
key assumption of MapReduce that mappers run in isolation. We show that it is possible to preserve the
fault tolerance, scalability, and ease of use of MapReduce by allowing map tasks to utilize a limited set of
high-level coordination primitives. We have implemented these primitives on top of an open source distributed
coordination service. We expose adaptive features in a high-level declarative query language, Jaql, by
utilizing unique features of the language, such as higher-order functions and physical transparency. For
instance, we observe that maintaining a small amount of global state could help improve performance
for a class of aggregate functions that are able to limit the output based on a global threshold. Such
algorithms arise, for example, in Top-K processing, skyline queries, and exception handling. We provide
a simple API that facilitates safe and efficient development of such functions.
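
The global-threshold idea for Top-K can be illustrated with a minimal Python sketch. It is not the paper's API or its Jaql implementation: the names `SharedThreshold`, `map_top_k`, and `top_k` are hypothetical, the "mappers" run sequentially in one process, and a plain in-memory object stands in for the distributed coordination service. The sketch assumes numeric records and a "largest K" query.

```python
import heapq


class SharedThreshold:
    """Hypothetical in-process stand-in for a small piece of global state."""

    def __init__(self):
        self.value = float("-inf")

    def raise_to(self, candidate):
        # Monotone update: the published threshold only ever increases.
        if candidate > self.value:
            self.value = candidate


def map_top_k(records, k, shared):
    """Return one mapper's local top-k, skipping records below the threshold."""
    heap = []  # min-heap of the largest values kept by this mapper
    for value in records:
        if value < shared.value:
            # Some mapper already holds k values >= shared.value,
            # so this record cannot be in the global top-k.
            continue
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            heapq.heapreplace(heap, value)
        if len(heap) == k:
            # The local k-th largest value is a safe lower bound
            # on the global k-th largest value.
            shared.raise_to(heap[0])
    return sorted(heap, reverse=True)


def top_k(splits, k):
    """Run every simulated mapper, then merge partial results in a reduce step."""
    shared = SharedThreshold()
    partials = [map_top_k(split, k, shared) for split in splits]
    merged = [value for part in partials for value in part]
    return heapq.nlargest(k, merged)
```

In this sketch a mapper that starts after the threshold has risen emits fewer candidates, which is the kind of saving the abstract attributes to limiting output based on a global threshold. A stale (too low) threshold only reduces the saving and does not affect correctness, because the threshold is always a lower bound on the true K-th largest value.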