Publisher: Japan Science and Technology Information Aggregator, Electronic
Abstract: Cloud computing provides large-scale parallel-distributed processing services in which a huge task is split into a number of subtasks that are processed independently on a cluster of machines referred to as workers. Workers that take longer than others to process their assigned subtasks delay the completion of the whole task (the issue of stragglers). An efficient way to address this issue is to have other workers execute the troubled subtasks as backups (task replication). In this paper, we evaluate the efficiency of task replication from a theoretical point of view. The mean and standard deviation of the task-processing time are derived approximately using extreme value theory, while the mean total processing time is evaluated exactly, for cases in which the worker-processing time follows a hyper-exponential, Weibull, or Pareto distribution. The numerical results reveal that the efficiency of task replication depends significantly on the tail of the worker-processing time distribution. In addition, the optimal number of replications, that is, the number that achieves the shortest task-processing time, depends mainly on the coefficient of variation of the worker-processing time. Furthermore, three replications are effective in guaranteeing a low variance of the task-processing time, regardless of the tail.
Keywords: Mathematical modeling; parallel-distributed processing; task scheduling; task replication; extreme value theory; performance analysis
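
The replication idea summarized in the abstract can be explored numerically with a small Monte Carlo sketch. The model below is an assumption made for illustration, not the paper's formulation: every subtask is started on `n_replicas` workers at once, a subtask finishes when its fastest replica finishes, and the task finishes when its slowest subtask finishes. The function names, distribution parameters, and trial count are all hypothetical choices.

```python
import random
import statistics


def task_time(n_subtasks, n_replicas, draw, rng):
    """One sample of the task-processing time under the assumed model.

    A subtask ends with its fastest replica (min); the task ends with
    its slowest subtask (max).
    """
    return max(
        min(draw(rng) for _ in range(n_replicas))
        for _ in range(n_subtasks)
    )


def simulate(n_subtasks, n_replicas, draw, trials=2000, seed=0):
    """Estimate mean and standard deviation of the task-processing time."""
    rng = random.Random(seed)
    samples = [
        task_time(n_subtasks, n_replicas, draw, rng) for _ in range(trials)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


# Illustrative worker-processing time distributions (parameters assumed).
def weibull_worker(rng):
    # Scale 1.0, shape 0.5: a tail heavier than the exponential.
    return rng.weibullvariate(1.0, 0.5)


def pareto_worker(rng):
    # Shape 2.5: a power-law tail with finite mean and variance.
    return rng.paretovariate(2.5)


if __name__ == "__main__":
    for name, draw in (("Weibull", weibull_worker), ("Pareto", pareto_worker)):
        for r in (1, 2, 3):
            mean, std = simulate(n_subtasks=100, n_replicas=r, draw=draw)
            print(f"{name:8s} replicas={r}  mean={mean:.3f}  std={std:.3f}")
```

Running the script prints the estimated mean and standard deviation for one, two, and three replicas per subtask, which gives a rough feel for how strongly the benefit of replication depends on the tail of the worker-processing time. The sketch ignores the extra resource cost of replicas, which the abstract accounts for through the mean total processing time.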