Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2013
Volume: 49
Issue: 3
Publisher: Journal of Theoretical and Applied
Abstract: Multicore processors are increasingly being deployed in high-performance computing systems. As system complexity increases, the probability of failure rises substantially, so these systems require techniques that support fault tolerance. Checkpointing is a prevalent fault-tolerance technique that reduces the execution time of long-running programs in the presence of failures. Optimizing the number of checkpoints in a parallel application running on a multicore processor is a complicated and challenging task: infrequent checkpointing results in long reprocessing time after a failure, while overly short checkpointing intervals lead to high checkpointing overhead. Because this is a multi-objective optimization problem, becoming trapped in local optima is likely. This paper presents a novel 0-1 integer linear programming (ILP) formulation of the optimal checkpoint placement problem for parallel applications running on a multicore machine. Our experimental results demonstrate that our solution saves more execution time than existing methods.
Keywords: Fault Tolerance; Optimal Checkpoint Placement; Multicore Architectures; Integer Linear Programming