Journal: International Journal of Multimedia and Ubiquitous Engineering
Print ISSN: 1975-0080
Year: 2012
Volume: 7
Issue: 2
Publisher: SERSC
Abstract: In this paper, we design a low-cost, checkpointing-based rollback recovery algorithm that addresses the long-standing scalability problem of synchronous checkpointing from a point of view completely different from existing approaches. The algorithm lets the processes within each cluster carry out a semi-global checkpointing procedure, while a small set of cluster heads monitors the local commits in their respective administrative areas and always preserves the global consistency condition. As a result, it can considerably lower the communication overhead that previous algorithms may incur. In particular, this design greatly reduces the frequency of cluster-to-cluster communication, especially in large-scale hierarchical multi-cluster systems.
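The following sketch illustrates why delegating local commits to cluster heads cuts cluster-to-cluster traffic. It uses a simplified, hypothetical message-count model and is not the authors' algorithm. The three-message round (request, acknowledgement or local-commit report, commit), the function names `flat_checkpoint` and `hierarchical_checkpoint`, and the assumption that the global coordinator is the head of cluster 0 are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class MessageCount:
    intra: int = 0  # messages exchanged inside a single cluster
    inter: int = 0  # messages crossing a cluster boundary


def flat_checkpoint(cluster_sizes, coordinator_cluster=0):
    """Hypothetical flat scheme: one coordinator exchanges request, ack and
    commit (3 messages) directly with every other process in the system."""
    count = MessageCount()
    for cid, size in enumerate(cluster_sizes):
        if cid == coordinator_cluster:
            count.intra += 3 * (size - 1)  # coordinator does not message itself
        else:
            count.inter += 3 * size  # every remote process is contacted directly
    return count


def hierarchical_checkpoint(cluster_sizes, coordinator_cluster=0):
    """Hypothetical two-level scheme: each cluster head runs the 3-message
    round with its own members, then exchanges request, local-commit report
    and global commit with the coordinator (itself a cluster head)."""
    count = MessageCount()
    for cid, size in enumerate(cluster_sizes):
        count.intra += 3 * (size - 1)  # head <-> its own members
        if cid != coordinator_cluster:
            count.inter += 3  # coordinator <-> this remote head
    return count


if __name__ == "__main__":
    layout = [4, 4, 4, 4]  # assumed: 4 clusters of 4 processes each
    print(flat_checkpoint(layout))          # MessageCount(intra=9, inter=36)
    print(hierarchical_checkpoint(layout))  # MessageCount(intra=36, inter=9)
```

In this toy model, both schemes send the same total number of messages, 3(N - 1) for N processes. The hierarchical variant moves most of them inside clusters, so cross-cluster messages grow with the number of clusters rather than the number of processes. That shift is where a multi-cluster system would be expected to save, because inter-cluster links are usually the slower and more expensive ones.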