Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Online ISSN: 1694-0814
Year: 2014
Volume: 11
Issue: 1
Publisher: IJCSI Press
Abstract: A current trend in high-performance computing is the use of large-scale computing grids. These platforms consist of geographically distributed cluster federations that gather thousands of nodes. At this scale, node and network failures are no longer exceptions but part of normal system behavior. Grid applications must therefore tolerate failures, and their evaluation should take their reaction to failures into account. Failures in distributed computing systems can be divided into three categories: node crashes, network failures, and process faults. Fault tolerance is a significant and complex issue in grid computing systems, and various techniques have been investigated to detect and tolerate faults in distributed computing systems. In this paper, we propose a decentralized model of fault tolerance based on dynamic colored graphs. Using this model, we show through experiments the benefits of colored graphs for managing failures in grids.