
Article Information

  • Title: Supporting Fault-Tolerance for Time-Critical Events in Distributed Environments
  • Authors: Qian Zhu; Gagan Agrawal
  • Journal: Scientific Programming
  • Print ISSN: 1058-9244
  • Year of publication: 2010
  • Volume: 18
  • Issue: 1
  • Pages: 51-76
  • DOI: 10.1155/2010/298578
  • Publisher: Hindawi Publishing Corporation
  • Abstract:

    In this paper, we consider the problem of supporting fault tolerance for adaptive and time-critical applications in heterogeneous and unreliable grid computing environments. Our goal for this class of applications is to optimize a user-specified benefit function while meeting the time deadline. Our first contribution in this paper is a multi-objective optimization algorithm for scheduling the application onto the most efficient and reliable resources. In this way, the processing can achieve the maximum benefit while also maximizing the success-rate, which is the probability of finishing execution without failures. However, for the cases where failures do occur, we have developed a hybrid failure recovery scheme to ensure that the application can complete within the pre-specified time interval. Our experimental results show that our scheduling algorithm can achieve better benefit when compared to several heuristics-based greedy scheduling algorithms, while still having a negligible overhead. Benefit is further improved when we apply the hybrid failure recovery scheme, and the success-rate becomes 100%.

  • Keywords: Fault tolerance; time-critical event; adaptive application; grid computing
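
The abstract describes scheduling as a trade-off between two objectives, benefit and success-rate, under a deadline. The sketch below is only a toy illustration of that kind of trade-off and is not the algorithm from the paper. Everything in it is an assumption made for illustration: the candidate schedules, the `benefit`, `time`, and `success` fields, the independence of resource failures, and the weighted-sum tie-break controlled by the hypothetical `weight` parameter.

```python
from math import prod


def success_rate(reliabilities):
    """Probability that no assigned resource fails.

    Assumption: resource failures are independent, so the
    per-resource survival probabilities simply multiply.
    """
    return prod(reliabilities)


def pareto_front(candidates):
    """Return candidates not dominated in (benefit, success)."""
    front = []
    for c in candidates:
        dominated = any(
            o["benefit"] >= c["benefit"]
            and o["success"] >= c["success"]
            and (o["benefit"] > c["benefit"] or o["success"] > c["success"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


def pick_schedule(candidates, deadline, weight=0.5):
    """Pick one schedule that meets the deadline.

    Filters out schedules that miss the deadline, keeps the Pareto
    front, then breaks the tie with a weighted sum of normalized
    benefit and success probability (an illustrative choice).
    """
    feasible = [c for c in candidates if c["time"] <= deadline]
    front = pareto_front(feasible)
    if not front:
        return None
    # Fall back to 1.0 so a zero maximum benefit cannot divide by zero.
    max_benefit = max(c["benefit"] for c in front) or 1.0
    return max(
        front,
        key=lambda c: weight * c["benefit"] / max_benefit
        + (1 - weight) * c["success"],
    )


# Hypothetical candidate schedules; all numbers are made up.
candidates = [
    {"name": "A", "benefit": 100.0, "time": 8.0, "success": success_rate([0.9, 0.8])},
    {"name": "B", "benefit": 80.0, "time": 9.0, "success": success_rate([0.99, 0.98])},
    {"name": "C", "benefit": 70.0, "time": 7.0, "success": success_rate([0.9, 0.9])},
    {"name": "D", "benefit": 120.0, "time": 12.0, "success": success_rate([0.95])},
]

best = pick_schedule(candidates, deadline=10.0)
print(best["name"])
```

With these made-up numbers, schedule D is dropped for missing the deadline and C is dominated by B, so the choice is between A (higher benefit) and B (higher success probability). Raising `weight` shifts the pick toward benefit, lowering it shifts the pick toward reliability.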