
Article information

  • Title: Optimizing Checkpoint Restart with Data Deduplication
  • Authors: Zhengyu Chen; Jianhua Sun; Hao Chen
  • Journal: Scientific Programming
  • Print ISSN: 1058-9244
  • Publication year: 2016
  • Volume: 2016
  • DOI: 10.1155/2016/9315493
  • Publisher: Hindawi Publishing Corporation
  • Abstract: The increasing scale of computer systems, in both size and complexity, brings more frequent hardware and software faults; fault-tolerant techniques have thus become an essential component of high-performance computing systems. Checkpoint restart is a typical and widely used method for tolerating runtime faults. However, the exploding sizes of checkpoint files that need to be saved to external storage pose a major scalability challenge, necessitating efficient approaches to reducing the amount of checkpointing data. In this paper, we first motivate the need for redundancy elimination with a detailed analysis of checkpoint data from real scenarios. Based on the analysis, we apply inline data deduplication to reduce checkpoint size. We use DMTCP, an open-source checkpoint restart package, to validate our method. Our experiments show that our method reduces checkpoint file size by 20% for single-computer programs and by 47% for distributed programs.
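
The abstract does not say how the deduplication is performed, so the following is only a generic sketch of the idea behind inline deduplication, not the authors' method and not part of DMTCP. It assumes fixed-size 4096-byte chunks and SHA-256 fingerprints; the names `dedup_store`, `restore`, and `CHUNK_SIZE` are hypothetical and chosen for illustration.

```python
import hashlib

# Assumed chunk size; the abstract does not state one.
CHUNK_SIZE = 4096


def dedup_store(data: bytes, store: dict, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns a "recipe": the ordered list of chunk fingerprints needed to
    rebuild the original bytes. `store` maps fingerprint -> chunk bytes.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        # Only chunks not seen before consume new storage.
        if fingerprint not in store:
            store[fingerprint] = chunk
        recipe.append(fingerprint)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original bytes from a recipe and the chunk store."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

For example, a 16 KiB buffer consisting of three identical 4 KiB blocks and one different block yields a four-entry recipe but only two stored chunks, and `restore` returns the original bytes unchanged. Sharing one `store` across several checkpoints, or across processes of a distributed program, is what would let repeated content be written only once in such a scheme.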