首页    期刊浏览 2025年02月20日 星期四
登录注册

文章基本信息

  • 标题:Recovering Shared Objects Without Stable Storage
  • 本地全文:下载
  • 作者:Ellis Michael ; Dan R. K. Ports ; Naveen Kr. Sharma
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2017
  • 卷号:91
  • 页码:36:1-36:16
  • DOI:10.4230/LIPIcs.DISC.2017.36
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:This paper considers the problem of building fault-tolerant shared objects when processes can crash and recover but lose their persistent state on recovery. This Diskless Crash-Recovery (DCR) model matches the way many long-lived systems are built. We show that it presents new challenges, as operations that are recorded at a quorum may not persist after some of the processes in that quorum crash and then recover. To address this problem, we introduce the notion of crash-consistent quorums, where no recoveries happen during the quorum responses. We show that relying on crash-consistent quorums enables a recovery procedure that can recover all operations that successfully finished. Crash-consistent quorums can be easily identified using a mechanism we term the crash vector, which tracks the causal relationship between crashes, recoveries, and other operations. We apply crash-consistent quorums and crash vectors to build two storage primitives. We give a new algorithm for multi-writer, multi-reader atomic registers in the DCR model that guarantees safety under all conditions and termination under a natural condition. It improves on the best prior protocol for this problem by requiring fewer rounds, fewer nodes to participate in the quorum, and a less restrictive liveness condition. We also present a more efficient single-writer, single-reader atomic set - a virtual stable storage abstraction. It can be used to lift any existing algorithm from the traditional Crash-Recovery model to the DCR model. We examine a specific application, state machine replication, and show that existing diskless protocols can violate their correctness guarantees, while ours offers a general and correct solution.
  • 关键词:asynchronous system; fault-tolerance; crash-recovery; R/W register; state machine replication
国家哲学社会科学文献中心版权所有