Abstract: A failure detection service is perfect if it eventually detects every failure and every detection correctly identifies a failure that has actually occurred. Such a service is a basic building block for many reliable distributed systems, for example distributed lock services. In this paper, we introduce a perfect failure detection scheme in order to improve the fault tolerance of the service. We give a precise system model and specification for a failure detection service, and present two novel algorithms that implement it. We further develop a set of quality-of-service (QoS) metrics for perfect failure detection services, and apply probabilistic analysis to quantify these metrics for the two algorithms.
Keywords: failure detection; distributed system; quality of service