Failure Detectors in Omission Failure Environments.

Authors: Danny Dolev, Roy Friedman, Idit Keidar and Dahlia Malkhi.

Technical Report CS96-13, Institute of Computer Science, The Hebrew University of Jerusalem, September 1996. Also: Technical Report 96-1608, Department of Computer Science, Cornell University.


We study failure detectors in an asynchronous environment that admits message omission failures. In such environments, processes may fail by crashing, but may also disconnect from each other. We adapt Chandra and Toueg's definitions of failure detection completeness and accuracy to the omission failure model, and define a weak failure detector that allows any majority of the processes that become connected to reach a Consensus decision, despite any number of transient communication failures in their past. We provide a protocol that solves the Consensus problem in this model whenever a majority of the processes become connected, regardless of past omissions. Moreover, our protocol does not need to save and repeatedly resend all past messages, which makes it more efficient than previous protocols in this model.
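
As a rough illustration of the setting described above, the following sketch shows a timeout-based detector that suspects silent peers, withdraws a suspicion when a peer is heard from again (as after a transient disconnection), and reports whether the locally trusted processes form a majority. It is not the protocol or the failure detector defined in the report: the class name HeartbeatDetector, the timeout parameter, and the heartbeat mechanism are all assumptions made for illustration, and the majority test reflects only one process's local view.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Hypothetical timeout-based detector; NOT the protocol from the report."""
    me: str
    peers: frozenset          # all process ids, including `me`
    timeout: float            # assumed suspicion threshold, in seconds
    last_heard: dict = field(default_factory=dict)

    def on_heartbeat(self, sender: str, now: float) -> None:
        # A fresh message from a previously silent peer clears the suspicion,
        # which models recovery from a transient omission or disconnection.
        self.last_heard[sender] = now

    def suspects(self, now: float) -> set:
        # Suspect every other process not heard from within `timeout`.
        return {
            p for p in self.peers
            if p != self.me
            and now - self.last_heard.get(p, float("-inf")) > self.timeout
        }

    def has_connected_majority(self, now: float) -> bool:
        # True when the unsuspected processes (including self) form a
        # strict majority of all processes, as seen locally.
        trusted = len(self.peers) - len(self.suspects(now))
        return 2 * trusted > len(self.peers)
```

In this sketch a process that was suspected in the past counts toward the majority again as soon as it is heard from, which loosely mirrors the idea that past omissions should not prevent a later connected majority from making progress.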

PostScript version: ps, ps.gz. Israel mirror site: ps.gz.

A brief announcement summarizing the contribution of this work will appear in the Sixteenth ACM Symposium on Principles of Distributed Computing (PODC '97), Santa Barbara, CA, USA, August 21-24, 1997. PostScript version of the brief announcement: ps, ps.gz. Israel mirror site: ps, ps.gz.