XFT: Practical Fault Tolerance beyond Crashes

Authors: 

Shengyun Liu, National University of Defense Technology; Paolo Viotti, EURECOM; Christian Cachin, IBM Research–Zurich; Vivien Quéma, Grenoble Institute of Technology; Marko Vukolić, IBM Research–Zurich

Abstract: 

Despite years of intensive research, Byzantine fault-tolerant (BFT) systems have not yet been adopted in practice. This is due to the additional cost of BFT in terms of resources, protocol complexity and performance, compared with crash fault-tolerance (CFT). This overhead of BFT comes from the assumption of a powerful adversary that can fully control not only the Byzantine faulty machines, but at the same time also the message delivery schedule across the entire network, effectively inducing communication asynchrony and partitioning otherwise correct machines at will. To many practitioners, however, such strong attacks appear irrelevant.

In this paper, we introduce cross fault tolerance or XFT, a novel approach to building reliable and secure distributed systems and apply it to the classical state-machine replication (SMR) problem. In short, an XFT SMR protocol provides the reliability guarantees of widely used asynchronous CFT SMR protocols such as Paxos and Raft, but also tolerates Byzantine faults in combination with network asynchrony, as long as a majority of replicas are correct and communicate synchronously. This allows the development of XFT systems at the price of CFT (already paid for in practice), yet with strictly stronger resilience than CFT — sometimes even stronger than BFT itself.
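
The condition stated above ("a majority of replicas are correct and communicate synchronously") can be illustrated with a small check. This is only a sketch of that one sentence, not part of XPaxos: the function name `xft_condition_holds` and its parameters are hypothetical, and it assumes each replica is counted in at most one category (crashed, Byzantine, or correct but partitioned from the rest).

```python
def xft_condition_holds(n: int, crashed: int, byzantine: int, partitioned: int) -> bool:
    """Return True if a strict majority of the n replicas are correct and synchronous.

    Hypothetical helper for illustration only. Assumes every replica falls into
    at most one category: crashed, Byzantine, or correct-but-partitioned.
    """
    if n <= 0 or min(crashed, byzantine, partitioned) < 0:
        raise ValueError("n must be positive and all counts non-negative")
    impaired = crashed + byzantine + partitioned
    if impaired > n:
        raise ValueError("more impaired replicas than replicas in total")
    # Replicas that are both correct and able to communicate synchronously.
    correct_and_synchronous = n - impaired
    # Strict majority: more than half of all replicas.
    return correct_and_synchronous > n // 2


# Three replicas, one Byzantine, network otherwise synchronous: condition holds.
print(xft_condition_holds(n=3, crashed=0, byzantine=1, partitioned=0))
# Three replicas, one Byzantine and one correct replica partitioned: it does not.
print(xft_condition_holds(n=3, crashed=0, byzantine=1, partitioned=1))
```

With three replicas, a single Byzantine replica leaves two correct and synchronous replicas, which is still a majority; adding a partitioned replica on top of that leaves only one, so the stated condition no longer holds.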

As a showcase for XFT, we present XPaxos, the first XFT SMR protocol, and deploy it in a geo-replicated setting. Although it offers much stronger resilience than CFT SMR at no extra resource cost, the performance of XPaxos matches that of the state-of-the-art CFT protocols.

BibTeX
@inproceedings {199398,
author = {Shengyun Liu and Paolo Viotti and Christian Cachin and Vivien Quema and Marko Vukolic},
title = {{XFT}: Practical Fault Tolerance beyond Crashes},
booktitle = {12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16)},
year = {2016},
isbn = {978-1-931971-33-1},
address = {Savannah, GA},
pages = {485--500},
url = {https://www.usenix.org/conference/osdi16/technical-sessions/presentation/liu},
publisher = {{USENIX} Association},
month = nov,
}
