usenix conference policies
Surviving Internet Catastrophes
In this paper, we propose a new approach for designing distributed systems to survive Internet catastrophes called informed replication, and demonstrate this approach with the design and evaluation of a cooperative backup system called the Phoenix Recovery Service. Informed replication uses a model of correlated failures to exploit software diversity. The key observation that makes our approach both feasible and practical is that Internet catastrophes result from shared vulnerabilities. By replicating a system service on hosts that do not have the same vulnerabilities, an Internet pathogen that exploits a vulnerability is unlikely to cause all replicas to fail. To characterize software diversity in an Internet setting, we measure the software diversity of host operating systems and network services in a large organization. We then use insights from our measurement study to develop and evaluate heuristics for computing replica sets that have a number of attractive features. Our heuristics provide excellent reliability guarantees, result in low degree of replication, limit the storage burden on each host in the system, and lend themselves to a fully distributed implementation. We then present the design and prototype implementation of Phoenix, and evaluate it on the PlanetLab testbed.
author = {Flavio Junqueira and Ranjita Bhagwan and Alejandro Hevia and Keith Marzullo and Geoffrey M. Voelker},
title = {Surviving Internet Catastrophes},
booktitle = {2005 USENIX Annual Technical Conference (USENIX ATC 05)},
year = {2005},
address = {Anaheim, CA},
url = {https://www.usenix.org/conference/2005-usenix-annual-technical-conference/surviving-internet-catastrophes},
publisher = {USENIX Association},
month = apr
}
connect with us