Geographically Distributed System for Catastrophic Recovery

Kevin Adams

This paper presents the results of a proof-of-concept implementation from an ongoing project to create a cost-effective method of geographically distributing the critical portions of a data center, along with methods to make the transition to these backup services quick and accurate. The project emphasizes data integrity over timeliness and prioritizes the services to be offered at the remote site. The paper explores the trade-off of using common clustering techniques to distribute a backup system over a significant geographical area by relaxing the timing requirements of the cluster technologies at a cost of fidelity.

The trade-off is that the fail-over node is not suitable for high-availability use, because some loss of data is expected and fail-over time is measured in minutes rather than seconds. Asynchronous mirroring, exploitation of file commonality in file updates, IP Quality of Service, and network efficiency mechanisms are the enabling technologies used to provide a low-bandwidth solution for the communications requirements. Exploiting file commonality in file updates decreases the overall communications requirement. IP Quality of Service mechanisms guarantee a minimum available bandwidth to ensure successful data updates. Traffic shaping, in conjunction with asynchronous mirroring, makes efficient use of network bandwidth.
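As a rough illustration of how file commonality can reduce update traffic, the sketch below compares fixed-size blocks of an old and a new version of a file and ships only the blocks that differ. It is a minimal sketch under stated assumptions, not the mechanism used in the paper: the block size and the names block_digests, changed_blocks, and apply_delta are hypothetical, and fixed-offset comparison is the simplest form of the idea (it does not detect content that has merely shifted position).

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index -> new contents for every block that differs from old."""
    old_digests = block_digests(old, block_size)
    delta = {}
    for index, offset in enumerate(range(0, len(new), block_size)):
        block = new[offset:offset + block_size]
        is_new_block = index >= len(old_digests)
        if is_new_block or hashlib.sha256(block).digest() != old_digests[index]:
            delta[index] = block
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version at the remote site from the old copy plus delta."""
    blocks = []
    for index, offset in enumerate(range(0, new_length, block_size)):
        if index in delta:
            blocks.append(delta[index])      # block was shipped over the link
        else:
            blocks.append(old[offset:offset + block_size])  # reuse local copy
    return b"".join(blocks)[:new_length]
```

With a 10,000-byte file in which a single byte changes, only the one block containing that byte would need to cross the link; the remote copy supplies the rest.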

Traffic shaping allows a maximum bandwidth to be set, which minimizes the impact on the existing infrastructure and lowers the requirement for a service level agreement if shared media is used. The resulting disaster recovery site allows off-line verification of disaster recovery procedures and quick recovery of critical data center services. It is more cost-effective than a transactionally aware replication of the data center and more comprehensive than a commercial data replication solution used exclusively for data vaulting. The paper concludes with a discussion of the empirical results of the proof-of-concept implementation.
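To make the idea of a bandwidth ceiling concrete, the following is a minimal token-bucket sketch, one common way to cap average throughput while permitting short bursts. It is an illustrative assumption, not a description of the shaping mechanism the paper used; the class name TokenBucket, its parameters, and the choice to pass the clock value in explicitly are all hypothetical.

```python
class TokenBucket:
    """Cap average throughput at rate_bytes_per_s, with bursts up to burst_bytes."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float,
                 now: float = 0.0):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = now

    def delay_for(self, nbytes: int, now: float) -> float:
        """Return how many seconds to wait before sending nbytes at time now."""
        # Refill tokens for the elapsed time, never exceeding the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend tokens; a negative balance is a debt repaid by waiting.
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate
```

A mirroring process could call delay_for before each write to the network and sleep for the returned interval, so that replication traffic stays under the configured ceiling regardless of how quickly updates accumulate locally.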

Kevin Adams, NSWCDD

BibTeX

@inproceedings {270527,
author = {Kevin Adams},
title = {Geographically Distributed System for Catastrophic Recovery},
booktitle = {16th Systems Administration Conference (LISA 02)},
year = {2002},
address = {Philadelphia, PA},
url = {https://www.usenix.org/conference/lisa-02/geographically-distributed-system-catastrophic-recovery},
publisher = {USENIX Association},
month = nov
}

Download

Links

Paper:

http://usenix.org/publications/library/proceedings/lisa02/tech/full_papers/adams/adams.pdf

Paper (HTML):

http://usenix.org/publications/library/proceedings/lisa02/tech/full_papers/adams/adams_html/index.html
