Yank: Enabling Green Data Centers to Pull the Plug

Authors: 

Rahul Singh, David Irwin, and Prashant Shenoy, University of Massachusetts Amherst; K.K. Ramakrishnan, AT&T Labs—Research

Abstract: 

Balancing a data center’s reliability, cost, and carbon emissions is challenging. For instance, data centers designed for high availability require a continuous flow of power to keep servers powered on, and must limit their use of clean, but intermittent, renewable energy sources. In this paper, we present Yank, which uses a transient server abstraction to maintain server availability, while allowing data centers to “pull the plug” if power becomes unavailable. A transient server’s defining characteristic is that it may terminate anytime after a brief advance warning period. Yank exploits the advance warning—on the order of a few seconds—to provide high availability cheaply and efficiently at large scales by enabling each backup server to maintain “live” memory and disk snapshots for many transient VMs. We implement Yank inside of Xen. Our experiments show that a backup server can concurrently support up to 15 transient VMs with minimal performance degradation with advance warnings as small as 10 seconds, even when VMs run memory-intensive interactive web applications.

BibTeX
@inproceedings {180301,
author = {Rahul Singh and David Irwin and Prashant Shenoy and K.K. Ramakrishnan},
title = {Yank: Enabling Green Data Centers to Pull the Plug},
booktitle = {10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13)},
year = {2013},
isbn = {978-1-931971-00-3},
address = {Lombard, IL},
pages = {143--155},
url = {https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/singh},
publisher = {USENIX Association},
month = apr
}

Public Summary: 

by Ratul Mahajan

Current hypervisors offer live migration, a method to transfer the execution state (memory and disk) of a VM from one server to another, without any disruption in service. However, live migration requires the old server to stay online for a significant, undefined amount of time after migration is triggered, since a large amount of state may need to be copied from the old to the new server. An alternative, called Remus, was proposed by Cully et al. in NSDI 2008. In Remus, an up-to-date copy of VM state is maintained on a backup server. Remus allows the old server to go offline at any instant (e.g., it can fail without warning), but its overhead in terms of network bandwidth and redundancy is high (1:1).

Yank offers an alternative between these extremes. It targets a setting in which advance warning, on the order of 5-10 seconds, is available as to when a server will go offline. The authors claim that such a warning may be available in green data centers of the future, where some "transient" servers run on unreliable, renewable energy. Once the renewable energy source goes away, a warning can be generated based on the capacity of the UPS (uninterruptible power supply). Yank ensures that there will be no loss in state when the transient server goes offline by maintaining a slightly out-of-date version of VM state on a backup server. The amount of un-copied state on the transient server is maintained such that it can be copied to the backup server within the warning period. In addition to bounding the amount of time within which the transient server can be taken offline, Yank needs less network bandwidth and redundancy than Remus; one backup server suffices for many (15 in the authors' experiments) transient servers.
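
The bound on un-copied state can be illustrated with a short Python sketch. It is not Yank's implementation inside Xen; the function names (`dirty_budget_bytes`, `must_flush`), the per-VM bandwidth parameter, and the numbers in the example are assumptions chosen for illustration only. The idea is simply that the dirty state left on the transient server should never exceed what the link to the backup server can carry within the warning period.

```python
def dirty_budget_bytes(warning_s: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on un-copied (dirty) state for one transient VM.

    Hypothetical helper: anything above this amount could not be copied
    to the backup server before the warning period expires.
    """
    if warning_s < 0 or bandwidth_bytes_per_s < 0:
        raise ValueError("warning period and bandwidth must be non-negative")
    return warning_s * bandwidth_bytes_per_s


def must_flush(dirty_bytes: float, warning_s: float,
               bandwidth_bytes_per_s: float) -> bool:
    """Return True when dirty state exceeds the budget and must be copied now."""
    return dirty_bytes > dirty_budget_bytes(warning_s, bandwidth_bytes_per_s)


if __name__ == "__main__":
    # Illustrative numbers (assumed): 10 s warning, 125 MB/s available to this VM.
    budget = dirty_budget_bytes(10, 125_000_000)
    print(f"dirty-state budget: {budget / 1e6:.0f} MB")
    print("flush needed at 2 GB dirty:", must_flush(2_000_000_000, 10, 125_000_000))
```

Under this reading, a longer warning period or a faster link lets the backup copy lag further behind, which is how a single backup server could plausibly be shared by many transient VMs.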

The program committee liked the abstraction offered by Yank (managing the maximum backup time of a VM) and felt that it could be more broadly useful than its target setting. (In fact, to several reviewers, the focus on green data centers seemed a distraction, as the paper does not discuss important issues like the total power consumption of the DC and the energy implications of added redundancy.) The paper does a good job in ironing out the details and optimizations that make Yank practical. What remains to be seen, however, is the range of workloads for which Yank's proposed abstraction will prove most useful: these would be applications that prefer transparent durability for their state in the common case but can withstand occasional fail-stop failures (from which Yank offers no protection).
