Sliver suitability to nodes over time

The suitability of a particular node to host a particular sliver depends not only on the resources available on that node, but also on the resource demands of that sliver over time. We therefore perform an analysis similar to that in the previous section, but accounting for both available node resources and application resource demand. Here we are interested not in the stability of available resources on individual nodes, but rather in the stability of the fraction of slivers whose resource requirements are met after deployment. It is the rate of decline of this fraction that dictates an appropriate migration interval for the application--very rapid decline will require prohibitively frequent migration, while very slow decline means migration will add little to a simple policy of intelligent initial sliver placement and re-deployment upon failure.

Thus, we ask what fraction of the nodes onto which slivers are deployed at time T still meet the requirements of their slivers at time T+x, for various values of x. For each value of x, we average this measure over every possible deployment time T in our trace. A large fraction means that most slivers will be running on satisfactory hosts at the corresponding offset x after deployment. As in Section 3.2, we say that a node meets the requirements of a sliver if the node has enough free CPU and network bandwidth resources to support a new sliver assigned from the set of sliver resource demands found at that timestep in the trace, according to the load-sensitive or random placement policy.
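The measure described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the simulator used for the results in this paper: the trace layout (`node_free[t][n]` and `sliver_demand[t][s]` as (CPU, bandwidth) pairs), the `place` callback that stands in for the random or load-sensitive policy, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical resource vector: (free or demanded CPU, free or demanded bandwidth).
Resources = Tuple[float, float]


def meets(free: Resources, demand: Resources) -> bool:
    """A node meets a sliver's requirements if it has enough of both resources."""
    return free[0] >= demand[0] and free[1] >= demand[1]


def fraction_met_after(
    node_free: Sequence[Sequence[Resources]],
    sliver_demand: Sequence[Sequence[Resources]],
    place: Callable[[int], List[int]],
    x: int,
) -> float:
    """Average, over all deployment times T, of the fraction of slivers whose
    host chosen at time T meets the sliver's demand at time T+x.

    place(T) is assumed to return a list mapping sliver index -> node index,
    as chosen by some placement policy at timestep T.
    """
    horizon = len(node_free)
    if not 0 <= x < horizon:
        raise ValueError("x must lie within the trace")

    fractions = []
    for T in range(horizon - x):
        assignment = place(T)
        met = sum(
            meets(node_free[T + x][node], sliver_demand[T + x][sliver])
            for sliver, node in enumerate(assignment)
        )
        fractions.append(met / len(assignment))
    return sum(fractions) / len(fractions)
```

Evaluating `fraction_met_after` for increasing values of x yields one point per offset on a curve of the kind plotted in Figures 14 and 15.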

Figures 14 and 15 show the fraction of slivers whose resource requirements were met at the time indicated on the X-axis, under both the random and load-sensitive schemes for initially mapping slivers to nodes at time X=0. Note that the random placement line is simply a horizontal line at the value corresponding to the average across all time intervals from Figure 9 in the case of OpenDHT and Figure 10 in the case of Coral.

We make two primary observations from these graphs. First, the quality of the initially load-sensitive assignment degrades over time as node resources and sliver demands become increasingly mismatched. This argues for periodic migration to re-match sliver needs and available host resources. Second, the benefit of load-sensitive placement over random placement--the distance between the load-sensitive and random lines--erodes over time for the same reason, but persists. This persistence suggests that informed initial placement can be useful even in the absence of migration.

Choosing a desirable migration period requires balancing the cost of migrating a particular application's slivers against the rate at which the quality of the application's mapping declines. For example, in OpenDHT, migration is essentially "free" since data is stored redundantly--an OpenDHT instance can be killed on one node and re-instantiated on another node (and told to "own" the same DHT key range as before) without causing the service to lose any data. Coral and CoDeeN can also be migrated at low cost, as they are "soft state" services that cache web sites hosted externally to their service. An initially load-sensitive sliver mapping for OpenDHT declines to near its asymptotic value within 30 minutes, arguing for migrating poorly matched slivers at that timescale or less. If migration takes place every 30 minutes, then the quality of the match will, on average, traverse the curve from t=0 to t=30 every 30 minutes, returning to t=0 after each migration. Placement quality for Coral and CoDeeN declines somewhat more quickly than for OpenDHT, but migrating poorly matched slivers of these services on the order of every 30 minutes is unlikely to cause harm and will allow the system to maintain a somewhat better mapping than would be achieved with a less aggressive migration interval.
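To make the "traverse the curve" argument concrete, the following Python sketch computes the time-averaged match quality under periodic migration. It assumes that migration is instantaneous and restores quality to its t=0 value, and that the decay curve is sampled at uniform timesteps; the function name and the curve representation are hypothetical, and the sketch does not model migration cost.

```python
from typing import Sequence


def mean_quality_under_migration(curve: Sequence[float], period_steps: int) -> float:
    """Time-averaged fraction of well-matched slivers when migrating every
    period_steps timesteps.

    curve[i] is the fraction of slivers whose requirements are met i timesteps
    after a load-sensitive placement. Under the stated assumptions, quality
    repeatedly traverses curve[0 : period_steps], so the long-run average is
    the mean of that prefix.
    """
    if not 1 <= period_steps <= len(curve):
        raise ValueError("period_steps must be between 1 and len(curve)")
    window = curve[:period_steps]
    return sum(window) / len(window)
```

Comparing this average across candidate periods against an application's per-migration cost is one way to frame the trade-off described above: for a non-increasing curve, a shorter period never lowers the average, so the period is bounded below only by what migration costs.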

A comprehensive investigation of what application characteristics make migration more beneficial or less beneficial for one application compared to another is left to future work, as is emulation-based verification of our results (i.e., implementing informed resource selection and migration in real PlanetLab applications, and measuring user-perceived performance with and without those techniques under repeatable system conditions). Our focus in this paper is a simulation-based analysis of whether designers of future resource selection systems should consider including informed placement and migration capabilities, by showing that those techniques are potentially beneficial for several important applications on a popular existing platform.

Figure 14: Fraction of OpenDHT slivers hosted on nodes that meet the sliver's requirements at the time indicated on the X-axis.

Figure 15: Fraction of Coral slivers hosted on nodes that meet the sliver's requirements at the time indicated on the X-axis. CoDeeN showed similar results.

David Oppenheimer 2006-04-14