
System environment

PlanetLab is a large-scale federated computing platform. It consists of over 500 nodes belonging to more than 150 organizations at over 250 sites in more than 30 countries. Approximately two-thirds of these nodes are functioning at any given time. PlanetLab currently performs no coordinated global scheduling, and users may deploy applications on any set of nodes at any time.

All PlanetLab nodes run identical versions of Linux on x86 CPUs. Applications on PlanetLab run in slices. A slice is a set of allocated resources distributed across platform nodes. From the perspective of an application deployer, a slice is roughly equivalent to a user in a traditional Unix system, but with additional resource isolation and virtualization [3], and with privileges on many nodes. The most common usage pattern is for an application deployer to run a single distributed application in a slice. This one-to-one correspondence between slices and applications holds for the applications we characterize in Section 3.2, and for many other slices as well. The set of resources allocated to a slice on a single node is referred to as a sliver. When we discuss ``migrating a sliver,'' we mean migrating the process(es) running on behalf of a slice from one node to another.

We study approximately six months of PlanetLab node and network resource utilization data, from August 12, 2004 to January 31, 2005, collected from the following data sources. All sources periodically archive their measurements to a central node.

CoTop [17] collects data every 5 minutes on each node. It collects node-level information about 1-, 5-, and 15-minute load average; free memory; and free swap space. Additionally, for each sliver, CoTop collects a number of resource usage statistics, including average send and receive network bandwidth over the past 1 and 15 minutes, and memory and CPU utilization.

All-pairs pings [24] measures the latency between all pairs of PlanetLab nodes every 15 minutes using the Unix ``ping'' command.

iPerf [5] measures the available bandwidth between every pair of PlanetLab nodes once or twice a week, using a bulk TCP transfer.
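For concreteness, the sketch below shows one plausible way to represent the measurements these sources produce when analyzing the archived data. The record layouts and field names are our own illustration, not the actual output formats of CoTop, all-pairs pings, or iPerf.

    # Illustrative record layouts for the three data sources; the class and
    # field names are our own and do not reflect the tools' actual formats.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class CoTopNodeSample:            # one per node, every 5 minutes
        load_1min: float              # 1-minute load average
        load_5min: float
        load_15min: float
        free_mem_mb: float
        free_swap_mb: float

    @dataclass
    class CoTopSliverSample:          # one per sliver per node, every 5 minutes
        slice_name: str
        cpu_pct: float                # %CPU used by the sliver
        mem_pct: float                # memory utilization
        send_kbps_1min: float         # average send bandwidth, past 1 minute
        recv_kbps_1min: float
        send_kbps_15min: float        # average send bandwidth, past 15 minutes
        recv_kbps_15min: float

    @dataclass
    class PairwiseSample:             # all-pairs pings (every 15 min) or iPerf (1-2x per week)
        src_node: str
        dst_node: str
        latency_ms: Optional[float]       # from ping; absent in iPerf records
        bandwidth_mbps: Optional[float]   # from iPerf bulk TCP transfer; absent in ping records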

In this study, we focus on CPU utilization and network bandwidth utilization. With respect to memory, we observed that almost all PlanetLab nodes continuously operated with their physical memory essentially fully committed, but with a very high fraction of their 1 GB of swap space free. In contrast, CPU and network utilization were highly variable.

In our simulations, we use CoTop's report of per-sliver network utilization as a measure of the sliver's network bandwidth demand, and we assume a per-node bandwidth capacity of 10 Mb/s, which was the default bandwidth limit on PlanetLab nodes during the measurement period. We use CoTop's report of per-sliver %CPU utilization as a proxy for the sliver's CPU demand. We use (1/(load+1))*100% to approximate the %CPU that would be available to a new application instance deployed on a node. Thus, for example, if we wish to deploy a sliver that needs 25% of the CPU cycles of a PlanetLab node and 1 Mb/s of bandwidth, we say that it will receive ``sufficient'' resources on any node with load <= 3 and on which competing applications are using at most a total of 9 Mb/s of bandwidth.
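As a concrete illustration of this admission test, the following sketch (ours, with hypothetical names) checks whether a node offers ``sufficient'' resources to a candidate sliver, using the 10 Mb/s per-node bandwidth cap and the (1/(load+1))*100% approximation of available %CPU described above.

    NODE_BW_CAP_MBPS = 10.0  # default per-node bandwidth limit during the measurement period

    def cpu_available_pct(load: float) -> float:
        """Approximate %CPU a newly deployed sliver would receive on a node
        with the given load, using (1/(load+1))*100%."""
        return 100.0 / (load + 1.0)

    def has_sufficient_resources(load: float, competing_bw_mbps: float,
                                 cpu_demand_pct: float, bw_demand_mbps: float) -> bool:
        """True if the node can satisfy both the CPU and the bandwidth demand
        of a candidate sliver."""
        return (cpu_available_pct(load) >= cpu_demand_pct and
                NODE_BW_CAP_MBPS - competing_bw_mbps >= bw_demand_mbps)

    # Example from the text: a sliver needing 25% of the CPU and 1 Mb/s fits on
    # any node with load <= 3 whose other slivers use at most 9 Mb/s in total.
    assert has_sufficient_resources(load=3.0, competing_bw_mbps=9.0,
                                    cpu_demand_pct=25.0, bw_demand_mbps=1.0)
    assert not has_sufficient_resources(load=4.0, competing_bw_mbps=9.0,
                                        cpu_demand_pct=25.0, bw_demand_mbps=1.0)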

Note that this methodology of using observed resource utilization as a surrogate for demand tends to under-estimate true demand, due to resource competition. For example, TCP congestion control may limit a sliver's communication rate when a network link the sliver is using cannot satisfy the aggregate demand on that link. Likewise, when aggregate CPU demand on a node exceeds the CPU's capabilities, the node's process scheduler will limit each sliver's CPU usage to its ``fair share.'' Obtaining true demand measurements would require running each application in isolation on a cluster, subjecting it to the workload it received during the time period we studied. Note also that the resource demand of a particular sliver of an isolated application may change once that application is subjected to resource competition, even if the workload remains the same, if that competition changes how the application distributes work among its slivers. (Only one of the applications we studied took node load into account when assigning work; see Section 3.2.)

Additionally, recent work has shown that using (1/(load+1))*100% to estimate the %CPU available to a new process under-estimates actual available %CPU, as measured by an application running just a spin-loop, by about 10 percentage points most of the time [23]. The difference is due to PlanetLab's proportional share node CPU scheduling policy, which ensures that all processes of a particular sliver together consume no more than 1/m of the CPU, where m is the number of slivers demanding the CPU at that time. Spin-loop CPU availability measurements are not available for the time period in our study.
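To make the source of this gap concrete, the sketch below (a deliberate simplification, with our own assumed process and sliver counts) contrasts the load-based estimate with the per-sliver share implied by proportional-share scheduling: because load counts runnable processes rather than slivers, a sliver running several processes inflates load, and 100/(load+1) understates the roughly 100/(m+1) share a newly deployed sliver could expect.

    def load_based_estimate_pct(load: float) -> float:
        """The study's estimate: treats every runnable process as an equal competitor."""
        return 100.0 / (load + 1.0)

    def sliver_share_estimate_pct(num_slivers: int) -> float:
        """Rough per-sliver share under proportional-share scheduling, assuming the
        new sliver becomes the (m+1)-th sliver demanding the CPU.  This is our own
        simplification; it ignores unequal shares and idle slivers."""
        return 100.0 / (num_slivers + 1)

    # Hypothetical example: 3 slivers demand the CPU, one of them with 4 runnable
    # processes, so load = 6 while m = 3.
    load, m = 6.0, 3
    print(load_based_estimate_pct(load))    # ~14.3% -- the load-based estimate
    print(sliver_share_estimate_pct(m))     # 25.0% -- closer to what a spin-loop would observe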
