Conclusions and Future Work

Resource competition is a fact of life when time-shared distributed platforms attract a substantial number of users. In this paper, we argued that careful application placement and migration are promising techniques to help mitigate the impact of resource variability resulting from this competition. We also studied techniques to reduce measurement overhead for a placement and migration system, including using stale or predicted data.

Resource selection and application migration techniques complement the application-specific techniques that some distributed services employ internally to balance load or to select latency-minimizing network paths. Those techniques optimize application performance given the set of nodes already supporting the application, and generally only consider the application's own workload and structure as opposed to resource constraints due to competing applications. In contrast, this paper focused on where to deploy--and possibly re-deploy--application instances based on information about application resource demand and available node and network resources. Once an application's instances have been mapped to physical nodes, application-internal mechanisms can then be used on finer timescales to optimize performance. In general, application-internal load balancing, external service placement, or a combination of the two can be used to match application instance to available nodes based on resource demand and resources offered.

We expect our observations on placement and migration to generalize to other applications built on top of location-independent data storage; the commonalities we observed among CoDeeN, Coral, and Bamboo, all of which use request hashing in one form or another to determine where data objects are stored, provide initial evidence to support such an expectation. Given the popularity of content-based routing and storage as organizing principles for emerging wide-area distributed systems, this application pattern will likely remain pervasive in the near future. A major class of distributed application generally not built in this way is monitoring applications. A monitoring system could store its data in a hash-based storage system running on a subset of platform nodes, making its behavior similar to the applications we examined in this paper (indeed the SWORD system [15] does exactly that). But another common pattern for these applications is to couple workload to location, storing monitoring data at the node where it is produced and setting up an overlay or direct network connections as needed to route data from nodes of interest to the node that issues a monitoring query [13,27]. In such systems migration is not feasible. Likewise, data-intensive scientific applications that analyze data collected by a high-bandwidth instrument (e.g., a particle accelerator) may wish to couple processing to the location where the data is produced, in which case migration is not feasible. On the other hand, emerging ``data grids'' that enable cross-site data sharing and federation may reduce this location dependence for some scientific applications, thereby make computation migration more feasible for data-intensive scientific applications in the future.

PlanetLab is the largest public, shared distributed platform in terms of number of users and sites. Thus, we believe that the platform-specific conclusions we have drawn in this paper can extrapolate to future time-shared distributed platform used for developing and deploying wide-area applications that allow users to deploy their applications on as many nodes as they wish and to freely migrate those application instances when desired. A platform with cost-based or performance-based disincentives to resource consumption would likely result in smaller-scale deployments and more careful resource usage, but variability in resource utilization across nodes and over time should persist, in which case the usefulness of matching (and re-matching) application resource demand to node resource availability would too.

On the other hand, our ``black-box'' view of background platform utilization means our results cannot be easily extrapolated to environments that perform global resource scheduling (e.g., all application deployers submit their jobs to a centralized scheduler that makes deployment and migration decisions in a coordinated way), or in which multiple applications make simultaneous placement and migration decisions. Detailed simulation of platform-wide scheduling policies, and the aggregate behavior that emerges from systems with multiple interacting per-application scheduling policies, are challenging topics for future work. Nonetheless, our analysis methodology represents a starting point for evaluating more complex system models and additional placement and migration strategies. As future PlanetLab-like systems such as GENI [9] come online, and as wide-area Grid systems become more widely used, examining how well these results extrapolate to other environments and application classes will become key research questions.

David Oppenheimer 2006-04-14