;login: The Magazine
of USENIX & SAGE

 

isolation with flexibility

by David G. Sullivan

Faculty Advisor: Margo Seltzer, Harvard University

The USENIX Scholars Program provides support for student stipends, tuition and other expenses for students with exceptional research ability and promise. This article is an example of the kind of work resultings from this support.

A Resource Management Framework for Central Servers

In managing computational resources, an operating system must balance a variety of goals, including maximizing resource utilization, minimizing latency, and providing fairness. The relative importance of these goals for a particular system depends on the nature of the system and the ways in which it is used. For supercomputers running compute-intensive applications, for example, the primary goal may be to maximize throughput, while for personal computers used to enhance a single user's productivity, the chief goal may be to maximize responsiveness.

In today's computing environments, users increasingly compete for the resources of server systems, whether to access central databases or to view content on virtually-hosted Web sites. On such systems, fairness becomes a critical resource-management goal. Proportional-share mechanisms allow this goal to be met by providing resource principals (users, applications, threads, etc.) with specified resource rights. For example, customers who pay Internet service providers to virtually host their Web sites can be guaranteed shares of the hosting machine that are commensurate with the price they pay. Service providers who can make such guarantees can offer larger resource shares to principals willing to pay a premium for better quality of service.

Although its full promise is yet to be realized, thin-client computing is another domain in which proportional-share resource management is desirable. Administrators of such systems are often forced to host one application per server to provide predictable levels of service [Sun98]. Proportional-share techniques enable the consolidation of multiple applications onto a single server by giving each application a dedicated share of the machine.

A system that supports proportional-share resource management must isolate resource principals from each other, so that a given principal's resource rights are protected from the activities of other principals. To provide such isolation, a system must necessarily impose limits on the flexibility with which resource allocations can be modified. Such limits work well when the resource needs of applications are well known and unchanging, because a system administrator can assign the appropriate resource shares and leave the system to run. Unfortunately, these conditions frequently do not hold. Even if the applications' current resource needs are adequately understood, they will typically change over time. For example, as a Web site's working set of frequently accessed documents expands, the site may require an increasing share of the server's disk bandwidth in order to offer reasonable responsiveness. Moreover, it would be preferable if system administrators could be freed from the need to make detailed characterizations of applications' resource needs. Ideally, the applications themselves should be able to modify their own resource rights in response to their needs and the current state of the system.

Our work provides extensions to the lottery-scheduling resource management framework [Wal94, Wal96] that allow resource principals to safely overcome the limits on flexible allocation that proportional-share frameworks impose for the sake of secure isolation. Our extended framework supports both absolute resource reservations (hard shares) and proportional allocations that change in size as resource principals enter and leave the competition for a resource (soft shares). It also includes a system of access controls to protect the isolation properties that lottery scheduling provides. And our framework offers the means for applications to modify their own resource rights without compromising the rights of other resource principals. One of these mechanisms, called ticket exchanges, allows applications to coordinate their use of the system's resources by bartering with each other over resource rights. Our extended framework thereby provides isolation with increased flexibility: the flexibility to safely overcome the limits on resource allocation that standard proportional-share frameworks enforce.

Overcoming Upper Limits on Resource Allocation

The resource management framework developed for lottery scheduling is based on two key abstractions, tickets and currencies. Tickets are used to encapsulate resource rights. Resource principals receive resource rights that are proportional to the number of tickets that they hold for a resource; changing the number of tickets held by a resource principal automatically leads to a change in its resource rights.

Tickets are issued by currencies, which allow resource principals to be grouped together and isolated from each other. Principals funded by a currency share the resource rights allotted to that currency; currencies thus enable hierarchical resource management. Each currency effectively maintains its own exchange rate with a central base currency, and tickets from different currencies can be compared by determining their value with respect to the base currency (their base value). The more tickets a currency issues, the less each ticket is worth with respect to the base currency, and their total base value can never exceed the value of the tickets used to back the currency itself.

When a resource principal is funded by a currency other than the root currency, its resource rights can usually be increased by giving it additional tickets from that currency. However, because issuing more of a currency's tickets decreases their value, the resulting increase in the principal's resource rights is less than the increase in the number of tickets it holds. Indeed, no matter how many currency tickets a resource principal receives, the resource rights imparted by those tickets cannot exceed the overall rights given to the currency itself. This upper limit is essential to providing isolation. Without it, the resource rights of principals funded by other currencies could be reduced.

Despite the need for the upper limits imposed by currencies, these limits may often be unnecessarily restrictive. This is especially true on central servers, because the large number of resource principals a server must accommodate makes it difficult for a single allocation policy to adequately address their different and dynamically changing resource needs. Instead, some simple policy for ensuring fairness is likely to be used, such as giving users equal resource rights to divide among their applications, or allocating resource shares based on how much a user has paid.

Because certain resources may be more important than others to the performance of an application, applications may benefit from giving up a fraction of their resource rights for one resource in order to receive a larger share of another resource. We have therefore developed a mechanism called ticket exchanges that allows applications to take advantage of their differing resource needs by bartering with each other over resource-specific tickets. For example, a CPU-intensive application could exchange some of its disk tickets for some of the CPU tickets of an I/O-intensive application.

While ticket exchanges allow principals to obtain additional resource rights, they do so without compromising the isolation properties of the lottery-scheduling framework. Because the total values of the tickets outstanding for each resource are unchanged by an exchange, only the resource rights of principals participating in an exchange are affected by it; the resource rights of nonparticipants are unaffected.

Ticket exchanges enable applications to coordinate with each other in ways that are mutually beneficial and that may increase the overall efficiency of the system. Various levels of sophistication could be employed by applications to determine what types of exchanges they are willing to make, and at what rates of exchange.

Certain types of resource principals may primarily need extra tickets for one particular resource. For example, consider two Web sites that are virtually hosted on the same server. Site A has a small number of frequently accessed files that it could keep in memory if it had additional memory tickets for its currency. Site B has a uniformly accessed working set that is too large to fit in memory; it would benefit from giving up some of its currency's memory tickets for some of A's disk tickets.

Applications could also apply economic and decision-theoretic models to determine, based on information about their performance (such as how often they are scheduled per second and how many page faults they incur per second) and the current state of the system (such as how many tickets of each type are active in the system), when to initiate an exchange and at what rate. This determination could be made by the application process itself, or by a separate resource negotiator process that monitors the relevant variables and initiates exchanges on the application's behalf.

Applications are free to cancel exchanges in which they are involved. This allows them to take a trial-and-error approach, experimenting with a variety of exchange rates until they achieve an acceptable level of performance, and to adapt their resource usage over time. Our initial experiments suggest that this type of approach will be necessary for most applications, because of the interdependencies that exist between different clients' resource usage.

Applications or their negotiators initiate exchanges by sending the appropriate information to a central dealer. The dealer maintains queues of outstanding exchange proposals, attempts to match up complementary requests, and carries out the resulting exchanges. If an exchange request cannot be satisfied immediately, the dealer returns a message that includes any proposals with conflicting exchange rates (e.g., process A requests 20 CPU tickets for 10 memory tickets, while process B requests 10 memory tickets for 10 CPU tickets). In this way, applications or their negotiators can decide whether to modify their proposed exchange rate and try again for a compromise deal. In environments where isolation is less important, the dealer could be modified to carry out exchanges that processes propose on the processes themselves (e.g., to take away 20 CPU tickets from a process and give it 20 extra memory tickets in return).

Future research is needed to develop negotiators suitable for a wide variety of applications and environments. Among the questions that still need to be addressed are: How can a negotiator determine what exchanges are beneficial to its associated process? When should a negotiator accept a trade less desirable than the one it proposed? Will a system involving dynamic ticket exchanges be stable (i.e., how can oscillatory behavior be avoided)? Can general-purpose negotiators be written that avoid the need to craft one for each application? In addition, the central dealer must be designed to deal fairly with requests that have complementary but differing exchange rates.

Overcoming Lower Limits on Resource Allocation

Currencies can also impose lower limits on resource rights. These limits materialize when only one of the resource principals funded by a currency is in active contention for a given resource. In such circumstances, that principal receives all of the currency's resource rights, no matter how few tickets have been used to fund it.

As a result, currencies make it difficult for the lottery-scheduling framework to support the semantics of the nice utility found on conventional UNIX systems. For example, a user running a lengthy, CPU-intensive job may reduce its CPU funding as a favor to other users. But if the other tasks funded by the same currency are all idle, the CPU-intensive job will still get the currency's full CPU share. The user would presumably be allowed to decrease the number of tickets backing the user currency itself, but then other processes funded by that currency would also be affected when they became runnable.

While upper limits are necessary for providing isolation, lower limits are an undesirable side-effect of isolation. These limits could be overcome by funding CPU-intensive applications directly from the base currency. However, for reasons of security, access controls on currencies would presumably be used to prevent unprivileged users from issuing base-currency tickets.

To circumvent this restriction, our framework allows a user to issue base-currency tickets as long as the total value of currencies owned by that user never exceeds some upper bound. In this way, users can give up a small amount of their currencies' funding and then issue that same amount from the base currency to fund resource-intensive jobs. This approach leaves the user's total resource rights unchanged (preserving the isolation of other users), and the new job can run at a reduced priority without crippling the user's other applications.

Current Status

A paper that describes our initial ideas for extending lottery scheduling appeared in the 1999 HotOS Workshop [Sul99]. Since then, we have implemented the extended framework in VINO 0.50 ([Sel96], <http://www.eecs.harvard.edu/~vino/vino>), and used it to manage CPU time, memory, and disk bandwidth. A paper that describes our extended framework and its implementation in more detail appears in the 2000 USENIX Annual Technical Conference Proceedings <http://www.usenix.org/events/usenix2000/technical.html>. This paper includes descriptions of the allocation and scheduling mechanisms we have used in our prototype, along with experiments which demonstrate that they provide accurate proportional-share guarantees and effective isolation. It also presents results of experiments which test the impact of ticket exchanges on two sets of applications, demonstrating that the extended lottery-scheduling frame-work enables server applications to achieve improved performance under realistic usage scenarios.

Our work thus far makes several contributions. First, we have extended the lottery-scheduling framework to securely support the management of multiple resources, providing both soft and hard resource shares. To our knowledge, our prototype is the first implementation of a proportional-share framework to support both types of shares for multiple resources. Second, we have pointed out an important tension between the conflicting goals of secure isolation and flexible resource allocation--a tension that exists in any proportional-share resource management framework (e.g., [Ban99, Brun98]), not just lottery scheduling--and we have presented mechanisms that allow for more flexible allocation while preserving secure isolation. Third, we have illustrated the value of a system that can support dynamic adjustments to the resource allocations applications receive.

In order for our extended framework to be fully effective on large central servers, more work needs to be done to develop negotiators that can intelligently carry out ticket exchanges on behalf of users and applications. We have recently begun work on this part of the project. Developing such negotiators will be a challenging task, but one with potentially significant rewards.

REFERENCES

[Ban99] Banga, G., Druschel, P., Mogul, J.C., "Resource Containers: A New Facility for Resource Management in Server Systems," Proc. of the Third Symposium on Operating Systems Design and Implementation, February 1999.

[Bru98] Bruno, J., Gabber, E., Özden, B., Silberschatz, A., "The Eclipse Operating System: Providing Quality of Service via Reservation Domains," Proc. of the USENIX 1998 Annual Technical Conference, June 1998.

[Sel96] Seltzer, M., Endo, Y., Small, C., Smith, K., "Dealing with Disaster: Surviving Misbehaved Kernel Extensions," Proc. of the Second Symposium on Operating System Design and Implementation, October 1996.

[Sul99] Sullivan, D., Haas, R., Seltzer, M., "Tickets and Currencies Revisited: Extending Multi-Resource Lottery Scheduling," Proc. of the Seventh Workshop on Hot Topics in Operating Systems, March 1999.

[Sun98] "Solaris Resource Manager 1.0: Controlling System Resources Effectively: A White Paper," <http://www.sun.com/software/white-papers/wp-srm/>.

[Wal94] Waldspurger, C.A., Weihl, W., "Lottery Scheduling: Flexible Proportional-Share Resource Management," Proc. of the First Symposium on Operating System Design and Implementation, November 1994.

[Wal96] Waldspurger, C.A., Weihl, W., "An Object-Oriented Framework for Modular Resource Management," Proc. of the Fifth Int'l Workshop on Object Orientation in Operating Systems, October 1996.  

 

?Need help? Use our Contacts page.
Last changed: 28 nov. 2000 ah
Issue index
;login: index
USENIX home