################################################ # # # ## ## ###### ####### ## ## ## ## ## # # ## ## ## ## ## ### ## ## ## ## # # ## ## ## ## #### ## ## ## ## # # ## ## ###### ###### ## ## ## ## ### # # ## ## ## ## ## #### ## ## ## # # ## ## ## ## ## ## ### ## ## ## # # ####### ###### ####### ## ## ## ## ## # # # ################################################ The following paper was originally published in the Proceedings of the USENIX Summer 1993 Technical Conference Cincinnati, Ohio June 21-25, 1993 For more information about USENIX Association contact: 1. Phone: 510 528-8649 2. FAX: 510 548-5738 3. Email: office@usenix.org 4. WWW URL: https://www.usenix.org Optimizing Unix Resource Scheduling for User Interaction Steve Evans, Kevin Clarke, Dave Singleton, Bart Smaalders SunSoft Inc. Abstract Techniques for improving system responsiveness for interactive end users of Unix workstations are explored. After a discussion of the current state of resource scheduling, a model is presented in which dynamic input from the human user is combined with data from user interaction software to supply a centralized manager with the information necessary to determine what processes are involved with interacting with the user at any given moment. This service then communicates this process set information to the kernel, which uses it to manage memory and CPU resource allocation on the behalf of the user. Experience with a prototype of this environment is reported. An argument for an interactive scheduling class is made, along with other infrastructure changes needed to take advantage of it. 1.0 Introduction Most implementations of the Unix operating system are not oriented to bring the power of today's workstations to the single dedicated end user's full advantage. In particular, most Unix resource management policies are oriented toward overall system throughput and fairness, not dedicated end user interactive responsiveness. In typical implementations, the portions of the kernel responsible for resource allocation cannot take into account higher level data describing user/computer interaction. The paper describes the principal factors leading to poor interactive performance in modern Unix workstations. It outlines a user/computer interaction model that may be applied to adjust current and (near) future resource requirements in a way that favors interactivity. When this model is used to allocate system resources, a significant improvement in end user responsiveness is measured compared with traditional resource allocation algorithms. This paper does not focus on creating a traditional real time operating environment, as has been the subject of numerous papers on deadline vs. priority scheduling, time critical computing [Khanna92], etc. Instead, the paper is about changing the operating system's focus from throughput to interactivity. 2.0 Time-sharing Today Unix grew up trying to satisfy the needs of rooms full of university students editing, compiling programs and running troff jobs from character terminals. The technologies developed to deliver computing resources in that environment have been transferred to dedicated single user workstations. This section describes how traditional Unix time-share scheduling affects end user interactivity in workstation environments. 2.1 Problem Significance The problem of quick and predictable interactive responsiveness within Unix is significant for several reasons. 1. Tasks Expand to Fill Available Resource - No matter what the CPU power and available memory of a system, users, system vendors and application writers inevitably find ways to load the machine to the point at which interactivity suffers. 2. PC User Expectations - As Unix based and PC based operating environments are more frequently compared in the marketplace, PC users are not tolerant of user interactivity inconsistencies that more traditional Unix users can understand and explain away in terms of time-sharing notions like throughput. 3. Human/Computer Interface Should be Smooth - Interactive responsiveness is the most user visible aspect of a system. The user is sitting in front of the monitor seeing and experiencing responsiveness first hand. Breaks in interactivity jar the user away from the task at hand and into thinking about what is happening with the computer. The need for good interactive response is heightened by the proliferation of direct-manipulation user interfaces such as sliders, drag & drop, etc. 4. Hungry Software - With current resource allocation algorithms, once the available memory becomes scarce, interactive performance decreases precipitously. Demands for interactive working set for the typical windowed environment have increased significantly, exacerbating the responsiveness prob- lem. In particular, the following innovations have contributed to the problem: increasing use of layered architectures, object-oriented languages, multiple processes providing client/server services even to the single end user, the use of inter-application messaging, multimedia, etc. 5. Throughput is Important - In contrast to many simpler systems, Unix workstation users expect to be able to perform many tasks simultaneously. Background compiles and links, the sending and receiving of mail, printing are all expected to run quietly in the background and not to significantly impact the "feel" or interactivity of the system. Yet at the same time, the user expects these tasks to complete quickly. 2.2 Current Practice Current kernel memory and CPU resource control mechanisms and policies are derived from the time-sharing environment. They were born in an environment in which many terminal users were looking for good keyboard response and fair assignment of resources for background processing. 2.2.1 Processor Cycle Scheduling Scheduling priority is the mechanism for management of CPU cycles. The scheduler factors in how much CPU time a process has used recently, how long a process has been waiting to run, a user preference as expressed by the "nice" value for the process, and other factors when determining the scheduling priority of a process. The result is that a process' scheduling priority moves like a yo-yo. Processes creep up slowly in priority as time passes when they are not allowed to run. They quickly decay in priority as they chew up cycles. There are tweaks that allow processes waiting for user input to zoom up in priority, only to decay very quickly. The kernel is actually trying to figure out which processes are interactive in nature and trying to favor them. However, it is doing this based on the assumptions germane to editing at character terminals, which are not true for many users today. Processor scheduling is completely in the hands of the kernel. The only controls that are available to users, other than the super user, are the ability to reduce the priority of a process via nice(1). This remains an effective tool for those users willing to use it. However, most users are unwilling to engage in this task. What is more, many of the processes in the system are inconvenient or unavailable for nice(1) or renice(8) style control: daemons are owned by root, services and subprocesses are not easily located without digging around to find process ids. In addition, many processes are engaged in both interactive and batch style processing and thus have optimal nice values that change much too quickly for human management. The only "control" available to a programmer that wants to improve interactive processing is to do very little processing at each user action. The kernel tweaks that favor such processes, mentioned above, were tuned based on the assumption that about 10,000 machine instructions would be executed by a combination of a single process and the kernel to handle a single keystroke. This tuning assumption has not changed in most Unix implementations even though the number of instructions is now many times higher, thanks to richer GUIs, and the number of processes is higher due to the multiple process architec- ture of the X Window System [Scheiffler86]. In Unix System V, Release 4, there has been some progress in providing an architecture in which alternative scheduling approaches can be implemented. Multiple scheduling classes may be written by Unix ven- dors. To date, the default time-sharing scheduling class described above and a real-time scheduling class are available to user processes [AT&T 90]. However, the time-sharing class still lacks some of the optimizations made at Berkeley to improve interactive performance. 2.2.2 Memory Scheduling Early Unix memory scheduling started out with swapping, executable sharing and administrator imposed memory limits as effective mechanisms for utilizing memory in a time-sharing environment. Swapping was used as the memory sharing mechanism. Executable sharing was very effective in reducing memory requirements because most of the time-sharing users were using the same small set of small programs. A fixed-size buffer cache constrained the impact of large sequential i/o operations on overall memory consumption. Presently, many Unix systems are used as single user workstations rather than time-share systems. Here, the application mix has changed from many copies of the same small programs to many different and large programs, often infrequently accessed. The kernel has changed to rely on global page replacement as the memory sharing algorithm, with swapping used for demand reduction. There are now additional demands on memory in the form of page caches and shared memory pages[Gingell87], and the memory-based filesystem, tmpfs[Gingell87]. In SVR4, memory allocation policy is not tied to process scheduling priority. The page replacement mechanism attempts to maintain a list of free pages that are immediately available for use. Unfortunately, there is no concept of targeting non-interactive process pages when building this free list nor is there the notion of reserving the pages on the free list for interactive process demands. Memory is allocated on demand equally to all classes of processes. This has made it easier for unusually large memory demands by a single process to dramatically reduce overall performance. 2.3 Barriers to Improvement There are practical concerns and philosophical reasons why more attention hasn't been focused on problems of interactive responsiveness. The practical concerns include: 1. Time-sharing is Useful - Even though systems appear to be dedicated to a single user, networked Unix boxes are actively engaged in interactions with other machines. It is very reasonable to have time-sharing oriented resource policies as the default. 2. Prevention of Resource High-jacking - Even on a single user machine, it is bad for overall system responsiveness if user processes are able to easily say, "make me more important". Good play-by-the- conventions applications suffer at the expense of the self centered ones. As abuses get out of hand, soon all application writers are forced to break the conventions and the situation is back to getting the user directly involved with complicated resource management. 3. Insulating the Kernel from User Level Abuse - The kernel has to protect itself and the machine from abuse by intentional or inadvertent resource hogs. If too much rope is given user processes, then effective use of the machine can be compromised; fair, but jerky, is better than having important programs being starved for resources to the point of corrupted functionality. Kernel folks also have to protect themselves from chasing "bug" reports about poor system performance brought about by abusive user programs. The philosophical reasons include: 1. Insulate the Programmer - There has been a theme running through Unix that the programmer should not need to be involved with details of resource management. For example, C library support for grouping of dynamic memory allocations together in memory is not part of the definition of Unix. If it were, programmers would have a simple but powerful tool with which to help themselves improve localization of memory references that could lead to a dramatic reduction in working set for many programs. 2. Don't Tie the Kernel's Hands - The more influence that user processes have over kernel resource management the more constrained the kernel is when improving its own internal algorithms. The actual problem today is that the kernel simply does not have enough information with which to adequately improve these algorithms. 3. It Isn't Broken - Some Unix programmer don't consider the current situation to be broken. They don't have interactive benchmarks for which they have to find ways of improving results. They run with fast, large memory machines and are smart enough to nice compiles. Engineers must ensure that they test interactivity on low end machines in order to satisfy that important portion of the customer base using those machines. 3.0 Interactive Scheduling A solution to improving system responsiveness should: 1. Dynamically identify processes that are involved with user interactivity. 2. Favor CPU scheduling of interactive processes. 3. Favor memory allocation towards interactive processes. There is simply not enough information in the kernel upon which to base a predictable and understandable algorithm. Any kernel-only solution that demonstrated some success along these lines would probably incorporate too much knowledge about what is going on with user processes. The solution would not generate a mechanism that would be generally applicable as user level architecture evolved with time. As a result, a direct approach is described in this paper in which: 1. There is a user process agent that is cognizant of user actions and can translate that behavior into user intentions. 2. Scheduling hints appropriate to carry out those intentions are formed and communicated through new kernel interfaces. 3. The kernel is augmented to respond to this new information. 3.1 The User Model Central to identifying interactive processes is the abstract notion of user modelling. User modelling encapsulates the best information available about what the user has done, is doing and may be about to do, and combines that information with data from processes that are responsible for interacting with the user. The more relevant information fed into the user model, the greater its ability to identify interactive processes. The model can be subdivided along the following lines: 1. User Behavior and History - Collection of information from the user. 2. Programmatic Input - Collection of information from programs. 3. Interactive Process Identification - Digestion of user focus input and programmatic input to form interactive process set identification. The point in the system responsible for this job is the user interaction manager. 4. Resource Scheduling - Allocation of resources to interactive processes by the kernel based upon user interaction manager hints. User behavior & history => User Interaction Mgr => Resource scheduling Programmatic input => FIGURE 1. User Model Control Flow 3.2 User Behavior and History There are two general kinds of interactivity information that can be collected from the user: 1. Focus - This information is collected from the user as human interactions take place with the system. The information communicates what the user is most likely focusing upon. Input device focus, e.g., keyboard and pointer focuses, if different, are the most important pieces of data. Tracking of mouse trajectory can be used to anticipate focus changes. Overall user activity monitoring can be used to detect if the user is even in front of the display. 2. History - Historical information can be collected for predictive purposes. Users have interaction patterns that can be exploited in terms of pre-allocating resources. The system can notice what gets used first after a period of inactivity and making sure that it has the resources necessary to run instantly. Noticing common pairs of drag sources and drop targets would allow early identification of the drop target process as interactive. To overcome application start-up wait time, the system could pre-load the application that the user is likely to use next based on historical patterns. Also, the user could supply the system with tuning information about how to behave when trying to meet interactive needs. One could imagine a knob that ranged from "just like today" to "I value interactivity above all else" with respect to time-sharing vs. interactivity. Another knob might be used to balance a user's preference for interactivity vs. other resource demands. 3.3 Programmatic Input Running programs supply information about which processes are interactive candidates. Running programs may also supply hints about their interactive status. More specifically: 1. Process Identification - The user interaction manager needs to be dynamically updated about the makeup of interactive process sets. These sets need to be associated with windows on the screen, which in turn are used as the granularity of focus control. In addition, certain processes are always required to maintain satisfactory interactive response. Examples include the X server and the window manager. In some cases the processes interacting with the user will change while the user interacts with a single window; running vi(1) in an terminal emulator window is a good example. 2. Application Hints - Programs can be the source of valuable interactivity state information. For example, noting that an application is no longer visible to the user or has become iconic may be reason enough to temporarily strip a program of its interactive state. Applications may also report their expected resource utilization patterns; madvise(3) is a good example. 3.4 User Interaction Manager The user interaction manager is responsible for analysis of user behavior and programmatic input to form interactive process sets. It should have the following properties: 1. Centralized - The purpose of the user interaction manager is to form a model of what the user is doing, associate those actions with system components and communicate recommendations to the kernel. If the information is not centralized, it is not likely that a complete picture of the user can be correctly constructed. Thus, a model of a user interaction manager per CPU on a multiple processor host is not viable. Another reason that having the user interaction manager centralized is that one might want to move beyond a binary indication of interactive and non-interactive. One could imagine having differing interactive priorities based upon some sort of LRU algorithm or some other weighting scheme. 2. Secure - The user interaction manager should be thought of as a system daemon, running as root, that is trusted to make reasonable requests of the kernel, even for processes not necessarily owned by the user being modelled. One of the main reasons that it is not completely baked into the kernel is due to the importance of using window system level information. 3. Efficient - The user interaction manager had better be efficient in terms of its own CPU and memory requirements or it is possible that it could make interactivity worse instead of better. 4. Distributed - Even though the user model needs to be centralized, there is a control portion of the user model manager that should be distributed. For example, in the situation where an application is running on another host from where the user is interacting with it via the X Window System, the user interaction manager should be able to make interactive requests to the host running the application, on the behalf of the user, when the application has the user's focus. 3.5 Resource Scheduling Interactive process information supplied to the kernel allows it to discard or recycle resources, concentrate current resources and pre-load soon-to-be needed resources. Resources involved include CPU cycles and memory. The kernel's responsibilities include: 1. Favoring Interactivity - The only real requirement is that the kernel favor interactive processes over time-sharing processes; how it does so is its business. 2. Interactive Process Identification - As a practical matter, part of the user interaction manager's functionality may be best implemented inside of the kernel, especially for "legacy" situations where the kernel actually has enough information to detect interactive processes on its own. For example, in the prototype, the kernel transfers interactivity to whatever process is in the terminal`s controlling process group. 3. Starvation Handling - While maintaining interactive response is important, other tasks need to complete as well. Care must be taken that CPU intensive interactive processes do not prevent eventual completion of other important background tasks such as receiving mail, spooling print jobs, etc. 4.0 Experience A prototype exists that incorporates enough of the aspects of the ideal solution discussed above to demonstrate the overall concept. Components of Solaris 2.2 are augmented for the prototype, including the GUI toolkit, the window manager and the kernel. 4.1 User Focus Input In the prototype, keyboard and pointer focus transitions are tracked by the window manager, which also serves as the user interaction manager. For systems using "focus-follows-mouse", the keyboard and pointer focuses are the same. For systems using "click-to-type", separate keyboard and pointer focuses are maintained. This is used to support using menus from another applications or the root window while the keyboard focus is left elsewhere. Thus, the mouse becomes a magic wand with which to dispense interactive fairy dust on running applications. The window manager conveys it's knowledge about which processes have gained or lost interactivity to the kernel via ioctl(2) calls to a pseudo device called the interactive driver. In addition to dynamically identified processes, the X server and window manager are always identified as interactive processes. There are tuning parameters that control kernel resource management, but the end user is not expected to modify them in the prototype. Nothing is done to collect or exploit user interaction patterns. This is clearly an interesting area, e.g., typical application usage patterns during software development such as edit, compile, debug are completely cyclical in nature and hence could benefit greatly from anticipatory scheduling. 4.2 Identifying Interactive Processes In the prototype, process identification information is communicated across the X Window System connection and maintained on the top level windows owned by the X client process. The GUI toolkit sets the process id of the client process. Widgets that manage subprocesses, e.g., the terminal emulator widget, add subprocess ids to the property. The window server generates EnterNotify and LeaveNotify events for tracking the pointer, FocusIn and FocusOut events for tracking keyboard focus, PropertyNotify events for tracking changes to interactive process identification properties and VisibilityNotify events for tracking visibility of windows. The interactive manager is provided with enough information to track client status changes. The kernel identifies interactive processes in two ways. First, the user interaction manager tells the kernel which processes to mark as interactive. This is done through the interactive driver. The user interaction manager calls the driver with the pid of the process it wishes to make interactive. The driver marks this process as interactive, boosts its priority and gives it a large nice value (-20). The user interaction manager can also make a process non-interactive which has the opposite effect, the processes priority is set artificially low and its nice value is set to the default (20). Secondly, the prototype kernel transfers the interactive status of a session leader (typically a shell) when another process group becomes the controlling process group for the attached terminal. In this way, as various processes are started or stopped, placed in the background, etc., interactive status is always conferred on those processes that would be reacting to user input. This allows users to place jobs such as make, etc. in the background to reduce their impact on the rest of the system, or leave them in the foreground and maintain their high priority as long as the keyboard focus remains in that window. If a background process is brought back to the foreground, the kernel sets it to be an interactive processes again. Other than the foreground process management, the interactive state of a process is not inherited by children. While one can imagine cases where it would increase interactivity to have a child process inherit interactivity status, the prototype has taken a conservative approach. It is important to prevent the proliferation of interactive subprocesses that are unknown to the user interaction manager. This policy keeps the kernel out of the business of interactive process guessing and puts a greater burden on the user model infrastructure to identify interactive processes. Although not done in the prototype, the property that contains the interactive process information is suitable for augmentation by application code with process ids that the GUI toolkit doesn't know about. For example, a calendar front end could discover the process id of its calendar service and add that process to the calendar's interactive set. As another example, the window manager could include the ToolTalk process id in its list of interactive system processes. That way, ToolTalk doesn't become a scheduling bottleneck in the flow of control between multiple processes engaged in inter-application messaging. Another possibility for improvement in the prototype lies in augmenting the window's process information with internet addresses. Doing so would ensure that the user interaction manager wouldn't confuse a remote process for a local one. In addition, the interactive manager could someday make resource requests of the remote host. 4.3 User Interaction Manager The user interaction manager could be a separate process that uses window server notification events, along with XGetInputFocus and XQueryPointer to provide window manager independent functionality, at some additional overhead. But, for the purposes of the prototype, the window manager is used as the implementation vehicle for the user interaction manager. The window manager is convenient because it already is: 1. Centralized - It already is a process that is dedicated to managing user interactions with the system. 2. Secure - Well, almost. Window managers can be allowed to run as root by changing the way that they fork programs so as to run them as a non-root user. For the prototype, the kernel allows non-root pro- cesses to adjust its interactivity assignments. This could also be handled by having the owner of the login terminal become the owner of an interactive device used for the duration of that session. 3. Efficient - The window manager is a particularly efficient place in the system to implement the user interaction manager: it knows about keyboard and pointer focus, it already watches for changes to top level window properties, it is involved with interesting interactivity related events like making windows iconic, and it caches information about the window tree layout. 4.4 Resource Scheduling The prototype interactive kernel has been enhanced to simulate an interactive scheduling class. It schedules both CPU and memory together, giving precedence to interactive processes. The kernel allows all processes to equally share memory and CPU as long as a minimum amount of memory is immediately available for interactive processes. As long as this minimum amount of memory exists the user can switch between applications and faulting in of the pages associated with the application with the focus is relatively quick. Waiting for the page daemon to identify dirty pages and do disk writes to flush them is relatively expensive. 4.4.1 Interactive Process Scheduling The prototype kernel gives interactive processes precedence by marking their interactive state in the proc structure, boosting their user priority, and setting their nice value to a -20. As the time-share scheduler recalculates priorities once a second, the -20 nice value out weighs the other factors and the interactive process stays at a fairly high priority [Leffler89][Bach86]. The prototype kernel extends the allowed nice values so that the prototype may artificially set an interactive processes priority high and keep it high. The prototype kernel treats non-interactive processes fairly, within the constraints of priority recalculation, until free memory gets scarce. As an aside, another modification to the time-share scheduler has been to make it act more like the BSD scheduler, that is, child process' priority and time usage at the time of a fork are a function of their parent's priority and time usage. This change stops processes from being able to completely consume the machine by forking an inordinate number of child processes that immediately run with a high priority and no time accounted to their quantum. This is not so much an interactive problem as a generic SVR4 problem. SVR4 treats the dispatch table as a state machine and does not preserve the relationship between parent and child processes as BSD does. Executing for i in `find /usr -print` do echo $i done shouldn't bring the system to its knees. The time-share scheduler does most of the work for the interactive scheduling class. The user interaction manager and interactive driver reset priorities and nice values and let the time-share scheduler function as designed. The prototype has extended the time-share class and superimposed an interactive model on top of it, not actually implemented an interactive class scheduler. 4.4.2 Non-interactive Process Throttling The prototype has a prejudice of memory over CPU. It doesn't matter how high a process' priority is if there is no free memory available. To prove this point, processes related to interactive use were set to run as real-time processes. They were scheduled to run within a bounded time period, but they had to wait for memory once they were on the processor. There was not a substantial boost in interactivity. The goal of the interactive kernel is to keep enough free memory around to allow an interactive process to immediately run. Finding enough free memory just at the time of need is a relatively expensive operation. The kernel throttles a non-interactive process' CPU and memory usage when free memory falls below a system minimum. This value is tunable and is set in our prototype to be 1/16 of the amount of physical memory available to user processes. Throttling occurs only when memory is tight. This happens in two ways. First, each non-interactive process is preempted in favor of interactive processes whenever it exits a system call. The second throttling is for memory hogs. If the non-interactive process is a memory hog, determined by looking at the number of valid translations the address space is using, it is preempted off the processor (also at system call exit), it has its priority lowered and its nice value is adjusted to force it to continually stay in a low priority range. A better approach would be to just gather some pages from the memory hogs, but the page daemon doesn't know which process owns which pages. While memory is tight, non-interactive processes can use no more than 1/32 of the amount of physical memory available to user processes without being labelled a memory hog. A better implementation would involve a calculation based on the amount of physical memory and the number of extant processes. This calculation was not implemented due to time constraints and because the performance improvements already gained from other modifications were quite dramatic. Doing throttling at system call exit has proved very successful for the prototype. However, in a production system, one would probably want to the same calculation at the expiration of a scheduling quantum to catch pathological cases like mapping a large amount of memory and touching all the pages without doing any system calls. 4.4.3 Non-Interactive Process Starvation In the prototype, there was a concern that there might be starvation of non-interactive processes with the throttling methods used. In fact, this was not noted with the set of lifelike loads and benchmarks used. No matter how low a process' priority was set, it would still run. There are just too many milliseconds where the interactive processes are either sleeping or polling waiting for input. During these lull periods the non-interactive processes get to time-share the system, and until memory throttling was working correctly, these low priority non-interactive memory hogs could steal all of physical memory, even with the lowest possible priority. If one could demonstrate starvation problems, the distance between interactive and non-interactive nice values could be reduced until the problem effectively went away while maintaining the majority of the interactive benefit. 5.0 Performance In this section, the performance of the prototype is measured under load and characterized by both objective measurements and subjective observation. 5.1 Characterizing Non-interactive Loads Assuming that anything characterized as a background load would peg CPU utilization, minus i/o wait time, the factors that characterize loads are related to memory utilization: 1. Page Demand - Number of different pages touched per unit of CPU virtual time. This reflects memory requirements with respect to CPU load. 2. Page Reuse - The percentage of time a page is part of the page demand. For pages referenced once, this approaches 0, whereas pages accessed frequently approach 1. As used below, it refers to the aver- age value for all pages accessed during the applications lifetime. Using this characterization, extremes are typified by the following: 1. Small CPU Bound Job - Low page demand, high page reuse. This case is well-handled by the time-share scheduler; these processes decay very quickly to a low priority, and do not cause significant interactivity problems. Ray-tracing or extended-precision math programs are good examples. 2. Multiple Process Jobs - Low page demand, low page reuse. This is the case where a parent process creates many short-lived processes which perform some small task and then exit, typical of shell scripts, some make environments, etc. If the standard SVR4 time-sharing scheduler is fixed to act like the BSD scheduler with respect to charging child processing time to parent's doing a wait(2V) this is not a major problem for interactivity. 3. Big CPU Bound Jobs - High page demand, high page reuse. Typical examples are simulations, multimedia or large finite element programs. These type of processes do not coexist well with interactive use on small systems unless sufficient memory is available to contain the interactive processes working set. 4. Streaming Memory Jobs - High page demand, low page reuse. These are typical of programs performing a great deal of sequential disk i/o or creating large files in /tmp. Examples include copying large amounts of data via NFS, creating large temporary files in /tmp, and some compiler optimization passes. These are currently often pathological cases; the rapid page demand causes these processes to force all other processes out of memory. Some work has been done to reduce this [Mcvoy91], but this problem has not been solved. This can be prevented if either the offending process pages against itself or if the process is throttled to the point where the page daemon would grab a substantial amount of its pages. The latter is done in the prototype. With the appropriate virtual memory resource scheduling, streaming memory processes can run effectively without destroying interactivity. When running performance tests, a loading mechanism is used that scales between high and low extremes for each type of load. The load is a process that can fork increasing numbers of child processes that perform increasing amounts of CPU activity while modifying more memory and exiting after a given amount of time. CPU and memory utilization are linked to increase at the same time in order to reduce the complexity of the analysis, to better simulate real world loads and because more repeatable results are obtained. 5.2 Characterizing Interactive Activity In order to measure interactive responsiveness, one has to pick meaningful interactive activities. The processes active in providing interactive response in X Window environments are the X server, window manager and client process, along with any subprocesses. In most cases, each process does a small amounts of CPU intensive work, touching a typically large number of pages, and then blocks on a system call. For even a fast typist, the various processes receive on the order of 1 char every 100 milliseconds. It is this blocking that makes boosting the CPU priority of processes stopped in the kernel so effective in boosting user interactivity in the face of CPU bound background jobs, and makes these processes so vulnerable to having their pages stolen in the case of memory intensive background loads described above. The following scenarios are tested: 1. Steady Interactive Activity (Typing Test) - While the background load is running, the user is typing steadily. The load increases with time. The kernel and window server are instrumented to capture the elapsed time between keystrokes and character echoing. Character echoing should happen within 50 milliseconds to maintain the illusion of "instantaneous" action. 2. Heavy Interactive Activity (Scrolling Test) - While the background load is running at a constant rate the user types a command, dd if=/etc/termcap bs=100, that causes a large amount of scrolling in the terminal emulator. The test is rerun under various constant loads. A stop watch measurement of the elapsed time to complete the terminal output is made. Degradation of scrolling performance with increasing background load should be smooth and minimal. 3. Changing Interactive Focus (System Test) - While the background load is running at a constant rate, a script is run that drives a selection of GUI tools through a series of typical user interactions using synthetically generated input events. Numerous keyboard and pointer focuses change occur during the run. Elapsed time is measured. The test is run with control and prototype software to measure any user interaction management overhead. 4. Warm Start of Interactive Activity (Warm Start Test) - While the background load is running at a constant rate and the user is typing at a constant rate, the user stops for 30 seconds and then begins to type again. Elapsed time is measured for the first keystroke to appear. Response to the first character may be delayed, but overall performance should be regained almost immediately. 5.3 Test Environment All tests were run on a SparcStation IPX with 16mb main memory and a local disk. The test machine was running Solaris 2.2. Typing latency measurements were gathered using an internal tracing facility [Bon- wick in progress]. Trace points were inserted in the keyboard driver and the X window server. 5.4 Benchmark Results Here are the results of the tests. The control system represents the state of the software before being modified for the prototype. TYPING TEST 10 | o o o o Delay in | o 1. | o Seconds | o .1 | o |x x x x x x x x x x x x 0. |_________________________ 0 5 10 15 20 Load Average x - Prototype o - Control SCROLLING TEST 15 |xxxxxxx Delay in | o xxxxxxxxxxxxxxxx 10 | o Seconds | o 5. | o | o 0. |______________________o__ 0 5 10 15 20 Load Average FIGURE 2. Typing and Scrolling Test Results In Figure 2, the gray lines indicate results in the control system and the solid lines are with the prototype system. In both tests, the prototype can maintain steady interaction under various loads. If the user stops to think too long, and the memory load is high enough, this then becomes similar to the Warm Start test. The Warm Start test showed that, on the average, there was a delay of .1 seconds before the first character was echoed for the prototype system vs. 1 second for the control system. System No Load Background make Heavy Synthetic Load Prototype 49 56 96 Control 50 89 789 TABLE 1. System Test Benchmark Results (elapsed time in seconds) In Table 1, the System test results are shown for various loads. The prototype holds up well under the make(1) load and very well under the heavy synthetic load. 5.5 Subjective Evaluation User of the system were generally pleased with the prototype and reported that their perception of the system was that it was faster all around; windows come up faster and things in general seem snappier. There were some interesting observations: 1. Dueling Demos - Two simpleminded, flat-out, non-interactive graphics demos showed uneven bursts in behavior. The window server had a tendency to completely drain the request queues before either client process had a chance to run and add anything more. When the server did wait for more requests, a client had roughly a single scheduling quanta in which to fill the request queue. The server then drained the request queue and around it went again. 2. Asked More of the System - People reported being able to run a few more applications with the prototype before they started feeling paging effects similar to the non-prototype environment. 3. Consistent Response - Users were surprised by the levels of interactivity on heavily loaded machines. Although pathological loads can still cause problems, responsiveness during typical compilation and other software development tasks was much improved over the control system. 6.0 Future Work Throughout the paper, mention has been made about work that was not actually tested in the prototype. Of these, there are a variety of areas of interest that are generic to multiple scheduling classes, i.e., similar issues exist for real-time and interactive scheduling classes: 1. Page Replacement - Global virtual memory page replacement is adequate for a machine running time-sharing. In single user workstations, some mechanism needs to be made available to apportion memory so as to maintain interactivity. The virtual memory system is going to have to utilize priority and scheduling information, in addition to LRU information, in order to provide the highest levels of interactivity. This also implies that the virtual memory system needs to become aware of which pages are used by which processes. 2. Threads - Scheduling in the presence of user level threads needs further analysis. 3. Starvation - Detection and alleviation of pathological starvation of lower priority scheduling classes by higher priority scheduling classes could bear further investigation. A fair-share mechanism, in which resources are allocated in pie wedges per share group, could more directly control resource allocation [Essick 90]. 4. Cooperation - The implications of cross network cooperative scheduling is an interesting area, particularly concerning security issues. Areas that are more specific to interactive resource management include: 1. Broaden Programmatic Involvement - Other conduits for programmatic interactive process notification should be explored so that services could report themselves as interactive to the user interaction manager, instead of having the user interaction manager determine process information. 2. Refinement of User Model - Expanding the input to the user interaction manager by exploring some of the ideas put forth in the third section should continue. 3. Learn from Prototype - So that all windows are registered, Xlib may be a better place in the system to stamp each window with a process id. Also, implementing the user interactive manager as an extension to the X window server would allow the user interaction manager independence from window manager choice and could make better internal scheduling decisions armed with interactive client connection data. 4. Attitude Adjustment - There has been considerable resistance to providing more explicit control of machine resources to application developers due to the limitations this places on the kernel. The pre- ferred interface is for applications to report their own expected behavior to the kernel, and for the kernel to take actions as appropriate. Thus, the interactive driver interface merely communicates a process' interactive state, rather than altering priority, etc. To promote adoption of the technologies discussed in this paper, the following should occur: 1. Interactive Scheduling Class - This class should actually be implemented and should follow the model provided by existing time-sharing and real-time classes. 2. ICCCM - A new inter-client communications convention should be adopted for the X Window System that should specify a standard for attaching interactive process ids to parts of the window tree. These would provide a baseline set of information on which any user interaction manager could build. 7.0 Conclusions Something between time-sharing scheduling and real-time scheduling is needed for Unix. Interactive scheduling addresses that need. The combination of an interactive scheduling class in the kernel, along with a user interaction infrastructure in user-land, suggests how it might be designed and implemented. Modern systems have tremendous processing power. The Unix community needs to harness that power on the behalf of the end user by providing mechanisms and policies dedicated to split-second user responsiveness. The users expect it. It is up to us to deliver. 8.0 References [AT&T 90] "Unix System V Release 4 Internals Student Guide", Vol. I Unit 2.4.2. AT&T, 1990. [Bach86] Maurice J. Bach, "The Design of the Unix Operating System," Chapter 8. Process Scheduling and Time, PrenticeHall, 1986. [Bonwick In Progress] Bonwick, J., "Kernel Tracing in SunOS 5.0", in progress. [Essick 90] Essick, R., "An Event-based Fair Share Scheduler," Proceedings 1990 USENIX Winter Technical Conference. [Gingell87] Gingell, R., Moran, J., and Shannon, W., "Virtual Memory Architecture in SunOS," Proceedings 1987 USENIX Summer Technical Conference. [Khanna92] Khanna, S., Sebree, M., and Zolnowsky, J., "Realtime Scheduling in SunOS 5.0," Proceedings 1992 USENIX Winter Technical Conference. [Leffler89] Leffler, McKusick, Karels, Quarterman, "4.3BSD UNIX Operating System," Chapter 4.4 Process Scheduling, Addison Wesley, 1989. [Mcvoy 91] McVoy L., Kleiman, S., "Extent-like Performance from a UNIX File System," Proceedings 1991 USENIX Winter Technical Conference. [Scheiffler86] Scheiffler, R., and Gettys, J., "The X Window System," ACM Transactions on Graphics, 5(2), April, 1986. 9.0 Acknowledgments The authors would especially like to thank James Hanko of Sun Microsystems Laboratories for his work on the time-share scheduler and scheduling class. Jackson Wong and Stuart Marks offered technical advice on the X server and OPEN LOOK window manager. We would also like to thank those who ran the prototype and provided valuable feedback. 10.0 Biographies Kevin Clarke (kevin.clarke@eng.sun.com) is a Member of the Technical Staff at SunSoft, Inc., working on OpenWindows' performance issues. He is SunSoft's representative to the X Performance Characterization group. Steve Evans (steve.evans@eng.sun.com) is a Senior Staff Engineer at SunSoft, Inc. He is currently working on architectural issues for SunSoft for the Common Open Software Environment desktop. In the past, he has worked on 2D graphics, window system components, user interface toolkits, structured graphics editing software and has done a fair bit of management. David Singleton (dave.singleton@eng.sun.com) is a Member of the Technical Staff at SunSoft, Inc. working on OS performance. Bart Smaalders (bart.smaalders@eng.sun.com) is a Staff Engineer at SunSoft, Inc. He is currently handling Common Open Software Environment performance issues for SunSoft. In the past, he has been the OpenWindows performance lead, multithreaded GUI toolkits, developed distributed system administration software and built a steamboat.