USENIX '05 Paper
[USENIX '05 Technical Program]
Providing Dynamic Update in an Operating System
Dynamic update is a mechanism that allows software updates and patches to be applied to a running system without loss of service or down-time. Operating systems would benefit from dynamic update, but place unique demands on any implementation of such features. These demands stem from the event-driven nature of operating systems, from their restricted run-time execution environment, and from their role in simultaneously servicing multiple clients.
We have implemented a dynamic update mechanism in the K42 research operating system, and tested it using previous modifications of the system by kernel developers. Our system supports updates changing both kernel code and data structures. In this paper we identify requirements needed to provide a dynamically updatable operating system, describe our implementation, and present our experiences in designing and using the dynamic update mechanism. We also discuss its applicability to other operating systems.
As computing infrastructure becomes more widespread, there has been an increasing number of patches for functionality, performance, and security reasons. To take effect, these patches traditionally require either restarting system services, or often rebooting the machine. This results in downtime. Sometimes this downtime can be scheduled, if for example the patch adds a feature, improves performance, etc. However, in some situations, such as applying a security patch, delaying the update is not desirable. Users and system administrators are forced to trade off the increased vulnerability of a security flaw against the cost of unplanned downtime.
Dynamic update  is used to avoid such downtime. It involves on-the-fly application of software updates to a running system without loss of service. The increased unplanned down-time of computing infrastructure to apply updates, combined with the demand for continuous availability, provides strong motivation to investigate dynamic update techniques for operating systems.
In addition to the above mentioned impact on availability, dynamically updatable systems have other benefits. Such systems provide a good prototyping environment. They allow, for example, a new page replacement, file system, or network policy to be tested without rebooting. Further, in more mature systems such as mainframes, some user constraints prevent the system from ever being shutdown. In such an environment, users can only get new functionality into the system by performing a dynamic update.
An operating system is a unique environment with special constraints, and additional challenges must be solved to provide dynamic update functionality. We have addressed these challenges in the implementation of a dynamic update mechanism for K42, an object-oriented research operating system supporting hot-swapping. The focus of this paper is on the implementation and mechanisms needed to provide dynamic update. This work builds on previously reported work [28,6], and on other K42 features. Some of the requisite characteristics we identify for dynamic update exist in other systems or have recently been incorporated , while others require additional support. Where appropriate, we point out the generality of our techniques to other operating systems, as well as what infrastructure would be required to take full advantage of dynamic update techniques. In addition to describing our implementation, we describe our experiences applying dynamic update in K42, using three motivating examples taken from changes made by K42 kernel developers.
The rest of this paper is organised as follows: Section 2 discusses the system requirements for supporting dynamic update, Section 3 describes our implementation of dynamic update in K42, and Section 4 discusses how the same functionality might be implemented in other operating systems. Next, Section 5 describes our experiences applying dynamic update to K42 using three motivating examples, Section 6 discusses the limitations of our implementation and our plans for future work, Section 7 compares related work, and Section 8 concludes.
There are several fundamental requirements in providing a dynamic update capability. Here we identify them, in Section 3.2 we describe how we satisfy them in K42, and then in Section 4 we generalise to other operating systems.
At a minimum, dynamic update needs to support changes to the code of a system, however there are varying levels of support possible for updates which also affect data. We classify dynamic updates in this way:
Having classified the possible updates, we now introduce a set of fundamental requirements for dynamic update.
While creating updatable units is easier with support from languages such as C++, it is still possible without such support. Primarily, providing updatable units means designing with good modularity and obeying module boundaries. The structure of the system dictates what is feasible.
15]. Thus, system support is required to achieve and detect a safe point. Potential solutions involve requiring the system to be programmed with explicit update points, or blocking accesses to a unit, and detecting when it becomes idle, or quiescent.
An operating system is fundamentally event-driven, responding to application requests and hardware events, unlike most applications, which are structured as one or more threads of execution. As discussed later, this event-based model can be used to detect when an updatable unit of the system has reached a safe point. Additional techniques can be employed to handle blocking I/O events or long running daemon threads.
12]. Note that the first two classes of update, dynamic update to code and dynamic update to single-instance data, are still possible without factories, but it is not possible to support dynamic update affecting multiple-instance data without some kind of state tracking mechanism.
The level of support required for version management is affected by the complexity of update interdependencies, but at a minimum it should be possible to track a version number for each update present in the system, and for these version numbers to be checked before an update is applied.
We now describe our implementation of dynamic update in K42. As noted previously, some of the techniques used in the implementation are specific to K42, but other operating systems are becoming more amenable to dynamic update, as discussed in the next section.
The K42 project is developing a new scalable open-source research operating system incorporating innovative mechanisms and policies, and modern programming technologies. It runs on 64-bit cache-coherent PowerPC systems, and supports the Linux API and ABI. It uses a modular object-oriented design to achieve multiprocessor scalability, enhance customisability, and enable rapid prototyping of experimental features (such as dynamic update).
Object-oriented technology has been used throughout the system. Each resource (for example, virtual memory region, network connection, open file, or process) is managed by a different set of object instances . Each object encapsulates the meta-data necessary to manage the resource as well as the locks necessary to manipulate the meta-data, thus avoiding global locks, data structures, and policies. The object-oriented nature enables adaptability, because different resources can be managed by different implementations. For example, each running process in the system is represented by an in-kernel instance of the Process object (analogous to the process control block structure present in other operating systems). Presently two implementations of the Process interface exist, ProcessReplicated, the default, and ProcessShared, which is optimised for the case when a process exists on only a single CPU . The K42 kernel defaults to creating replicated processes, but allows for a combination of replicated and shared processes.
K42 uses clustered objects , a mechanism that enables a given object to control its own distribution across processors. Using the object translation table facility provided by clustered objects, hot-swapping [4,28] was implemented in K42. Hot-swapping allows an object instance to be transparently switched to another implementation while the system is running, and forms the basis of our dynamic update implementation.
In Section 2.2, we identified several requirements for dynamic update of an operating system. In K42, these requirements are addressed by our implementation of the dynamic update mechanism, as follows:
22,23]. This technique makes use of the fact that each system request is serviced by a new kernel thread, and that all kernel threads are short-lived and non-blocking.
Each thread in K42 belongs to a certain epoch, or generation, which was the active generation when it was created. A count is maintained of the number of live threads in each generation, and by advancing the generation and waiting for the previous generations' counters to reach zero, it is possible to determine when all threads that existed on a processor at a specific instance in time have terminated .
The implementation blocks new invocations of an object being updated, and then uses the generation-count mechanism to detect quiescence .
transfer negotiation protocol to allow the negotiation of a common intermediate format that both objects support . Object developers must implement data conversion functions to and from common intermediate formats.
This generalised technique was developed to support hot-swaps between arbitrary implementations of an object. In the case of dynamic update, usually the replacement object is merely a slightly modified version of the original object, with similar state information, so the conversion functions perform either a direct copy, or a copy with slight modifications.
In cases where a lot of state is maintained, or when many object instances must be updated, a copy is an unnecessary expense, because the updated object is deleted immediately afterwards. For example, the process object maintains a series of structures which describe the process' address space layout. To avoid the cost of deep-copying and then discarding these structures, the data transfer functions involved simply copy the pointer to the structure and set a flag in the old object. When the object is destroyed it checks this flag and, if it is set, does not attempt destruction of the transferred data structures. Effectively ownership of the structure is transferred to the new object instance. This only works in cases where the new object uses the same internal data format as the old object. This is true in many dynamic update situations. In cases where this is not true, the negotiation protocol ensures that a different transfer function is used.
object translation table. Each object has an entry in the table, and all object invocations are made through this reference. In the process of performing a dynamic update, the translation table entries for an object are updated to point to the new instance, which causes future calls from clients to transparently invoke the new code. The object translation table was originally introduced into K42 to support the clustered object multiprocessor scalability mechanism , and we have been able to utilise it to implement hot-swapping and thus dynamic update.
When an object that has multiple instances is updated, we must also redirect creations of that type. This redirection is provided by the factory mechanism, described in Section 3.3.
As we have mentioned, the object translation table adds an extra level of indirection on all object invocations. This indirection enables an interposition mechanism whereby an object's entry in the object translation table is modified, causing all accesses to that object to transparently invoke a different interposer object. The interposer can then choose to pass the call along to the original object. This mechanism is used by the hot swapping and dynamic update implementations.
Hot-swapping operates by interposing a mediator object in front of the object to be hot-swapped. The mediator passes through several phases, first tracking incoming calls until it knows (through the generation-count mechanism) that all calls are being tracked, then suspending further calls until the existing tracked calls complete. At this point the object is quiescent. The mediator then performs state transfer format negotiation, followed by the state transfer between the old and the new object instances. Finally, it updates the object translation table reference to the new object, and forwards the blocked calls.
To perform updates, the code for the updated object must be present. The normal process for adding an object to K42 was to recompile the kernel, incorporating the new object, and then reboot the system. This is insufficient for dynamic update, so we have developed a kernel module loader that is able to load the necessary code for an updated object into a running kernel or system server.
A K42 kernel module is a relocatable ELF file with unresolved references to standard kernel symbols and library routines (such as err_printf, the console output routine). Our module loader consists of a special object in the kernel that allocates pinned memory in the kernel's text area, and a trusted user-space program that has access to the kernel's symbol table. This program uses that symbol table to resolve the undefined symbols in the module, and load it into the special region of memory provided by the kernel object. It then instructs the kernel to execute the module's initialisation code.
Our module loader operates similarly to that used in Linux , but is simpler. Linux must maintain a dynamic symbol table and support interdependencies between modules, we avoid this because all objects are invoked indirectly through the object translation tables. A module can (and to be useful should) contain code that is called by the existing kernel without requiring its symbols to be visible. Its initialisation code simply instantiates replacement objects and performs hot-swap operations to invoke the code in those object instances. Our module loader performs the relocations and symbol table management at user-level, leaving only the space allocator object in the kernel.
Hot-swapping allows us to update the code and data of a single specific object instance. However, K42 is structured such that each instance of a resource is managed by a different instance of an object. To dynamically update a kernel object, the infrastructure must be able to both locate and hot-swap all instances of that object, and cause any new instantiations to use the updated object code. Note that, as we have mentioned, this problem is not unique to K42; to support dynamic updates affecting data structures requires a mechanism to track all instances of those data structures and update them.
Previously in K42, object instances were tracked in a class-specific manner, and objects were usually created through calls to statically-bound methods. For example, to create an instance of the ProcessReplicated object (the implementation used by default for Process objects), the call used was:
ProcessReplicated::Create( ProcessRef &out, HATRef h, PMRef pm, ProcessRef creator, const char *name);
This leads to problems for dynamic update, because the Create call is bound at compile-time, and cannot easily be redirected to an updated implementation of the ProcessReplicated object, and also because we rely on the caller of this method to track the newly created instance.
To track object instances and control object instantiations, we used the factory design pattern . In this design pattern, the factory method is an abstraction for creating object instances. In K42, factories also track instances that they have created, and are themselves objects. Each factory object provides an interface for creating and destroying objects of one particular class, and maintains the set of objects that it has created.
The majority of the factory implementation is factored out using inheritance and preprocessor macros, so that adding factory support to a class is relatively simple. Using our previous example, after adding the factory, the creation call changed to:
where (...) represents the arguments as before. The macro above hides some implementation details, whereby the default factory for a class is referenced using a static member; it expands to the following:
Using a factory reference allows us to hot-swap the factory itself, which is used in our implementation of dynamic update.
To provide rudimentary support for configuration management, factories carry a version number identifying the specific implementation of the factory and its type. The factories in the base system all carry version zero, and updated factories have unique non-zero version numbers. We assume a strictly linear model of update, when an update occurs the current version number of the factory is compared to the version number of the update, and if the update is not the immediately succeeding version number, the update is aborted. To support reverting updates in this scheme, we could reapply the previous version with an increased version number.
Performance and scalability influenced our implementation of the factories. For example, object instances are tracked for dynamic update in a distributed fashion using per-CPU instance lists. Moreover, we found that adding factories to K42 was a natural extension of the object model, and led to other advantages besides dynamic update. As an example, in order to choose between ProcessReplicated and ProcessShared, K42 had been using a configuration flag that was consulted by the code that creates process objects to determine which implementation to use. Using the factory model, we could remove this flag and allow the scheme to support an arbitrary number of implementations, by changing the default process factory reference to the appropriate factory object.
We use factories to implement dynamic update in K42. To perform a dynamic update of a class, the code for the update is compiled along with some initialisation code into a loadable module. When the module is loaded, its initialisation code is executed. This code performs the following steps (illustrated in Figure 1):
In some special cases, for example when an update adds new objects to the system that do not replace any existing objects, or when an update affects an object with only a single instance, the full dynamic update implementation is not required. In these cases the initialisation code in the module is simpler.
In this section we discuss the way in which a dynamic update mechanism might be provided in operating systems other than K42, focusing on Linux. Previously we identified several key requirements, these might be provided as follows:
20] being one well-known example. Furthermore, modern operating systems are being constructed in an increasingly modular fashion to allow for better multiprocessor performance, and improved customisability. This trend toward modularity provides the necessary interfaces for creating updatable units. Even monolithic systems such as Linux now have support for loadable kernel modules for device drivers, file systems, networking functionality, etc. This support could be leveraged to provide dynamic update capabilities in the same areas where kernel modules can be used.
22], which is similar to the generation count mechanism used in K42. We expect that other operating systems would also be able to add RCU-style quiescence detection, which offers other benefits, such as improved scalability.
25]. If the addition of factories was required, the modules could also be made responsible for tracking instances of their own state.
For dynamic update to multiple-instance data structures to be supported, it is desirable that each instance be individually updatable. For example, the VFS layer's use of one set of function pointers per node, rather than one set for the entire file system, allows the file system's data structures to be converted incrementally. The alternative would be to block all access to all file system nodes while they are updated, effectively halting the whole system.
We have performed a number of measurements to evaluate the performance penalty imposed by our dynamic update mechanism. All the experiments were conducted on an IBM pSeries 630 Model 6E4 system, with four 1.2GHz POWER4+ processors and 8GB of main memory.
We ran microbenchmarks to directly measure the code of adding the factory mechanism. Using a factory for an object implies extra cost when creating the object, because the creation must be recorded in the factory's data structures.
We measured the cost of creating three different objects using a factory and using a statically bound method, this includes allocating storage space for the object, instantiating it, and invoking any initialisation methods. Each test was repeated 10,000 times, and the total time measured using the processor's cycle counter. Our results are summarised in Table 1.
The first object, a dummy object, was created specifically for this test, and encapsulates a single integer value. This result represents the worst-case for adding a factory, with an overhead of 12% over the base creation cost of 2.22s. The next object is an FCM (file cache manager), an instance of which is maintained by K42 for every open file in the system. Creating an FCM object is 5.6% slower with a factory, but this number overstates the true impact, because in practice FCM creation is usually followed by a page fault, which must be satisfied either by zero-filling a page, or by loading it from disk. Finally, we measured the cost of creating a process object, a relatively expensive operation because it involves initialising many other data structures, and found that in such a case, the additional cost imposed by the factory was very small, even before considering the additional costs involved in process creation such as context switches and initial page faults.
To get a more complete picture of the overhead imposed by the presence of factories, we used the SPEC software development environment throughput (SDET) benchmark . This benchmark executes one or more scripts of user commands designed to simulate a typical software-development environment (for example, editing, compiling, and various UNIX utilities). The scripts are generated from a predetermined mix of commands, and are all executed concurrently. It makes extensive use of the file system, process, and memory management subsystems.
We ran SDET benchmarks with four configurations of the system:
We found that in these cases the use of factories imposes no noticeable performance degradation on system throughput, as measured by SDET. We have not yet extended factories to other objects in the system, however we expect the performance impact to also be minor, since FCMs and processes constitute a significant portion of the kernel objects created.
The cost of performing a dynamic update itself is more significant, in some initial experiments we measured 20ms to update one hundred live instances of the dummy object. This is because the hot-swapping implementation is not yet optimised for the case where large numbers of objects are swapped concurrently. We plan to improve this, although since a dynamic update does not block the entire system while it is applied, the overall time taken for an update is less critical.
To demonstrate the effectiveness of dynamic update, we present three examples of system changes of increasing complexity that we have applied using dynamic update. These examples relate to the memory management code of the K42 kernel [19,3].
Benchmarking of a memory-intensive parallel application showed poor scalability during its initialisation phase. Analysis of the problem determined that a bottleneck occurred during the resizing of a shared hash table structure, and a new partitioned memory management object was developed that did not suffer from the problem. This object added a new interface to the kernel, allowing user programs to create a partitioned memory region if they specified extra parameters.
Adding a new piece of code to the kernel and making it available through a new interface is the simplest case of dynamic update, because we can avoid replacing old code or transferring state information. This update was implemented as a simple loadable module, consisting of the code for the new region object and some initialisation code to load it and make it available to user programs. This module could be shipped with programs requiring it, or could be loaded into the kernel on demand when a program requires the new interface, either way avoiding a reboot.
This scenario demonstrates the use of a module loader combined with an extensible kernel interface to add new functionality to a running kernel. There is nothing inherently new here, existing systems also allow modules to be loaded, for example to provide new file systems or device drivers. However, K42's modularity makes it possible to replace a portion of the page-fault path for a critical application with a new set of requirements, which could not be done on an existing system such as Linux.
This scenario involves a bug fix to a kernel service, one of the key motivations for dynamic update. In the course of development, we discovered a race condition in our core kernel memory allocator that could result in a system crash when kernel memory was allocated concurrently on multiple CPUs.
Fixing this bug required adding a lock to guard the allocation of memory descriptors, a relatively simple code change. In fact, only two lines of code were added, one to declare the lock data structure, and another to acquire (and automatically release on function return) the lock. A recompile and reboot would have brought the fix into use. However, even with continual memory allocation and deallocation occurring, we were able to avoid the reboot and dynamically apply the update using our mechanism.
The replacement code was developed as a new class inheriting almost all of its implementation from the old buggy object, except for the declaration of the lock and a change to the function that acquired and released it. This caused the C++ compiler to include references to all the unchanged parts of the old class in the replacement object code, avoiding programmer errors. Simple copying implementations of the state transfer functions were also provided to allow the object to be hot-swapped. The key parts of the replacement code are shown in Figure 2.
The new class was compiled into a loadable module, and combined with initialisation code that instantiated the new object and initiated a hot-swap operation to replace the old, buggy instance. Because this object was a special case object with only a single instance, it was not necessary to use the factory mechanism.
This scenario demonstrates the use of hot-swapping as part of our dynamic update mechanism, combined with a kernel module loader, to dynamically update live code in the system.
This scenario involves an optimisation to the implementation of file cache manager objects. An FCM is instantiated in the K42 kernel for each open file or memory object in the system.
We discovered that the unmapPage method did not check if the page in question was already unmapped before performing expensive synchronisation and IPC operations. These were unnecessary in some cases.
We developed a new version of the FCM object that performed the check before unmapping, and prepared it as a loadable module. Applying this update dynamically required the factory mechanism, because the running system had FCM instances present that needed to be updated, and because new instantiations of the FCM needed to use the code in the updated module.
This scenario demonstrates all the components of our dynamic update implementation, using a module loader to load the code into the system, a factory to track all instances of state information that are affected by an update, and hot-swapping to update each instance.
The implementation we have accomplished thus far provides a basic framework for performing dynamic update. This is a rich area for investigation, and is becoming important for providing usable and maintainable systems. Here we discuss areas for future work.
This is our next step in expanding the effectiveness of dynamic update. When considering various changes to K42 to use as example scenarios for this work, many had to be rejected because they involved interface changes. While such changes might be less common in a production system, rather than a rapidly evolving experimental system such as K42, the restriction on changing object interfaces is currently the most serious limitation of this work.
An alternative approach to this problem is used by Nooks to support recoverable device drivers in Linux . In this system, shadow drivers monitor all calls made into and out of a device driver, and reconstruct a driver's state after a crash using the driver's public API. Only one shadow driver must be implemented for each device class (such as network, block, or sound devices), rather than for each driver. A similar system could be used for dynamic update, instead of relying on objects to provide conversion functions, a shadow object could monitor calls into each updatable kernel object, reconstructing the object's state after it is updated. This approach suffers from several drawbacks. First, there is a continual runtime performance cost imposed by the use of shadows, unlike conversion functions which are only invoked at update time. Secondly, the use of shadow drivers is feasible because there is a small number of device interfaces relative to the number of device drivers, but this is generally not the case for arbitrary kernel objects, which implement many different interfaces.
27] to roll back a failed update.
21,17]. This is most likely inadequate for real use, where updates may be issued by multiple sources, and may have complex interdependencies.
More complex versioning schemes exist, such as in the .NET framework, where each assembly carries a four-part version number, and multiple versions may coexist . We will need to reconsider versioning issues once we start automating the update preparation process, and more developers start using dynamic updates.
Previous work with K42 developed the hot-swapping feature , including the quiescence detection, state transfer, and object translation table mechanisms. Our work extends hot-swapping by adding the state tracking, module loader, and version management mechanisms, and combining them to implement dynamic update.
To our knowledge, no other work has focused on dynamic update in the context of an operating system. Many systems for dynamic updating have been designed, and a comprehensive overview of the field is given by Segal and Frieder . These existing systems are generally either domain-specific [9,10,16], or rely on specialised programming languages [21,17,1], making them unsuitable for use in an operating system implemented in C or C++.
Proteus  is a formal model for dynamic update in C-like languages. Unlike our system for achieving quiescence on a module level, it uses pre-computed safe update points present in the code. Our system can also support explicit update points, however we have not found this necessary due to the event-driven programming used in K42.
Dynamic C++ classes  may be applicable to an updatable operating system. In this work, automatically-generated proxy classes are used to allow the update of code in a running system. However, when an update occurs it only affects new object instantiations, there is no support for updating existing object instances, which is important in situations such as security fixes. Our system also updates existing instances, using the hot-swapping mechanism to transfer their data to a new object.
Some commercial operating systems offer features similar to Solaris Live Upgrade , which allows changes to be made and tested without affecting the running system, but requires a reboot for changes to take effect.
Component- and microkernel-based operating systems, where services may be updated and restarted without a reboot, also offer improved availability. However, while a service is being restarted it is unavailable to clients, unlike our system where clients perceive no loss of service. Going a step further, DAS  supported dynamic update through special kernel primitives, although the kernel was itself not updatable.
Finally, extensible operating systems [7,27] could potentially be modified to support dynamic update, although updates would be limited to those parts of the system with hooks for extensibility, and many of the same problems addressed here (such as state transfer) would be encountered.
We have presented a dynamic update mechanism that allows patches and updates to be applied to an operating system without loss of the service provided by the code being updated. We outlined the fundamental requirements for enabling dynamic update, presented our implementation of dynamic update, and described our experiences applying several example dynamic updates to objects taken from changes made by kernel developers. Our implementation also incurs negligible performance impact on the base system.
Although dynamic update imposes a set of requirements on operating systems, as described in this paper, those requirements are already being incrementally incorporated into systems such as Linux. This includes RCU to detect quiescent points, more modular encapsulated data structures, and the indirection needed to redirect invocations. We expect that dynamic update will become increasingly important for mainstream operating systems.
We wish to thank the K42 team at IBM Research for their input, in particular Marc Auslander and Michal Ostrowski, who contributed to the design of the kernel module loader, and Bryan Rosenburg, whose assistance in debugging K42 was invaluable. We also thank Raymond Fingas, Kevin Hui, and Craig Soules for their contributions to the underlying hot-swapping and interposition mechanisms, and finally our paper shepherd, Vivek Pai.
This work was partially supported by DARPA under contract NBCH30390004, and by a hardware grant from IBM. National ICT Australia is funded by the Australian Government's Department of Communications, Information Technology, and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Research Centre of Excellence programs.
K42 is released as open source and is available from a public CVS repository; for details refer to the K42 web site: https://www.research.ibm.com/K42/.
This paper was originally published in the
Proceedings of the 2005 USENIX Annual Technical Conference,
April 1015, 2005, Anaheim, CA, USA
Last changed: 2 Mar. 2005 aw