################################################ # # # ## ## ###### ####### ## ## ## ## ## # # ## ## ## ## ## ### ## ## ## ## # # ## ## ## ## #### ## ## ## ## # # ## ## ###### ###### ## ## ## ## ### # # ## ## ## ## ## #### ## ## ## # # ## ## ## ## ## ## ### ## ## ## # # ####### ###### ####### ## ## ## ## ## # # # ################################################ The following paper was originally published in the Proceedings of the USENIX Summer 1993 Technical Conference Cincinnati, Ohio June 21-25, 1993 For more information about USENIX Association contact: 1. Phone: 510 528-8649 2. FAX: 510 548-5738 3. Email: office@usenix.org 4. WWW URL: https://www.usenix.org High Performance Dynamic Linking Through Caching Michael N. Nelson Graham Hamilton Sun Microsystems Laboratories, Inc. Mountain View, CA 94043 USA Abstract The Spring Operating System provides high performance dynamic linking of program images. Spring uses caching of both fixed-up program images and partially fixed-up shared libraries to make dynamic linking of program images efficient, to reduce the need for PIC (position-independent code), and to improve page sharing between different program images running the same libraries. The result is that with program image caching, dynamically-linked programs have a start-up cost similar to statically-linked pro- grams regardless of the number of unresolved symbols in dynamically-linked program images and shared libraries. In addi- tion, with library and program image caching, we have reduced the need for PIC and have increased page sharing. FIGURE 1. Major system components of a Spring node 1. Introduction Dynamic linking of shared libraries is used in most modern UNIX systems. These systems include, among others, SunOS [2] and SVR4 [11]. Dynamic linking has several good properties: Un- resolved symbols in a program do not have to be resolved until the program is launched. This decreases the time to build an im- age. Shared libraries do not have to be statically bound into a program image thus saving disk space. Shared libraries can be shared between many executing im- ages thus reducing memory costs. Program images can take advantage of new releases of shared libraries without being rebuilt. Unfortunately, by deferring the linking of programs to program start-up time, and possibly even run time, dynamically-linked programs start slower than statically-linked programs. This de- gradation was believed to be an acceptable cost to pay for the advantages of dynamic linking. However, with the advent of languages such as C++, the number of symbols that must be resolved at run time is dramatically increasing thus increasing the start-up time of dynamically-linked programs to unacceptable levels. This paper describes the dynamic linking architecture and imple- mentation in Spring, a new distributed object-oriented operating system. We had six main goals for the Spring dynamic linker: Substantially reduce the start-up time for dynamically- linked images. Maintain a high degree of page sharing between different images running the same libraries. Provide a dynamic linking solution that performs well even when the number of relocatable symbols increases dramatical- ly. Run SunOS binaries on Spring. Support existing SunOS linkage semantics. Have non-PIC (position-independent code) libraries and main programs without significantly impacting program start-up time or page sharing. The Spring dynamic linking system described in this paper achieves all six goals by caching fully linked program images and partially fixed-up shared libraries. After a dynamically-linked program is run once, future instances of the program will require no dynamic linking, and once a shared library is linked for one program, subsequent programs that use the same shared library will link much more quickly. The rest of this paper is organized as follows: Section 2 pro- vides an overview of the Spring Operating System; Section 3 presents related work; Section 4 describes the SunOS dynamic linking implementation; Section 5 describes the Spring dynamic linking implementation; Section 6 gives some performance measure- ments; Section 7 looks at the relevance of this work to other operating systems; Section 8 proposes future work; and Section 9 offers some conclusions. 2. Spring Operating System Spring is a distributed, multi-threaded operating system that is structured around the notion of objects. A Spring object is an abstraction that contains state and provides a set of operations to manipulate that state. The description of an object and its operations is an interface that is specified in an interface de- finition language. The interface is a strongly-typed contract between the server (implementor) and the client of the object. Since Spring is object-oriented, it supports the notion of inter- face inheritance. An interface that accepts an object of type foo will also accept a subclass of foo. A Spring domain is an address space with a collection of threads. A given domain may act as the server of some objects and the client of other objects. The server and the client can be in the same domain or in different domains. In the latter case, the representation of the object includes an unforgeable nucleus door identifier that identifies the server domain [3]. The Spring kernel supports basic cross-domain invocations, threads, and provides basic virtual memory support for memory mapping and physical memory management [6]. A typical Spring node runs several servers in addition to the kernel as shown in Figure 1. All services are exported via objects defined using the inter- face definition language. In general, any collection of servers may reside in the same or in different domains. The decision on where to run a particular server is made for administration, pro- tection, and performance reasons, and is independent of the in- terface of the service. 2.1. UNIX Subsystem Spring can run most dynamically-linked SunOS binaries using a UNIX subsystem [5]. This includes basic UNIX programs such as cat and csh and more sophisticated programs such as openwindows and most X applications. The UNIX subsystem consists of two parts: a UNIX process server that is responsible for managing process ids, ptys, sockets, and signals between processes, and a shared li- brary libue.so that contains code to emulate all UNIX system calls. When a SunOS dynamically-linked binary is run on Spring, the libc shared library that is used to dynamically link the binary is actually a copy of libue.so. 3. Related Work Dynamic linking was first used in the Multics [9] and TENEX [7] operating systems. However, it was not used in mainstream UNIX systems until System V [1] and SunOS [2] added dynamic linking support. More recently, SVR4 [11] and HP-UX [10] provided dynamic linking support similar to the SunOS dynamic linking architec- ture. We will describe the SunOS implementation in detail in Sec- tion 4. The notion of caching previously run programs has been used in past UNIX systems. Early versions of UNIX had the notion of the "sticky" bit. If this bit was set, then the program would remain cached in memory after it exited. The next time that the program was run it would not have to be loaded into memory. However, to the best of our knowledge, this same strategy of caching previ- ously run programs has not been used for dynamically-linked pro- grams. Thus, every time that a dynamically-linked program is run it must be relinked from scratch. 4. SunOS Background In this section we will examine dynamic linking on SunOS. We will first examine the notion of position-independent code and its effectiveness on C++ code, then we will examine how SunOS performs dynamic linking, and finally we will examine some li- braries and programs to get some insight into what needs to be done to perform efficient dynamic linking on Spring. 4.1. PIC (Position-independent Code) The Sun compilers support two flavors of output code: position- dependent code and position-independent code. Position-dependent code is the default. In position-dependent code the compiler makes no attempt to minimize the number of fix-ups that must be performed at dynamic link time. Typically one relocation will be required for each source code reference to an external symbol. In PIC, the compiler and the static linker conspire to attempt to minimize the number of relocations that must be performed at dynamic link time by grouping as many external references as pos- sible into tables, so that there is a single reference to each symbol. The code that uses these symbols then indirects (using PC-relative addressing) through these tables. For example, rather than performing a direct procedure call, PIC will load an address from an indirection table and jump through that address. PIC has two advantages. First, it reduces the number of dynamic linkage relocations that need to be performed since a widely used procedure still only ends up with one table entry that needs to be relocated. Second, it also reduces the number of pages that will be updated by the relocations, since the table entries are clustered closely together. Reducing the number of page updates is very important as this maximizes the number of physical pages that can be shared between different users of the library. 4.2. PIC and C++ PIC successfully minimizes direct-instruction-stream references to relocatable symbols by adding a level of indirection. However, this approach cannot be applied to references from program data. If a given memory location is supposed to contain a pointer to a given relocatable symbol, then the compiler cannot simply add a level of indirection. This is not normally a significant problem for C programs as it is relatively rare for C data structures to be declared with pointers to procedures or other data. The situation with program data references is rather different in C++. Both the cfront 3.0 preprocessor and the G++ compiler gen- erate virtual function tables as initialized data structures. These virtual function tables are full of references to relocat- able symbols. Thus, even when using PIC, a typical C++ program or library will contain a much higher number of symbolic relocations than a similar C program (see Section 4.4). Although PIC reduces the number of symbolic resolutions over non-PIC, PIC does tend to be larger than non-PIC due to the addi- tion of extra levels of indirection. With the G++ compiler, use of PIC increases the size of C++ texts between 3 and 20 percent, with roughly similar impacts on execution performance. The overall size of the core Spring library libspring.so that is written in C++ increases by 13 percent when compiled PIC with the G++ compiler. While this size increase is an acceptable price to pay for the benefits of shared dynamic libraries, it is not an insignificant cost. We were interested in determining if we could reap the benefits of shared dynamic libraries without paying this particular space and time tax. Note that the increase in code size due to PIC is fairly sensi- tive to particular details of compiler technology and CPU archi- tecture. Measurements of systems with other compilers may yield different results. 4.3. SunOS Dynamic Linking In this section we will briefly describe the SunOS dynamic link- er. For more details on the SunOS dynamic linker see [2]. Each a.out file contains as part of its text segment two tables used for dynamic linking. The first table is the symbol table. This contains a description of each symbol that is either defined or referenced by the a.out file. There is only a single entry for each symbol, even if it is both defined and referenced. The sym- bol table is indexed by an associated hash table that facilitates the mapping from a linker symbol to a particular symbol table en- try. The second table in each a.out file is the relocation table. This table contains a description of each location in the a.out file that needs to be patched up at dynamic link time. There are, broadly speaking, two kinds of relocations. The first kind does not require symbol resolution. This category basically covers re- locations that are purely a matter of patching up code and data to reflect the addresses they were loaded at in memory. The second kind of relocations are symbolic relocations that are dependent on the final values of some particular linkage symbol. Each of these relocations contains a reference (actually a table index) to the symbol table entry describing the corresponding symbol. A runnable image is composed from a set of a.out files, one of which is a main program and the rest of which are shared li- braries. At dynamic link time, each of the relocations is pro- cessed in turn. Non-symbolic relocations are straightforward and can be performed solely on the basis of the address at which the a.out file has been mapped into memory. Symbolic relocations are more troublesome. For each symbolic relocation, the dynamic link- er finds the symbol table entry associated with the relocation. This now gives the linker the name of the symbol that is being used for the relocation. It now searches through each of the a.out files in turn, searching for the first a.out file that con- tains a definition of the symbol. The a.out files are searched in a standard order, starting with the main program and then with the libraries in the order that they were specified on the static link command line, left to right. Thus for example, if one linked an image: ld -o foo foo.o /lib/fred.so /lib/libc.so then when resolving a symbolic relocation, foo would be searched first followed by fred.so and libc.so. When a definition of the given symbol is found in a given a.out file, the linker combines the symbol table entry and the address at which the a.out file is loaded to calculate the absolute ad- dress to which the symbol refers. That value is then used to per- form the relocation. 4.4. Looking at Libraries and Programs Table 1 contains some statistics on several typical libraries. Some of these libraries are compiled PIC and the remainder are compiled non-PIC. The column labeled "defined symbols" is the number of symbols defined within the library, and the column la- beled "undefined symbols" is the number of additional symbols that the library references but does not define. The table also includes the total number of relocations for each library, the number of these that were symbolic relocations, and the number that were references to undefined symbols. The liblinker library is a Spring library that contains the Spring linker implementa- tion; the libns library is a Spring library that contains the generic Spring nameserver implementation; the libue library con- tains Spring UNIX subsystem code; libspring is the standard Spring library that is linked with all Spring applications; and libc is the standard SunOS library that is linked with SunOS ap- plications. Shared Library Source Language Compiled PIC? De- fined symbols Undefined symbols Total relocations Symbolic relo- cations Undefined relocations liblinker C++ Yes 179 243 659 313 231 liblinker C++ No 187 231 2699 1691 1587 libns C++ Yes 498 416 1392 885 621 libns C++ No 513 398 7497 5763 5287 libue C++ and C Yes 10162 5 10330 4980 5 libue C++ and C No 10175 5 41479 28168 5 libspring C++ Yes 8772 0 7622 4073 0 libspring C++ No 8780 0 38652 26948 0 libc C Yes 1048 6 1211 500 6 libc C No 1041 6 10424 4532 39 TABLE 1. Library Statistics Table 1 shows several interesting things. First, PIC is very good at eliminating relocations; it reduced them by a factor of five for the core Spring library libspring. Second, the Spring library libspring, which is written in C++, is much larger in terms of relocations and defined symbols than the SunOS libc, which is written in C; libspring compiled PIC contains six times as many relocations and eight times as many defined symbols as libc com- piled PIC. Finally, although there are a large number of symbolic relocations for libspring when it is complied non-PIC (over 26,000), they are all to symbols that are defined in libspring itself. Now some small number of these symbols may be redefined in other a.out files in a linked image, but this is comparatively rare. Almost all the relocations defined in libspring will end up being satisfied from libspring itself. Table 2 contains statistics for some main programs. Note that for main programs all remaining relocations are symbolic relocations to undefined symbols. The top six programs are programs written in C and compiled for SunOS, and the bottom six are programs written in C++ and compiled for Spring. All libraries used by the programs are shared libraries. Program Source Language CompiledPIC? Defined Symbols Undefined Symbols Total Relocations csh C Yes 722 83 83 csh C No 721 83 106 xterm C Yes 310 196 379 xterm C No 309 196 405 tar C Yes 112 66 66 tar C No 111 66 210 ssh C++ Yes 109 321 819 ssh C++ No 117 277 2015 caching_fs C++ Yes 670 444 524 caching_fs C++ No 696 416 4379 machine_ns C++ Yes 21 126 81 machine_ns C++ No 20 126 307 TABLE 2. Program Statistics Table 2 shows two things. First, it once again shows the effec- tiveness of PIC in reducing relocations. Second, Spring programs have many more relocations than SunOS programs even when the Spring programs are compiled PIC. For example, ssh defines the same number of symbols as tar yet ssh has over ten times the re- locations. The results from the measurements of the Spring li- braries and programs show that Spring has to deal with many more relocations than SunOS. Thus, Spring needs a dynamic linking solution that can efficiently link programs and libraries with large numbers of relocations. 4.5. Comments on the SunOS algo- rithm Note that the non-symbolic relocations for a given a.out file are independent of which other a.out files are bound into the program image. These relocations only depend on the address at which the library is loaded. The main cost of dynamic linking is resolving the symbolic relocations and, in particular, in searching through the libraries for symbol definitions. In typical small programs, almost all the definitions will come from the standard OS library (e.g., libc on UNIX or libspring on Spring), but for each symbolic relocation it is first necessary to search the symbol table of the main program and any other minor libraries. This property was somewhat mitigated in an ear- lier Spring dynamic linker [4] by the use of a cache of resolved symbols, but even so, it was by far the dominant component of the linkage. For most programs, the dynamic libraries used by the program contain substantially more relocations and symbol defini- tions than the program itself. This is true both for conventional UNIX utilities such as csh, awk or cpp, and also for window-based tools such as xterm and cmdtool. Clearly there are also programs where the program contains more relocations and symbols than the libraries, but this is not the case for most of the programs for which fast start-up is important. Thus, we were particularly in- terested in minimizing the costs of fixing-up the libraries as opposed to fixing-up the program itself. In order to reduce the cost of program start-up, the SunOS dynamic linker performs cer- tain fix-ups only "on demand." This optimization is available for PIC by filling the indirection table for procedure references with pointers to a special fix-up routine. While this optimiza- tion is useful for SunOS, it is much less useful for Spring li- braries as most of their relocations are data relocations, which cannot be performed lazily. We were therefore prepared to abandon this optimization, provided it did not cause any significant penalty during program start-up. 5. The Spring Dynamic Linker In the last section, we showed that Spring shared libraries and programs have substantially more relocations than SunOS shared libraries and programs. For this reason, Spring needs a new dynamic linking implementation that will allow us to efficiently start dynamically-linked programs on Spring. In this section, we will describe the Spring dynamic linking mechanism that uses caching to provide excellent dynamic linking performance. 5.1. Spring Linking Model Spring uses a different dynamic linking model than SunOS. In SunOS, a program image dynamically links itself as it begins exe- cuting. This is done through start-up code that is statically linked into each SunOS binary. In Spring, program images are linked by a parent domain and then the fully linked image is started. This model allows us to do the caching of linked images and libraries. The Spring model has the disadvantage that the entire image must be linked at once, rather than on demand, as in SunOS. However, because of the effectiveness of caching in Spring, we pay very little penalty for our policy. The actual linking of program images is done by a special linker domain described in Section 5.6. This domain not only links im- ages but it is also responsible for caching linked images and li- braries. 5.2. Motivation An early version of the Spring dynamic linker is described in [4]. The new Spring dynamic linker described in this section fol- lows the same basic linking model as the old Spring dynamic link- er. However, the old dynamic linker had serious performance prob- lems because of the large number of symbols that it had to relo- cate. When the original linker was built, there were over 50,000 relocations in the main Spring library libspring even when it was compiled PIC. Our first attempt at improving performance was to reduce the number of relocations. Through many different tech- niques, we have managed to drastically reduce the number of relo- cations in libspring to less than 8,000 when compiled PIC. Unfor- tunately, the original dynamic linker still did not have reason- able performance and the solution would not scale. Our second attempt at improving performance was to use the caching mechanism described in this section. We wanted to have a solution that pro- vided good performance and would scale if we decided not to use PIC or we once again found ourselves in the situation where we had a huge number of relocations. 5.3. Spring Dynamic Linker The key feature of the Spring dynamic linker design is that it facilitates caching at several different levels. Rather than dynamically linking every image individually, we attempt to cache entire fixed-up images. When we miss in the image cache, rather than linking the entire image from scratch, we attempt to reuse existing cached libraries that are already provisionally fixed- up. The linker domain has a range of addresses that it regards as suitable for dynamic libraries. Whenever the linker is asked to operate on a new dynamic library, it allocates the library space within this range at a location that will not conflict with any currently cached library. It will later arrange that the li- brary is indeed loaded at this location in all dynamically-linked images. Once the linker has allocated an address range for a li- brary, the linker then provisionally fixes-up the library to re- side at this location. It does this by making a copy-on-write copy of the library and then provisionally processing the list of relocations. All the non-symbolic relocations are performed im- mediately. For each of the symbolic relocations, if the symbol is defined in the library, a provisional relocation is performed based on the library's own definition. When the dynamic linker is asked to fix-up an image, it finds the transitive closure set of the libraries needed by that program. For each of these li- braries it obtains a copy-on-write copy of a provisionally fixed-up version of the library. The dynamic linker then performs a final fix-up pass on the main program and the shared libraries. All the relocations in the main program will be to symbolic relo- cations to undefined symbols, as all other relocations are fixed-up when the main program is statically linked. For each such relocation, the linker searches through the libraries for the first suitable symbol definition and uses that symbol's ad- dress to perform the relocation. Fix-ups within the libraries are more interesting. The dynamic linker has already fixed-up all the non-symbolic relocations in the library and has provisionally fixed-up as many symbolic relocations as possible. However, the linker must now validate the symbolic relocations and perform any remaining ones. For each library, the linker compares the library's symbol table against the symbol tables for the linked image and for each library that occurs earlier in the search ord- er. If a symbol that is defined in the library is also defined elsewhere, the linker redefines the library's symbol table entry to say that the symbol has been overridden. The linker then scans through each library's relocation table looking for relocations using undefined or overridden symbols. If the relocation points to an undefined symbol, then the linker can perform a standard relocation using the newly found symbol definition. However, if the relocation points to an overridden symbol, the linker must first undo the effects of the pro 5.4. Assumptions and Conse- quences The Spring dynamic linking strategy has a couple of as- sumptions and some resulting consequences. First, it assumes that the typical program defines a relatively small number of symbols compared with the number of symbols defined and referenced in the standard libraries. Thus, it is assumed that it is significantly cheaper to do a validation pass over a library to check that no symbol definitions have been overridden than it is to perform a full scale dynamic link of that library. This assumption would not be true if both a main program and a library defined large numbers of symbols (making the validation expensive), but the li- brary had only a small number of relocations (meaning that a con- ventional dynamic link would be cheap). The second assumption of the Spring dynamic linking strategy is that it is fairly rare for a main program to redefine a symbol that is defined in a library. If it were common for programs to redefine large numbers of li- brary symbols, then many of our provisional fix-ups would be er- roneous and the cost of undoing and redoing them may be more ex- pensive than a straightforward dynamic link. Additionally, the page containing the redone fix-up will no longer be shareable with other processes that use the same dynamic library. For- tunately, for the bulk of the programs we have looked at, both of these assumptions appear to be true. It is clearly the case that these assumptions will not be true for all programs, but even in these cases, the performance is unlikely to be significantly worse than a straightforward dynamic link. There is one problem with the Spring linking strategy. A commonly used option "-dc" of the SunOS static linker causes space to be allocated in the main program for undefined external data symbols. Although for the standard SunOS dynamic linker this is a useful optimization, for our dynamic linker it is not. However, it is not normally a prob- lem, except for two symbols "_errno" and "_iob", both of which tend to be referenced in libraries as well as in the main pro- gram. Even for these symbols there is no real problem for li- braries that are compiled PIC, since the compiler and static linker ensure that there is only a single reference to each of these symbols in each PIC library. However we wanted to be able to build our principal libraries non-PIC so as to avoid the space and time penalty caused by PIC. In a better world, we might re- link all of the SunOS programs without the "-dc" option. However, one of the goals of the Spring UNIX subsystem is to run standard SunOS binaries unchanged. Thus, we decided to compile PIC those affected libue modules. 5.5. Paging Behavior The Spring dynamic linking strategy relies heavily on cheap copying of virtual memory segments using a copy-on-write virtual memory implementa- tion where the physical pages are initially shared between the "copies" and are only really copied when a write is performed on one of the two memory segments. A library is initially copied from a disk file to a provisionally fixed-up library and is then copied a second time to become part of a fully fixed-up image. Normally only these two copies are necessary, but if the linker image is being debugged a third copy will occur to generate a writable program image. Additional copies may occur due to UNIX emulation forks. Fortunately, the Spring VM system has no prob- lems coping with multiple levels of copy-on-write sharing [6]. Both Spring and SunOS attempt to share dynamic library pages between different images. However, there is one key difference. SunOS copies the library first and then performs the fix-ups in- dependently in each process that uses the library. This means that it is important to minimize the number of pages that are mu- tated by fix-ups, as these pages will not be shared. The Spring dynamic linker, on the other hand, tries to perform most fix-ups early and then copies the fixed-up library. This is possible be- cause all users of the library will load it at the same virtual memory address. Now, if Spring has to undo and redo its fix-ups, then we gain nothing. However, if the provisional fix-ups prove durable, then we have potentially increased the number of physi- cal pages that can be shared between users of the library. At the very least, Spring's strategy of caching entire fixed-up images will increase the amount of sharing between two processes running the same image, as they can both benefit from the same fixed-up pages. 5.6. Spring Linker Domain The actual dynamic linking of images is done by a separate linker domain. This domain imple- ments a linker object that it exports in a well-known place in the Spring naming service. When a new program is to be run on Spring, the parent domain invokes the link operation on a linker object. The arguments to the link operation include a memory ob- ject that represents the program image and a naming context to use to find shared libraries; this naming context is analogous to the loader library path in SunOS. After the linker domain links the image, it returns a list of pairs. The parent domain then maps these memory objects at the given ad- dresses in a new domain and starts the new domain executing. The linker domain implements the caching of fully linked programs and partially fixed-up libraries. When the linker domain is asked to link a program it checks its image cache. If the memory_object that is being linked is the same as a previously linked image and the context that is being used to find libraries is the same, then the linker domain will consider this a cache hit. The linker can determine if two memory objects or contexts are equivalent by following a secure object equivalence protocol. This protocol re- quires two cross-domain calls for each equivalence check. The linker domain strategy for determining if there is a hit on the image cache will be correct as long as the contents of the con- text used to find libraries does not change. If new libraries are bound into the context used for linking, then the linker may get a false cache hit. If new libraries are bound into the name space then the linker cache must be flushed to take advantage of these libraries. Since installing new shared libraries should be a fairly uncommon occurrence, we believe that our solution is rea- sonable. However, in order to eliminate the need for linker cache flushing, we plan to implement the more aggressive solution to cache coherency described in Section 8.3. In addition to the im- age cache, the linker domain also maintains a cache of partially fixed-up libraries. When the image cache is missed, the image must be linked against a set of shared libraries. For each shared library, the linker domain uses the context given to the link operation to resolve the name of the library to a memory object. The linker then checks to see if the memory object is equivalent to the memory object for any cached library. If it is then the linker can use the already cached library. 6. Performance This Section provides measurements of dynamic linking performance on Spring and SunOS 4.1.3. These measurements were taken on a SPARCstation 10 model 31 with a 36 Mhz CPU and 64 Megabytes of memory. The SunOS numbers include all of the linking that occurs before the main function is called. Thus, they include some fix- ups and the library mapping costs but do not include any on- demand fix-ups. The Spring numbers include the cost of fully fixing-up the shared libraries and the main program, but they do not include the library mapping cost. Since the mapping cost on SunOS and Spring is small, these mapping costs are not signifi- cant. 6.1. Linking Against a Single Library Table 3 shows the time it takes to link a null main program on Spring and SunOS, and Table 4 shows the cost of linking some Spring programs on Spring and SunOS. These measurements show several things. First, image caching is very effective in providing low dynamic linking cost. If a program image is cached, then Spring can start-up the program in 12 milliseconds regardless of the number of reloca- tions. However, SunOS, which does not have image caching, takes from 22 to 1376 milliseconds to link a program depending on the number of relocations in the library. These measurements also show the effectiveness of library caching. Library caching allows the program to be linked up to four times faster than without caching. In fact, with library caching, the effect of a large number of relocations on linking performance is reduced. For ex- ample, even though a non-PIC version of libspring has five times as many relocations as a PIC version, it only takes 20 to 50 per- cent longer to link against a cached non-PIC version of the li- brary. This is much better than SunOS which takes over five times longer to link against a non-PIC version of libspring than it does to link against the PIC version. Library Library Compiled- PIC? SunOS Spring Program Cached Spring Library Cached Spring Nothing Cached libspring Yes 182 ms 12 ms 90 ms 185 ms libspring No 1008 ms 12 ms 136 ms 576 ms libc Yes 22 ms 12 ms 65 ms 73 ms libc No 197 ms 12 ms 87 ms 235 ms TABLE 3. Linking A NULL Program In general, even without any caching, Spring is able to link programs as fast or faster than SunOS. In fact when the number of relocatable symbols gets large, Spring can dynamically link twice as fast as SunOS even when neither the image nor the library is cached. Spring's better performance is due to the fact that the Spring dynamic linker caches and reuses symbol resolu- tions during the fix-up phase. There is one case where without caching Spring is much slower than SunOS: when the null-program is linked against a PIC version of libc. The poor performance on Spring is because Spring has to perform all relocations before the program is run and SunOS does not. When libraries such as libc are compiled PIC, the SunOS idea of performing fix-ups of procedure references on demand appears to be very effective. How- ever, for libraries such as libspring where most relocations are data relocations that cannot be performed lazily, the SunOS op- timization is not very effective. Program Program PIC? lib- springPIC? SunOS Program Cached Library Cached Nothing Cached ssh Yes Yes 209 ms 12 ms 134 ms 221 ms ssh Yes No 1147 ms 12 ms 166 ms 588 ms ssh No Yes 291 ms 12 ms 174 ms 272 ms ssh No No 1286 ms 12 ms 212 ms 643 ms caching_fs Yes Yes 239 ms 12 ms 158 ms 252 ms caching_fs Yes No 1212 ms 12 ms 194 ms 615 ms caching_fs No Yes 371 ms 12 ms 240 ms 335 ms caching_fs No No 1376 ms 12 ms 280 ms 712 ms TABLE 4. Linking Spring Pro- grams on Spring These measurements provide a good indication of the effectiveness of PIC. Without caching, PIC is very effective in reducing program start-up time. Compiling a library PIC is much more important than compiling a program PIC. The difference in the dynamic linking cost for PIC and non-PIC versions of a program is at most 50 percent. However, compiling a library PIC can reduce the dynamic linking cost by from three to eight times over non-PIC. With image caching, PIC is completely unnecessary. In addition, library caching reduces the necessity for PIC. Thus, the need for PIC will depend on a combination of the hit rate of the image cache and the hit rate of the library cache. Our experience with Spring so far is that because of the effectiveness of our image and library caches, we have no need for PIC. 6.2. Multiple Li- braries The previous measurements were of the cost of dynamically linking a program against one library. Table 5 shows the cost of dynamically linking a null program against the set of X libraries libXaw, libXmu, libXt, and libX11 and either libc or libue. Table 5 shows several interesting things. First, as expected, both im- age caching and PIC are very effective in improving performance. Second, without image caching, Spring is much worse than SunOS when the program and the X libraries are linked against libc. Once again, the poor performance on Spring is because Spring has to perform all relocations before the program can run and SunOS does not. Since each relocation must be checked against many li- braries, the dynamic linking cost is very high (much higher than the cost shown in Table 3 of linking a null program against libc). Library Library CompiledPIC? SunOS Spring Program Cached Spring Library Cached Spring Nothing Cached libue Yes 486 ms 13 ms 424 ms 645 ms libue No 3111 ms 13 ms 473 ms 1017 ms libc Yes 102 ms 14 ms 416 ms 516 ms libc No 387 ms 13 ms 431 ms 684 ms TABLE 5. Linking Against Multiple X Libraries These meas- urements show the potential for the library chaining optimization described in Section 8.1 and the unified symbol table idea described in Section 8.2. We hope these optimizations would allow Spring to perform equally well regardless of the number of li- braries that are being dynamically linked. 7. Relevance to Oth- er Operating Systems In Spring, it was natural to implement the dynamic linker cache as a user-level service. Spring's microker- nel organization meant that image start-up was already handled by user-level libraries, and our focus on objects meant that system resources, such as memory objects, could be conveniently passed to and from a user-level service. However the basic caching techniques we have described do not require a user-level server. A UNIX system might implement an equivalent dynamic library cache as an operating system service that is used by the exec system call. This should provide essentially all the same benefits as our user-level service. 8. Future Work Everything described in this paper has been implemented. In this section, we will describe three potential improvements to our linking system. 8.1. Chaining Some programs are linked with many shared li- braries. For example, the X programs such as xterm and xclock are linked with five shared libraries. When a program with many shared libraries is linked in Spring, the linker domain will fix-up each library in turn each time different programs that use the same set of libraries are linked. One optimization that we could implement is to cache fixed-up chains of libraries. Thus, if two programs are linked that use the same set of libraries in the same order, then the second program could use the cached set of libraries that were fixed-up for the first program. As an ex- ample of the chaining optimization, consider the two programs progA and progB created in the following manner: ld -o progA progA.o X.so Y.so Z.so ld -o progB progB.o X.so Y.so Z.so If pro- gA is linked, the linker domain could cache the fixed-up chain of libraries . In this cached chain, X.so will al- ready be fixed-up against Y.so and Z.so, and Y.so will already be fixed-up against Z.so. When progB is linked, the linker domain will notice that it uses the already cached chain of libraries . Thus when progB is linked many of the linking steps can be avoided. 8.2. A Unified Symbol Table Currently, we maintain a separate symbol table for each library in our cache. However, a possible alternative would be to maintain a unified symbol table for all the libraries in the cache. This unified symbol table would map each symbol to the set of libraries that implement it (in practice the vast majority of symbols are only defined by a single library). Whenever we added a library to the library cache, we would add its external symbol definitions to the unified symbol table. This unified symbol table could poten- tially accelerate the final linking stage for images that use large numbers of shared libraries. Currently, we have to search through each of the libraries' symbol tables in turn to resolve undefined symbols and to validate that a given symbol definition has not been overridden. By doing a single lookup in the unified symbol table, we would be able to discover which libraries define that symbol and determine relatively quickly which (if any) of those library definitions to use for the current image. 8.3. Aggressive Coherency Our current mechanism, described in Section 5.6, for determining a hit in the image cache requires cache flushes when libraries change. We plan to be much more aggressive about keeping the image and library caches coherent with respect to changes in libraries. We intend to implement a fully coherent linker cache by using a combination of our naming and file cach- ing strategies [8]. 9. Conclusion The Spring dynamic linker uses caching very effectively to improve performance. When a dynamically-linked program image is cached, then the program can start up nearly as fast as a statically-linked program. The only difference is that there are more memory objects to map for dynamically-linked programs. When libraries are cached, uncached programs can still start up quickly regardless of the number of relocations in a library. In fact, in the cases that we have measured on Spring, PIC is not necessary for good dynamic linking performance. Current UNIX dynamic linking implementations were designed for languages such as C where the use of PIC can reduce the dynamic linking costs to an acceptable level. However, object-oriented languages such as C++, with features such as vir- tual inheritance, produce images where technologies such as PIC are no longer sufficient to provide acceptable dynamic linking costs. In this paper, we have presented a new technology, caching of fixed-up program images and shared libraries, that provides efficient dynamic linking of programs written in these new object-oriented languages. We believe that new operating system techniques such as image and library caching, or new compiler techniques, are going to be required in the future in order to ensure acceptable dynamic linking performance. Acknowledgments We would like to thank Yousef Khalidi and Peter Madany for pro- viding valuable feedback on this paper. We would also like to thank Yousef for originally suggesting this line of inquiry. Fi- nally, we would like to acknowledge the efforts of all the Spring contributors whose work made this project a success. References [1] Arnold, Q., "Shared Libraries on UNIX System V," USENIX Sum- mer Conference, 1986, pp. 395-404. [2] Gingell, R. A., Lee, M., Dang, X. T., and Weeks, M. S.,"Shared Libraries in SunOS," USENIX Summer Conference, 1987, pp. 131-145. [3] Hamilton, G. and Kougiouris, K., "The Spring nucleus: A microkernel for objects," USENIX Summer Conference, June 1993. [4] Kempf, J. and Kessler, P. B., "Cross-Address Space Dynamic Linking," Sun Microsystems Laboratories, Technical Report TR-92-2. [5] Khalidi, Y. A. and Nelson, M. N., "An Implementation of UNIX on an Object-oriented Operating System," USENIX Winter Conference, January 1993. [6] Khalidi, Y. A. and Nelson, M. N., "The Spring Virtual Memory Sys- tem," Sun Microsystems Laboratories, TR-93-9. [7] Murphy, D. L., "Storage Organization and Management in TENEX," Proceedings of the Fall Joint Computer Conference, AFIPS, 1972, pp. 23-32. [8] Nelson, M. N., Hamilton, G., and Khalidi, Y. A., "A Framework for Caching in an Object Oriented System," Submitted for publication. [9] Organick, E. I., The Multics System: An Examination of Its Structure, MIT Press, 1972. [10] Sabatella, M., "Issues in Shared Library Design," USENIX Summer Conference, 1990, pp. 11- 23. [11] System V Application Binary Interface, Prentice-Hall, Englewood Cliffs, NJ, 1990. Author Information Michael N. Nelson is currently a Senior Staff Engineer at Sun Microsystems Labora- tories. Before joining Sun he was one of the principal developers of the Sprite Operating System at Berkeley and worked at DEC Western Research Laboratory. His interests include distributed, object-oriented software, operating systems, and architecture. He has a Ph.D. in Computer Science from UC Berkeley. He can be reached at Sun Microsystems Laboratories, Inc., 2550 Garcia Ave., MTV 29-112, Mountain View, CA, 94043, USA, or via e-mail at michael.nelson@sun.com. Graham Hamilton is a Senior Staff En- gineer at Sun Microsystems Laboratories, where he is the project lead for the Spring operating system project. His interests in- clude distributed computing, object-oriented systems, and operat- ing systems. He has a Ph.D. in Computer Science from the Univer- sity of Cambridge. Trademarks Sun, Sun Microsystems, SunOS, and SPARCstation are trademarks or registered trademarks of Sun Mi- crosystems, Inc. UNIX is a registered trademark of UNIX System Laboratories, Inc. All SPARC trademarks are trademarks or re- gistered trademarks of SPARC International, Inc. All other pro- duct names mentioned herein are the trademarks of their respec- tive owners.