################################################
	   #                                              #
	   # ##   ## ###### ####### ##    ## ## ##     ## #
	   # ##   ## ##  ## ##      ###   ## ##  ##   ##  #
	   # ##   ## ##     ##      ####  ## ##   ## ##   #
	   # ##   ## ###### ######  ## ## ## ##    ###    #
	   # ##   ##     ## ##      ##  #### ##   ## ##   #
	   # ##   ## ##  ## ##      ##   ### ##  ##   ##  #
	   # ####### ###### ####### ##    ## ## ##     ## #
	   #                                              #
	   ################################################


	 The following paper was originally published in the
      Proceedings of the USENIX Summer 1993 Technical Conference
		  Cincinnati, Ohio  June 21-25, 1993


	For more information about USENIX Association contact:

		   1. Phone:	510 528-8649
		   2. FAX:	510 548-5738
		   3. Email:	office@usenix.org
		   4. WWW URL:  https://www.usenix.org


        High Performance Dynamic Linking Through Caching
              Michael N. Nelson     Graham Hamilton
               Sun Microsystems Laboratories, Inc.
                   Mountain View, CA 94043 USA

                            Abstract

The Spring Operating System  provides  high  performance  dynamic
linking  of  program images. Spring uses caching of both fixed-up
program images and partially fixed-up shared  libraries  to  make
dynamic  linking  of program images efficient, to reduce the need
for PIC (position-independent code), and to improve page  sharing
between  different program images running the same libraries. The
result is that with  program  image  caching,  dynamically-linked
programs  have  a start-up cost similar to statically-linked pro-
grams  regardless  of  the  number  of  unresolved   symbols   in
dynamically-linked  program images and shared libraries. In addi-
tion, with library and program image caching, we have reduced the
need for PIC and have increased page sharing.

    FIGURE 1.       Major system components of a Spring node

1.  Introduction

Dynamic linking of shared libraries is used in most  modern  UNIX
systems.  These systems include, among others, SunOS [2] and SVR4
[11]. Dynamic linking has several  good  properties:          Un-
resolved  symbols  in  a program do not have to be resolved until
the program is launched. This decreases the time to build an  im-
age.          Shared libraries do not have to be statically bound
into a program image thus saving disk space.

        Shared libraries can be shared between many executing im-
ages thus reducing memory costs.

        Program images can take  advantage  of  new  releases  of
shared libraries without being rebuilt.

Unfortunately, by deferring the linking of  programs  to  program
start-up  time,  and  possibly  even run time, dynamically-linked
programs start slower than statically-linked programs.  This  de-
gradation  was  believed  to be an acceptable cost to pay for the
advantages of  dynamic  linking.  However,  with  the  advent  of
languages  such  as  C++,  the  number  of  symbols  that must be
resolved at run time is dramatically increasing  thus  increasing
the  start-up time of dynamically-linked programs to unacceptable
levels.

This paper describes the dynamic linking architecture and  imple-
mentation  in Spring, a new distributed object-oriented operating
system. We had six main goals for the Spring dynamic linker:

        Substantially reduce the start-up time  for  dynamically-
linked images.

        Maintain a high degree of page sharing between  different
images running the same libraries.

        Provide a dynamic linking  solution  that  performs  well
even when the number of relocatable symbols increases dramatical-
ly.

        Run SunOS binaries on Spring.

        Support existing SunOS linkage semantics.

        Have non-PIC (position-independent  code)  libraries  and
main  programs  without  significantly impacting program start-up
time or page sharing.

The  Spring  dynamic  linking  system  described  in  this  paper
achieves all six goals by caching fully linked program images and
partially fixed-up shared libraries. After  a  dynamically-linked
program is run once, future instances of the program will require
no dynamic linking, and once a shared library is linked  for  one
program,  subsequent  programs  that  use the same shared library
will link much more quickly.

The rest of this paper is organized as follows:  Section  2  pro-
vides  an  overview  of  the  Spring  Operating System; Section 3
presents related work; Section  4  describes  the  SunOS  dynamic
linking  implementation;  Section  5 describes the Spring dynamic
linking implementation; Section 6 gives some performance measure-
ments;  Section  7  looks  at the relevance of this work to other
operating systems; Section 8 proposes future work; and Section  9
offers some conclusions.

2.  Spring Operating System

Spring is a distributed, multi-threaded operating system that  is
structured  around  the  notion of objects. A Spring object is an
abstraction that contains state and provides a set of  operations
to  manipulate  that  state. The description of an object and its
operations is an interface that is specified in an interface  de-
finition  language.  The  interface  is a strongly-typed contract
between the server (implementor) and the client  of  the  object.
Since Spring is object-oriented, it supports the notion of inter-
face inheritance. An interface that accepts an object of type foo
will also accept a subclass of foo.

 A Spring domain  is  an  address  space  with  a  collection  of
threads. A given domain may act as the server of some objects and
the client of other objects. The server and the client can be  in
the  same domain or in different domains. In the latter case, the
representation of the object includes an unforgeable nucleus door
identifier that identifies the server domain [3].

The  Spring  kernel  supports  basic  cross-domain   invocations,
threads,  and  provides  basic  virtual memory support for memory
mapping and physical memory management [6]. A typical Spring node
runs several servers in addition to the kernel as shown in Figure
1. All services are exported via objects defined using the inter-
face  definition  language. In general, any collection of servers
may reside in the same or in different domains. The  decision  on
where to run a particular server is made for administration, pro-
tection, and performance reasons, and is independent of  the  in-
terface of the service.

2.1.  UNIX Subsystem

Spring can run most dynamically-linked  SunOS  binaries  using  a
UNIX subsystem [5]. This includes basic UNIX programs such as cat
and csh and more sophisticated programs such as  openwindows  and
most  X applications. The UNIX subsystem consists of two parts: a
UNIX process server that is responsible for managing process ids,
ptys,  sockets,  and  signals between processes, and a shared li-
brary libue.so that contains code  to  emulate  all  UNIX  system
calls.  When  a SunOS dynamically-linked binary is run on Spring,
the libc shared library that is  used  to  dynamically  link  the
binary is actually a copy of libue.so.

3.  Related Work

Dynamic linking was first used in the Multics [9] and  TENEX  [7]
operating  systems.  However,  it was not used in mainstream UNIX
systems until System V [1] and SunOS [2]  added  dynamic  linking
support. More recently, SVR4 [11] and HP-UX [10] provided dynamic
linking support similar to the SunOS  dynamic  linking  architec-
ture. We will describe the SunOS implementation in detail in Sec-
tion 4.

The notion of caching previously run programs has  been  used  in
past  UNIX  systems. Early versions of UNIX had the notion of the
"sticky" bit. If this bit was set, then the program would  remain
cached  in memory after it exited. The next time that the program
was run it would not have to be loaded into memory.  However,  to
the  best  of our knowledge, this same strategy of caching previ-
ously run programs has not been used for dynamically-linked  pro-
grams.  Thus, every time that a dynamically-linked program is run
it must be relinked from scratch.

4.  SunOS Background

 In this section we will examine dynamic  linking  on  SunOS.  We
will  first  examine  the notion of position-independent code and
its effectiveness on C++ code, then we  will  examine  how  SunOS
performs  dynamic  linking,  and finally we will examine some li-
braries and programs to get some insight into what  needs  to  be
done to perform efficient dynamic linking on Spring.

4.1.  PIC (Position-independent Code)

The Sun compilers support two flavors of output  code:  position-
dependent  code and position-independent code. Position-dependent
code is the default.  In  position-dependent  code  the  compiler
makes  no  attempt to minimize the number of fix-ups that must be
performed at dynamic link time. Typically one relocation will  be
required for each source code reference to an external symbol.

In PIC, the compiler and the static linker conspire to attempt to
minimize  the  number  of  relocations  that must be performed at
dynamic link time by grouping as many external references as pos-
sible  into  tables,  so that there is a single reference to each
symbol. The code that uses these symbols  then  indirects  (using
PC-relative addressing) through these tables. For example, rather
than performing a direct procedure call, PIC will load an address
from an indirection table and jump through that address.

PIC has two advantages. First, it reduces the number  of  dynamic
linkage relocations that need to be performed since a widely used
procedure still only ends up with one table entry that  needs  to
be  relocated.  Second,  it also reduces the number of pages that
will be updated by the relocations, since the table  entries  are
clustered  closely  together. Reducing the number of page updates
is very important as this maximizes the number of physical  pages
that can be shared between different users of the library.

4.2.  PIC and C++

PIC successfully minimizes  direct-instruction-stream  references
to relocatable symbols by adding a level of indirection. However,
this approach cannot be applied to references from program  data.
If  a given memory location is supposed to contain a pointer to a
given relocatable symbol, then the compiler cannot simply  add  a
level  of indirection. This is not normally a significant problem
for C programs as it is relatively rare for C data structures  to
be declared with pointers to procedures or other data.

The situation with program data references is rather different in
C++.  Both  the cfront 3.0 preprocessor and the G++ compiler gen-
erate virtual function tables  as  initialized  data  structures.
These  virtual function tables are full of references to relocat-
able symbols. Thus, even when using PIC, a typical C++ program or
library will contain a much higher number of symbolic relocations
than a similar C program (see Section 4.4).

Although PIC reduces the  number  of  symbolic  resolutions  over
non-PIC, PIC does tend to be larger than non-PIC due to the addi-
tion of extra levels of indirection. With the G++  compiler,  use
of  PIC increases the size of C++ texts between 3 and 20 percent,
with  roughly  similar  impacts  on  execution  performance.  The
overall  size  of  the  core  Spring library libspring.so that is
written in C++ increases by 13 percent when compiled PIC with the
G++  compiler. While this size increase is an acceptable price to
pay for the benefits of shared dynamic libraries, it  is  not  an
insignificant cost. We were interested in determining if we could
reap the benefits of shared dynamic libraries without paying this
particular space and time tax.

Note that the increase in code size due to PIC is  fairly  sensi-
tive  to particular details of compiler technology and CPU archi-
tecture. Measurements of systems with other compilers  may  yield
different results.

4.3.  SunOS Dynamic Linking

In this section we will briefly describe the SunOS dynamic  link-
er. For more details on the SunOS dynamic linker see [2].

Each a.out file contains as part of its text segment  two  tables
used  for  dynamic  linking. The first table is the symbol table.
This contains a description of each symbol that is either defined
or referenced by the a.out file. There is only a single entry for
each symbol, even if it is both defined and referenced. The  sym-
bol table is indexed by an associated hash table that facilitates
the mapping from a linker symbol to a particular symbol table en-
try.

The second table in each a.out file is the relocation table. This
table  contains  a description of each location in the a.out file
that needs to be patched up at  dynamic  link  time.  There  are,
broadly  speaking,  two kinds of relocations. The first kind does
not require symbol resolution. This category basically covers re-
locations  that  are purely a matter of patching up code and data
to reflect the addresses they  were  loaded  at  in  memory.  The
second  kind  of  relocations  are  symbolic relocations that are
dependent on the final values of some particular linkage  symbol.
Each  of these relocations contains a reference (actually a table
index) to the symbol table  entry  describing  the  corresponding
symbol.

A runnable image is composed from a set of a.out  files,  one  of
which  is  a  main  program  and the rest of which are shared li-
braries. At dynamic link time, each of the  relocations  is  pro-
cessed  in turn. Non-symbolic relocations are straightforward and
can be performed solely on the basis of the address at which  the
a.out  file has been mapped into memory. Symbolic relocations are
more troublesome. For each symbolic relocation, the dynamic link-
er  finds  the symbol table entry associated with the relocation.
This now gives the linker the name of the symbol  that  is  being
used  for  the  relocation.  It  now searches through each of the
a.out files in turn, searching for the first a.out file that con-
tains a definition of the symbol. The a.out files are searched in
a standard order, starting with the main program  and  then  with
the libraries in the order that they were specified on the static
link command line, left to right. Thus for example, if one linked
an image:

ld -o foo foo.o /lib/fred.so /lib/libc.so

then when resolving a symbolic relocation, foo would be  searched
first followed by fred.so and libc.so.

When a definition of the given symbol is found in a  given  a.out
file,  the linker combines the symbol table entry and the address
at which the a.out file is loaded to calculate the  absolute  ad-
dress to which the symbol refers. That value is then used to per-
form the relocation.

4.4.  Looking at Libraries and Programs

Table 1 contains some statistics on  several  typical  libraries.
Some  of  these  libraries are compiled PIC and the remainder are
compiled non-PIC. The column labeled  "defined  symbols"  is  the
number  of symbols defined within the library, and the column la-
beled "undefined symbols" is the  number  of  additional  symbols
that  the  library references but does not define. The table also
includes the total number of relocations for  each  library,  the
number  of  these  that were symbolic relocations, and the number
that were references to undefined symbols. The liblinker  library
is  a  Spring library that contains the Spring linker implementa-
tion; the libns library is a Spring  library  that  contains  the
generic  Spring nameserver implementation; the libue library con-
tains Spring UNIX  subsystem  code;  libspring  is  the  standard
Spring  library  that is linked with all Spring applications; and
libc is the standard SunOS library that is linked with SunOS  ap-
plications.   Shared  Library  Source Language Compiled PIC?  De-
fined symbols Undefined symbols Total relocations Symbolic  relo-
cations  Undefined  relocations liblinker C++ Yes 179 243 659 313
231 liblinker C++ No 187 231 2699 1691 1587 libns C++ Yes 498 416
1392  885 621 libns C++ No 513 398 7497 5763 5287 libue C++ and C
Yes 10162 5 10330 4980 5 libue C++ and C No 10175 5 41479 28168 5
libspring  C++  Yes  8772  0  7622 4073 0 libspring C++ No 8780 0
38652 26948 0 libc C Yes 1048 6 1211 500 6 libc C No 1041 6 10424
4532 39 TABLE 1.        Library Statistics

Table 1 shows several interesting things. First, PIC is very good
at  eliminating  relocations; it reduced them by a factor of five
for the core Spring library libspring. Second, the Spring library
libspring,  which  is  written in C++, is much larger in terms of
relocations and defined symbols than the  SunOS  libc,  which  is
written  in  C; libspring compiled PIC contains six times as many
relocations and eight times as many defined symbols as libc  com-
piled PIC. Finally, although there are a large number of symbolic
relocations for libspring  when  it  is  complied  non-PIC  (over
26,000),  they  are  all to symbols that are defined in libspring
itself. Now some small number of these symbols may  be  redefined
in other a.out files in a linked image, but this is comparatively
rare. Almost all the relocations defined in libspring will end up
being satisfied from libspring itself.

Table 2 contains statistics for some main programs. Note that for
main  programs all remaining relocations are symbolic relocations
to undefined symbols. The top six programs are  programs  written
in  C  and  compiled  for  SunOS, and the bottom six are programs
written in C++ and compiled for Spring. All libraries used by the
programs are shared libraries.

Program Source Language CompiledPIC?  Defined  Symbols  Undefined
Symbols Total Relocations csh C Yes 722 83 83 csh C No 721 83 106
xterm C Yes 310 196 379 xterm C No 309 196 405 tar C Yes  112  66
66 tar C No 111 66 210 ssh C++ Yes 109 321 819 ssh C++ No 117 277
2015 caching_fs C++ Yes 670 444 524 caching_fs  C++  No  696  416
4379  machine_ns  C++  Yes 21 126 81 machine_ns C++ No 20 126 307
TABLE 2.        Program Statistics

Table 2 shows two things. First, it once again shows  the  effec-
tiveness  of PIC in reducing relocations. Second, Spring programs
have many more relocations than  SunOS  programs  even  when  the
Spring  programs  are  compiled PIC. For example, ssh defines the
same number of symbols as tar yet ssh has over ten times the  re-
locations.   The  results from the measurements of the Spring li-
braries and programs show that Spring has to deal with many  more
relocations  than  SunOS.  Thus,  Spring  needs a dynamic linking
solution that can efficiently link programs  and  libraries  with
large  numbers of relocations.  4.5.  Comments on the SunOS algo-
rithm Note that the non-symbolic relocations for  a  given  a.out
file  are  independent  of which other a.out files are bound into
the program image. These relocations only depend on  the  address
at  which the library is loaded. The main cost of dynamic linking
is resolving the symbolic  relocations  and,  in  particular,  in
searching through the libraries for symbol definitions.

In typical small programs, almost all the definitions  will  come
from  the standard OS library (e.g., libc on UNIX or libspring on
Spring), but for each symbolic relocation it is  first  necessary
to  search  the  symbol  table  of the main program and any other
minor libraries. This property was somewhat mitigated in an  ear-
lier  Spring dynamic linker [4] by the use of a cache of resolved
symbols, but even so, it was by far the dominant component of the
linkage.   For  most  programs, the dynamic libraries used by the
program contain substantially more relocations and symbol defini-
tions than the program itself. This is true both for conventional
UNIX utilities such as csh, awk or cpp, and also for window-based
tools  such as xterm and cmdtool. Clearly there are also programs
where the program contains more relocations and symbols than  the
libraries,  but this is not the case for most of the programs for
which fast start-up is important. Thus, we were particularly  in-
terested  in  minimizing  the costs of fixing-up the libraries as
opposed to fixing-up the program itself.  In order to reduce  the
cost  of program start-up, the SunOS dynamic linker performs cer-
tain fix-ups only "on demand." This optimization is available for
PIC  by  filling  the  indirection table for procedure references
with pointers to a special fix-up routine. While  this  optimiza-
tion  is  useful for SunOS, it is much less useful for Spring li-
braries as most of their relocations are data relocations,  which
cannot be performed lazily. We were therefore prepared to abandon
this optimization, provided it  did  not  cause  any  significant
penalty during program start-up.

5.  The Spring Dynamic Linker

In the last section, we showed that Spring shared  libraries  and
programs  have  substantially  more relocations than SunOS shared
libraries and programs. For  this  reason,  Spring  needs  a  new
dynamic  linking implementation that will allow us to efficiently
start dynamically-linked programs on Spring. In this section,  we
will  describe  the  Spring  dynamic  linking mechanism that uses
caching to provide excellent dynamic linking performance.

5.1.  Spring Linking Model

Spring uses a different dynamic  linking  model  than  SunOS.  In
SunOS, a program image dynamically links itself as it begins exe-
cuting. This is done through start-up  code  that  is  statically
linked  into  each  SunOS  binary.  In Spring, program images are
linked by a parent domain and then  the  fully  linked  image  is
started.  This model allows us to do the caching of linked images
and libraries.

The Spring model has the disadvantage that the entire image  must
be  linked  at once, rather than on demand, as in SunOS. However,
because of the effectiveness of caching in Spring,  we  pay  very
little penalty for our policy.

The actual linking of program images is done by a special  linker
domain  described  in Section 5.6. This domain not only links im-
ages but it is also responsible for caching linked images and li-
braries.

5.2.  Motivation

An early version of the Spring dynamic  linker  is  described  in
[4]. The new Spring dynamic linker described in this section fol-
lows the same basic linking model as the old Spring dynamic link-
er. However, the old dynamic linker had serious performance prob-
lems because of the large number of symbols that it had to  relo-
cate.  When the original linker was built, there were over 50,000
relocations in the main Spring library libspring even when it was
compiled  PIC.  Our first attempt at improving performance was to
reduce the number of relocations. Through  many  different  tech-
niques, we have managed to drastically reduce the number of relo-
cations in libspring to less than 8,000 when compiled PIC. Unfor-
tunately,  the original dynamic linker still did not have reason-
able performance and the solution would not  scale.   Our  second
attempt at improving performance was to use the caching mechanism
described in this section. We wanted to have a solution that pro-
vided  good  performance and would scale if we decided not to use
PIC or we once again found ourselves in the  situation  where  we
had  a  huge number of relocations.  5.3.   Spring Dynamic Linker
The key feature of the Spring dynamic linker design  is  that  it
facilitates  caching  at  several  different  levels. Rather than
dynamically linking every image individually, we attempt to cache
entire  fixed-up  images. When we miss in the image cache, rather
than linking the entire image from scratch, we attempt  to  reuse
existing  cached  libraries that are already provisionally fixed-
up.  The linker domain has a range of addresses that  it  regards
as  suitable  for dynamic libraries. Whenever the linker is asked
to operate on a new dynamic library,  it  allocates  the  library
space within this range at a location that will not conflict with
any currently cached library. It will later arrange that the  li-
brary is indeed loaded at this location in all dynamically-linked
images.  Once the linker has allocated an address range for a li-
brary,  the linker then provisionally fixes-up the library to re-
side at this location. It does this  by  making  a  copy-on-write
copy of the library and then provisionally processing the list of
relocations. All the non-symbolic relocations are  performed  im-
mediately. For each of the symbolic relocations, if the symbol is
defined in the library, a  provisional  relocation  is  performed
based  on  the library's own definition.  When the dynamic linker
is asked to fix-up an image, it finds the transitive closure  set
of  the  libraries  needed by that program. For each of these li-
braries it  obtains  a  copy-on-write  copy  of  a  provisionally
fixed-up version of the library. The dynamic linker then performs
a final fix-up pass on the main program and the shared libraries.
All the relocations in the main program will be to symbolic relo-
cations to  undefined  symbols,  as  all  other  relocations  are
fixed-up  when  the  main  program is statically linked. For each
such relocation, the linker searches through  the  libraries  for
the  first  suitable symbol definition and uses that symbol's ad-
dress to perform the relocation.  Fix-ups  within  the  libraries
are more interesting. The dynamic linker has already fixed-up all
the non-symbolic relocations in the library and has provisionally
fixed-up  as  many symbolic relocations as possible. However, the
linker must now validate the symbolic relocations and perform any
remaining  ones.  For  each  library,  the  linker  compares  the
library's symbol table against the symbol tables for  the  linked
image and for each library that occurs earlier in the search ord-
er. If a symbol that is defined in the library  is  also  defined
elsewhere,  the linker redefines the library's symbol table entry
to say that the symbol has been overridden. The linker then scans
through  each  library's relocation table looking for relocations
using undefined or overridden symbols. If the  relocation  points
to  an  undefined  symbol, then the linker can perform a standard
relocation using the newly found symbol definition.  However,  if
the  relocation  points  to an overridden symbol, the linker must
first undo the effects of the pro 5.4.   Assumptions  and  Conse-
quences  The  Spring dynamic linking strategy has a couple of as-
sumptions and some resulting consequences. First, it assumes that
the  typical program defines a relatively small number of symbols
compared with the number of symbols defined and referenced in the
standard  libraries. Thus, it is assumed that it is significantly
cheaper to do a validation pass over a library to check  that  no
symbol  definitions  have been overridden than it is to perform a
full scale dynamic link of that library.  This  assumption  would
not  be  true  if both a main program and a library defined large
numbers of symbols (making the validation expensive), but the li-
brary had only a small number of relocations (meaning that a con-
ventional dynamic link would be cheap).  The second assumption of
the Spring dynamic linking strategy is that it is fairly rare for
a main program to redefine a symbol that is defined in a library.
If  it  were common for programs to redefine large numbers of li-
brary symbols, then many of our provisional fix-ups would be  er-
roneous  and the cost of undoing and redoing them may be more ex-
pensive than a straightforward dynamic  link.  Additionally,  the
page  containing  the  redone  fix-up will no longer be shareable
with other processes that use the  same  dynamic  library.   For-
tunately, for the bulk of the programs we have looked at, both of
these assumptions appear to be true. It is clearly the case  that
these  assumptions will not be true for all programs, but even in
these cases, the performance  is  unlikely  to  be  significantly
worse  than a straightforward dynamic link.  There is one problem
with the Spring linking strategy. A commonly used option "-dc" of
the  SunOS static linker causes space to be allocated in the main
program for undefined external data  symbols.  Although  for  the
standard  SunOS dynamic linker this is a useful optimization, for
our dynamic linker it is not. However, it is not normally a prob-
lem,  except  for  two symbols "_errno" and "_iob", both of which
tend to be referenced in libraries as well as in  the  main  pro-
gram.  Even  for  these  symbols there is no real problem for li-
braries that are compiled PIC,  since  the  compiler  and  static
linker  ensure  that  there is only a single reference to each of
these symbols in each PIC library. However we wanted to  be  able
to build our principal libraries non-PIC so as to avoid the space
and time penalty caused by PIC.  In a better world, we might  re-
link all of the SunOS programs without the "-dc" option. However,
one of the goals of the Spring UNIX subsystem is to run  standard
SunOS  binaries  unchanged. Thus, we decided to compile PIC those
affected libue modules. 5.5.  Paging Behavior The Spring  dynamic
linking  strategy  relies  heavily  on  cheap  copying of virtual
memory segments using a copy-on-write virtual memory  implementa-
tion  where  the  physical pages are initially shared between the
"copies" and are only really copied when a write is performed  on
one  of  the  two memory segments.  A library is initially copied
from a disk file to a provisionally fixed-up library and is  then
copied  a  second  time to become part of a fully fixed-up image.
Normally only these two copies are necessary, but if  the  linker
image  is  being  debugged  a third copy will occur to generate a
writable program image. Additional copies may occur due  to  UNIX
emulation  forks.  Fortunately, the Spring VM system has no prob-
lems coping with multiple levels of  copy-on-write  sharing  [6].
Both  Spring  and  SunOS  attempt  to share dynamic library pages
between different images. However, there is one  key  difference.
SunOS  copies the library first and then performs the fix-ups in-
dependently in each process that uses  the  library.  This  means
that it is important to minimize the number of pages that are mu-
tated by fix-ups, as these pages will not be shared.  The  Spring
dynamic  linker, on the other hand, tries to perform most fix-ups
early and then copies the fixed-up library. This is possible  be-
cause  all  users of the library will load it at the same virtual
memory address. Now, if Spring has to undo and redo its  fix-ups,
then  we  gain nothing. However, if the provisional fix-ups prove
durable, then we have potentially increased the number of  physi-
cal pages that can be shared between users of the library. At the
very least, Spring's strategy of caching entire  fixed-up  images
will increase the amount of sharing between two processes running
the same image, as they can both benefit from the  same  fixed-up
pages.   5.6.  Spring Linker Domain The actual dynamic linking of
images is done by a separate linker domain.  This  domain  imple-
ments  a  linker  object that it exports in a well-known place in
the Spring naming service. When a new program is  to  be  run  on
Spring,  the parent domain invokes the link operation on a linker
object. The arguments to the link operation include a memory  ob-
ject  that  represents  the program image and a naming context to
use to find shared libraries; this naming context is analogous to
the  loader  library path in SunOS. After the linker domain links
the image, it returns a list of <memory object,  address>  pairs.
The parent domain then maps these memory objects at the given ad-
dresses in a new domain and starts the new domain executing.  The
linker domain implements the caching of fully linked programs and
partially fixed-up libraries. When the linker domain is asked  to
link  a  program  it checks its image cache. If the memory_object
that is being linked is the same as a previously linked image and
the  context  that  is  being used to find libraries is the same,
then the linker domain will consider this a cache hit. The linker
can determine if two memory objects or contexts are equivalent by
following a secure object equivalence protocol. This protocol re-
quires  two  cross-domain  calls  for each equivalence check. The
linker domain strategy for determining if there is a hit  on  the
image  cache  will be correct as long as the contents of the con-
text used to find libraries does not change. If new libraries are
bound  into the context used for linking, then the linker may get
a false cache hit. If new libraries are bound into the name space
then  the linker cache must be flushed to take advantage of these
libraries. Since installing new  shared  libraries  should  be  a
fairly  uncommon occurrence, we believe that our solution is rea-
sonable. However, in order to eliminate the need for linker cache
flushing,  we  plan  to implement the more aggressive solution to
cache coherency described in Section 8.3.  In addition to the im-
age  cache, the linker domain also maintains a cache of partially
fixed-up libraries. When the image cache  is  missed,  the  image
must be linked against a set of shared libraries. For each shared
library, the linker domain uses the context  given  to  the  link
operation  to resolve the name of the library to a memory object.
The linker then checks to see if the memory object is  equivalent
to  the  memory  object for any cached library. If it is then the
linker can use the already cached library.  6.  Performance  This
Section  provides  measurements of dynamic linking performance on
Spring and SunOS  4.1.3.  These  measurements  were  taken  on  a
SPARCstation  10  model  31 with a 36 Mhz CPU and 64 Megabytes of
memory. The SunOS numbers include all of the linking that  occurs
before  the main function is called. Thus, they include some fix-
ups and the library mapping costs but  do  not  include  any  on-
demand  fix-ups.  The  Spring  numbers  include the cost of fully
fixing-up the shared libraries and the main program, but they  do
not  include  the library mapping cost. Since the mapping cost on
SunOS and Spring is small, these mapping costs are  not  signifi-
cant.   6.1.   Linking Against a Single Library Table 3 shows the
time it takes to link a null main program on  Spring  and  SunOS,
and  Table  4  shows  the cost of linking some Spring programs on
Spring and SunOS. These measurements show several things.  First,
image  caching is very effective in providing low dynamic linking
cost. If a program image is cached, then Spring can start-up  the
program  in  12  milliseconds regardless of the number of reloca-
tions. However, SunOS, which does not have image  caching,  takes
from  22  to 1376 milliseconds to link a program depending on the
number of relocations in the  library.  These  measurements  also
show the effectiveness of library caching. Library caching allows
the program to be linked up to four  times  faster  than  without
caching.  In  fact,  with  library caching, the effect of a large
number of relocations on linking performance is reduced. For  ex-
ample,  even though a non-PIC version of libspring has five times
as many relocations as a PIC version, it only takes 20 to 50 per-
cent  longer  to link against a cached non-PIC version of the li-
brary. This is much better than SunOS which takes over five times
longer  to  link  against  a non-PIC version of libspring than it
does to link against the PIC version. Library  Library  Compiled-
PIC?   SunOS  Spring  Program Cached Spring Library Cached Spring
Nothing Cached libspring Yes 182 ms 12 ms 90 ms 185 ms  libspring
No  1008  ms 12 ms 136 ms 576 ms libc Yes 22 ms 12 ms 65 ms 73 ms
libc No 197 ms 12 ms 87 ms 235 ms TABLE 3.        Linking A  NULL
Program  In  general, even without any caching, Spring is able to
link programs as fast or faster than  SunOS.  In  fact  when  the
number  of relocatable symbols gets large, Spring can dynamically
link twice as fast as SunOS even when neither the image  nor  the
library is cached. Spring's better performance is due to the fact
that the Spring dynamic linker caches and reuses  symbol  resolu-
tions  during  the fix-up phase.  There is one case where without
caching Spring is much slower than SunOS: when  the  null-program
is  linked against a PIC version of libc. The poor performance on
Spring is because Spring has to perform  all  relocations  before
the  program  is  run  and SunOS does not. When libraries such as
libc are compiled PIC, the SunOS idea of  performing  fix-ups  of
procedure references on demand appears to be very effective. How-
ever, for libraries such as libspring where most relocations  are
data  relocations  that cannot be performed lazily, the SunOS op-
timization is not very effective.   Program  Program  PIC?   lib-
springPIC?   SunOS  Program  Cached Library Cached Nothing Cached
ssh Yes Yes 209 ms 12 ms 134 ms 221 ms ssh Yes No 1147 ms  12  ms
166  ms  588  ms  ssh No Yes 291 ms 12 ms 174 ms 272 ms ssh No No
1286 ms 12 ms 212 ms 643 ms caching_fs Yes Yes 239 ms 12  ms  158
ms  252  ms  caching_fs  Yes  No  1212  ms  12  ms  194 ms 615 ms
caching_fs No Yes 371 ms 12 ms 240 ms 335  ms  caching_fs  No  No
1376  ms  12 ms 280 ms 712 ms TABLE 4.        Linking Spring Pro-
grams on Spring These measurements provide a good  indication  of
the  effectiveness of PIC. Without caching, PIC is very effective
in reducing program start-up time. Compiling  a  library  PIC  is
much  more important than compiling a program PIC. The difference
in the dynamic linking cost for PIC and  non-PIC  versions  of  a
program  is  at most 50 percent. However, compiling a library PIC
can reduce the dynamic linking cost by from three to eight  times
over non-PIC.
 With image caching, PIC is completely unnecessary. In  addition,
library caching reduces the necessity for PIC. Thus, the need for
PIC will depend on a combination of the hit  rate  of  the  image
cache  and the hit rate of the library cache. Our experience with
Spring so far is that because of the effectiveness of  our  image
and  library caches, we have no need for PIC.  6.2.  Multiple Li-
braries The previous measurements were of the cost of dynamically
linking  a program against one library. Table 5 shows the cost of
dynamically linking a null program against the set of X libraries
libXaw, libXmu, libXt, and libX11 and either libc or libue. Table
5 shows several interesting things. First, as expected, both  im-
age  caching and PIC are very effective in improving performance.
Second, without image caching, Spring is much  worse  than  SunOS
when  the  program  and  the X libraries are linked against libc.
Once again, the poor performance on Spring is because Spring  has
to  perform  all relocations before the program can run and SunOS
does not. Since each relocation must be checked against many  li-
braries,  the dynamic linking cost is very high (much higher than
the cost shown in Table 3  of  linking  a  null  program  against
libc).  Library Library CompiledPIC?  SunOS Spring Program Cached
Spring Library Cached Spring Nothing Cached libue Yes 486  ms  13
ms  424  ms 645 ms libue No 3111 ms 13 ms 473 ms 1017 ms libc Yes
102 ms 14 ms 416 ms 516 ms libc No 387 ms 13 ms  431  ms  684  ms
TABLE  5.        Linking Against Multiple X Libraries These meas-
urements show the potential for the library chaining optimization
described  in  Section  8.1  and  the  unified  symbol table idea
described in Section 8.2. We hope these optimizations would allow
Spring  to  perform  equally well regardless of the number of li-
braries that are being dynamically linked.  7.  Relevance to Oth-
er  Operating  Systems In Spring, it was natural to implement the
dynamic linker cache as a user-level service. Spring's  microker-
nel organization meant that image start-up was already handled by
user-level libraries, and our focus on objects meant that  system
resources,  such  as memory objects, could be conveniently passed
to and from a user-level  service.   However  the  basic  caching
techniques  we have described do not require a user-level server.
A UNIX system might implement an equivalent dynamic library cache
as  an  operating  system service that is used by the exec system
call. This should provide essentially all the  same  benefits  as
our  user-level service.  8.  Future Work Everything described in
this paper  has  been  implemented.  In  this  section,  we  will
describe  three  potential  improvements  to  our linking system.
8.1.  Chaining Some programs are  linked  with  many  shared  li-
braries. For example, the X programs such as xterm and xclock are
linked with five shared  libraries.  When  a  program  with  many
shared  libraries  is  linked  in  Spring, the linker domain will
fix-up each library in turn each time different programs that use
the  same  set  of libraries are linked. One optimization that we
could implement is to cache fixed-up chains of  libraries.  Thus,
if  two programs are linked that use the same set of libraries in
the same order, then the second program could use the cached  set
of  libraries that were fixed-up for the first program. As an ex-
ample of the chaining optimization,  consider  the  two  programs
progA  and  progB  created  in  the following manner: ld -o progA
progA.o X.so Y.so Z.so ld -o progB progB.o X.so Y.so Z.so If pro-
gA is linked, the linker domain could cache the fixed-up chain of
libraries <X.so, Y.so, Z.so>. In this cached chain, X.so will al-
ready be fixed-up against Y.so and Z.so, and Y.so will already be
fixed-up against Z.so. When progB is linked,  the  linker  domain
will  notice  that  it uses the already cached chain of libraries
<X.so, Y.so, Z.so>. Thus when progB is linked many of the linking
steps  can be avoided. 8.2.  A Unified Symbol Table Currently, we
maintain a separate symbol table for each library in  our  cache.
However,  a  possible  alternative would be to maintain a unified
symbol table for all the libraries in  the  cache.  This  unified
symbol  table  would map each symbol to the set of libraries that
implement it (in practice the vast majority of symbols  are  only
defined  by a single library). Whenever we added a library to the
library cache, we would add its external  symbol  definitions  to
the unified symbol table.  This unified symbol table could poten-
tially accelerate the final linking stage  for  images  that  use
large  numbers  of shared libraries. Currently, we have to search
through each of the libraries' symbol tables in turn  to  resolve
undefined  symbols and to validate that a given symbol definition
has not been overridden. By doing a single lookup in the  unified
symbol table, we would be able to discover which libraries define
that symbol and determine relatively quickly which  (if  any)  of
those  library  definitions  to  use for the current image.  8.3.
Aggressive Coherency Our current mechanism, described in  Section
5.6,  for  determining  a  hit  in the image cache requires cache
flushes when libraries change. We plan to be much more aggressive
about  keeping the image and library caches coherent with respect
to changes in libraries. We intend to implement a fully  coherent
linker  cache by using a combination of our naming and file cach-
ing strategies [8].  9.  Conclusion  The  Spring  dynamic  linker
uses  caching  very  effectively  to  improve performance. When a
dynamically-linked program image is cached, then the program  can
start  up nearly as fast as a statically-linked program. The only
difference is that there are  more  memory  objects  to  map  for
dynamically-linked  programs. When libraries are cached, uncached
programs can still start up quickly regardless of the  number  of
relocations  in  a  library.  In  fact, in the cases that we have
measured on Spring, PIC is not necessary for good dynamic linking
performance.  Current  UNIX  dynamic linking implementations were
designed for languages such as C where the use of PIC can  reduce
the  dynamic  linking  costs  to  an  acceptable  level. However,
object-oriented languages such as C++, with features such as vir-
tual  inheritance,  produce images where technologies such as PIC
are no longer sufficient to provide  acceptable  dynamic  linking
costs. In this paper, we have presented a new technology, caching
of fixed-up program images and shared  libraries,  that  provides
efficient  dynamic  linking  of  programs  written  in  these new
object-oriented languages. We believe that new  operating  system
techniques  such  as  image  and library caching, or new compiler
techniques, are going to be required in the future  in  order  to
ensure  acceptable  dynamic linking performance.  Acknowledgments
We would like to thank Yousef Khalidi and Peter Madany  for  pro-
viding  valuable  feedback  on  this paper. We would also like to
thank Yousef for originally suggesting this line of inquiry.  Fi-
nally, we would like to acknowledge the efforts of all the Spring
contributors whose work made this project a success.   References
[1]  Arnold, Q., "Shared Libraries on UNIX System V," USENIX Sum-
mer Conference, 1986, pp. 395-404.  [2] Gingell, R. A., Lee,  M.,
Dang, X. T., and Weeks, M. S.,"Shared Libraries in SunOS," USENIX
Summer Conference,  1987,  pp.  131-145.  [3]  Hamilton,  G.  and
Kougiouris,  K., "The Spring nucleus: A microkernel for objects,"
USENIX Summer Conference, June 1993.  [4] Kempf, J. and  Kessler,
P.  B.,  "Cross-Address  Space Dynamic Linking," Sun Microsystems
Laboratories, Technical Report TR-92-2.  [5] Khalidi, Y.  A.  and
Nelson,  M.  N., "An Implementation of UNIX on an Object-oriented
Operating System," USENIX Winter Conference, January  1993.   [6]
Khalidi, Y. A. and Nelson, M. N., "The Spring Virtual Memory Sys-
tem," Sun Microsystems Laboratories, TR-93-9.  [7] Murphy, D. L.,
"Storage  Organization  and  Management in TENEX," Proceedings of
the Fall Joint Computer Conference, AFIPS, 1972, pp. 23-32.   [8]
Nelson, M. N., Hamilton, G., and Khalidi, Y. A., "A Framework for
Caching in an Object Oriented System," Submitted for publication.
[9]  Organick,  E.  I., The Multics System: An Examination of Its
Structure, MIT Press,  1972.   [10]  Sabatella,  M.,  "Issues  in
Shared  Library  Design," USENIX Summer Conference, 1990, pp. 11-
23.  [11] System V Application Binary  Interface,  Prentice-Hall,
Englewood Cliffs, NJ, 1990.  Author Information Michael N. Nelson
is currently a Senior Staff Engineer at Sun Microsystems  Labora-
tories. Before joining Sun he was one of the principal developers
of the Sprite Operating System at  Berkeley  and  worked  at  DEC
Western  Research  Laboratory. His interests include distributed,
object-oriented software, operating systems, and architecture. He
has  a  Ph.D.  in  Computer  Science  from UC Berkeley. He can be
reached at Sun Microsystems Laboratories, Inc., 2550 Garcia Ave.,
MTV  29-112,  Mountain  View,  CA,  94043,  USA, or via e-mail at
michael.nelson@sun.com.  Graham Hamilton is a  Senior  Staff  En-
gineer  at Sun Microsystems Laboratories, where he is the project
lead for the Spring operating system project. His  interests  in-
clude distributed computing, object-oriented systems, and operat-
ing systems. He has a Ph.D. in Computer Science from the  Univer-
sity  of Cambridge.  Trademarks Sun, Sun Microsystems, SunOS, and
SPARCstation are trademarks or registered trademarks of  Sun  Mi-
crosystems,  Inc.  UNIX  is a registered trademark of UNIX System
Laboratories, Inc. All SPARC trademarks  are  trademarks  or  re-
gistered  trademarks  of SPARC International, Inc. All other pro-
duct names mentioned herein are the trademarks of  their  respec-
tive owners.