

	 The following paper was originally published in the
	    Proceedings of the USENIX SEDMS IV Conference
      (Experiences with Distributed and Multiprocessor Systems)
	     San Diego, California, September 22-23, 1993




		Experience Building a File System on a 
                  Highly Modular Operating System

     Michael N. Nelson    Yousef A. Khalidi     Peter W. Madany

                 Sun Microsystems Laboratories, Inc.
                    Mountain View, CA 94043 USA
                  {yak, mnn, madany}@eng.sun.com

Abstract

File systems that employ caching have been built for many years. However, most 
work in file systems has been done as part of monolithic operating systems. In this 
paper we describe our experience building a high-performance distributed file 
system on Spring, a highly modular operating system in which system services such 
as file systems are provided as user-level servers. The Spring file system described 
in this paper supports cache coherent file data and attributes. It uses the virtual 
memory system to provide data caching and uses the operations provided by the 
virtual memory system to keep the data coherent. The file system uses a unique 
dynamic caching algorithm that allows per-machine caching file servers to be 
located when a file object is passed from one machine to another. A per-machine 
caching file server utilizes the virtual memory system to provide caching of data 
for read and write operations, and it has a private protocol with the remote file 
servers to cache file attributes. The result is an operating system that has all the 
advantages of modular systems while providing the efficiency of caching that was 
available in monolithic systems.

1.  Introduction

Distributed file systems that use caching to provide good performance have existed for many 
years (e.g., Sprite [1] and Andrew [2]). However, until recently all of these file systems were 
implemented as part of a monolithic operating system. With the advent of microkernel systems 
(e.g., Mach [3] and CHORUS [4]), file systems are now being implemented outside the kernel in 
user-level servers. As a result, several properties of monolithic systems that were exploited to 
build distributed file systems no longer hold. These properties include:

•	Each system component knew the location of the other components. For example, the 
virtual memory system knew that files could only be implemented by the file systems in the 
kernel. In modular systems the different components can be anywhere, including across the 
network.

•	File objects were always acquired through the cache manager. For example, files were 
always opened through the file system, which was the same file system that did the caching. In 
modular systems objects can be passed between user applications, so a caching server may not 
be involved when users acquire objects.

•	All components could trust each other. When system components are implemented by 
user-level servers, this is no longer true.

•	Files and directories were only named via the file system. In systems with generic naming 
systems, files and directories can be bound and resolved via a naming system outside the file 
system.

This paper describes our experience building a file system on Spring, a highly modular, 
distributed, object-oriented operating system. Spring has several properties that provide unique 
opportunities and challenges when building a file system, including:

•	A powerful VM system with support for external pagers and operations that allow the 
construction of distributed shared memory systems.

•	A naming system that allows objects of all types, including files, to be bound into the name 
space.

•	A capability-based security model.

•	An object model that allows objects to be passed freely between domains on the same or 
different machines.

The Spring File System was designed to take advantage of these and other Spring properties to 
build a powerful, coherent, distributed file system. The file system consists of two types of file 
servers: ones that provide access to files that they implement and ones that cache accesses to files 
implemented by remote file servers. File servers of the first type (called storage file servers) are 
responsible for providing access control, coherent access to file data and attributes, and file 
naming. Data is kept coherent by using the primitives provided by the virtual memory system, and 
attributes are kept coherent by using a private protocol with caching file servers (see below). The 
storage file servers name their files by acting as some of the many name servers that compose the 
Spring naming system. In addition, files can be bound in name servers that are not implemented by 
the file system.

There are actually two different types of storage file servers: one that runs on each Spring machine 
and provides access to files on the local disk, and one that runs on a SunOS system and provides 
access to SunOS files. Except for the details of file storage, the two implementations are identical.

The second type of file server (called a caching file server or CFS) is responsible for making 
access to remote data and attributes efficient. One of these file servers runs on each Spring 
machine that wants file caching. The CFS is optional: remote files are accessible without a 
CFS, but accesses are slower. 

The Spring File System utilizes a unique dynamic caching protocol to allow file objects to be 
cached by the CFS. Under this protocol the CFS is contacted to cache file objects when they first 
appear in a client domain. The result is that file objects are always cached by the CFS on the same 
machine as the client that possesses the file object.

The resulting file system provides good performance. Once files are cached on the local machine, 
no remote operations are required to perform any operation on the file. Preliminary measurements 
show that caching allows basic file operations such as read, write, and map to be executed at least 
5 times faster than without caching. 

The rest of this paper is organized as follows: Section 2 provides an overview of the Spring 
Operating System; Section 3 discusses the file interface; Section 4 discusses the implementation of 
the storage file servers; Section 5 describes the implementation of the CFS and discusses the 
coherency protocol used by it and the storage file servers; Section 6 describes some additional file 
system functionality; Section 7 discusses performance; Section 8 presents related work; Section 9 
discusses the lessons that we learned from building the Spring file system; and Section 10 offers 
some conclusions.

2.  The Spring Operating System

Spring is a distributed, multi-threaded, extensible operating system that is structured around the 
notion of objects. A Spring object is an abstraction that contains state and provides a set of 
operations to manipulate that state. The description of the object and its operations is specified in 
an interface definition language (IDL). IDL supports both single and multiple interface 
inheritance.

A Spring domain is an address space with a collection of threads. A given domain may act as the 
server (implementor) of some objects and the client of other objects. The server and the client can 
be in the same domain or in different domains.

Spring objects consist of two parts: the object representation that lives in the domain that is using 
the object and the state kept by the server of the object. The object representation contains at least 
enough state to allow an invocation on the object to get to the object's server. Figure 1 shows an 
example of a Spring object where the client of the object and the server of the object are on 
different machines.

The Spring kernel supports basic cross-domain invocations and threads, low-level 
machine-dependent handling, and basic virtual memory support for memory mapping and physical 
memory management [5, 6]. A Spring kernel does not know about other Spring kernels; all remote 
invocations are handled by a network proxy server.

A typical Spring node runs several servers besides the kernel. These include a name server, file 
servers, a linker domain that manages and caches dynamically linked libraries [7], a network 
proxy that handles remote invocations, a device server that provides basic terminal handling as 
well as frame-buffer and mouse support, and a UNIX server that provides support for running 
UNIX binaries on Spring [8].

2.1	Spring Security

If the server and the client of an object are in different domains, the representation of the object 
includes an unforgeable handle managed by the kernel that identifies the server domain. These 
unforgeable handles have many of the security properties of capabilities in traditional operating 
systems. If a server determines that a client is entitled to specific access rights to a given piece of 
state (e.g. a file), it can give the client an unforgeable handle X. Encapsulated in the server side 
state for handle X will be the granted access rights and possibly the principal name of the client. 
Whenever a call arrives quoting handle X, the server can permit the given access to the underlying 
state without further checks.

Servers determine if a client is allowed access to a piece of state by consulting an access control 
list (ACL) that is associated with the state. Each ACL entry contains a principal name and a list of 
access rights. A server will only believe that a client is a given principal if that client has first been 
authenticated to be that principal. Once a client has been authenticated to the server as a given 
principal P, then the server will be willing to return objects that grant specific rights for P as 
determined by the ACL.
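The two-step pattern above (consult the ACL once, then trust the handle) can be modeled in a few lines. This is a minimal sketch under stated assumptions: the `FileServer` and `HandleState` classes, the `authenticate_open` and `invoke` methods, and the integer handle table are hypothetical stand-ins for the kernel-managed handles and server-side state, not Spring interfaces.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class HandleState:
    """Server-side state encapsulated for one handle (hypothetical layout)."""
    obj: str            # the underlying piece of state, e.g. a file
    rights: frozenset   # access rights granted when the handle was issued
    principal: str      # principal the handle was issued for

@dataclass
class FileServer:
    acls: dict                                    # obj -> {principal: set of rights}
    handles: dict = field(default_factory=dict)   # handle -> HandleState
    _next: itertools.count = field(default_factory=itertools.count)

    def authenticate_open(self, principal, obj, wanted):
        """Consult the ACL once and issue a handle encapsulating the granted rights.

        Assumes `principal` has already been authenticated to this server."""
        allowed = self.acls.get(obj, {}).get(principal, set())
        if not set(wanted) <= allowed:
            raise PermissionError(f"{principal} may not {sorted(wanted)} {obj}")
        h = next(self._next)
        self.handles[h] = HandleState(obj, frozenset(wanted), principal)
        return h

    def invoke(self, handle, right):
        """Calls quoting the handle are checked against its rights only, not the ACL."""
        state = self.handles[handle]
        if right not in state.rights:
            raise PermissionError(f"handle lacks {right!r}")
        return f"{right} on {state.obj} permitted"
```

A handle obtained for read access permits later reads without another ACL lookup, but it does not permit writes, even if the ACL would have allowed them.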

2.2	Spring Naming

The Spring name service [9] allows any object to be associated with any name. A name-to-object 
association is called a name binding. Each name binding is stored in a context. A context is an 
object that contains a set of name bindings in which each name is unique. An example of a context 
is a UNIX file directory. An object can be bound to several different names in possibly several 
different contexts at the same time.

Since a context is like any other object, it can also be bound to a name in some context. By binding 
contexts we can create a naming graph. The UNIX file system is a naming graph that is frequently 
restricted to a tree. 

Spring contexts provide support for the Spring security model. When an object is bound, an ACL 
can be given that specifies which principals are allowed which rights for the object. When a name 
is resolved, a set of desired modes is specified. Modes are a superset of rights. For example, read 
and write modes correspond directly to read and write access rights; however, append mode 
implies write access but also indicates the "mode" with which the object should be accessed when 
writes occur. When a name is resolved, an object with the desired modes is returned if the client 
doing the resolve is allowed the corresponding rights.
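The mode-to-rights check can be illustrated with a small table-driven sketch. The text states only that read and write modes map directly to rights and that append mode implies write access. The `MODE_RIGHTS` table and the `required_rights` and `resolve_allowed` helpers are hypothetical.

```python
# Hypothetical table: each mode names the access rights it requires.
MODE_RIGHTS = {
    "read": {"read"},
    "write": {"write"},
    "append": {"write"},   # needs write access, but also fixes how writes occur
}

def required_rights(modes):
    """Union of the rights needed to obtain an object with the given modes."""
    rights = set()
    for mode in modes:
        rights |= MODE_RIGHTS[mode]
    return rights

def resolve_allowed(acl, principal, modes):
    """True if the ACL grants `principal` every right the desired modes imply."""
    return required_rights(modes) <= acl.get(principal, set())
```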

2.3	Virtual Memory

A per-node virtual memory manager (VMM) is responsible for handling mapping, sharing, and 
caching of local memory. The VMM depends on external pagers for accessing backing storage and 
maintaining inter-machine coherency [6, 10].

Most clients of the virtual memory system only deal with address space and memory objects. An 
address space object represents the virtual address space of a Spring domain while a memory 
object is an abstraction of storage (memory) that can be mapped into address spaces. An example 
of a memory object is a file object (the file interface in Spring inherits from the memory object 
interface). Address space objects are implemented by the VMM.

A memory object has operations to set and query the length, and operations to bind to the object 
(see below). There are no page-in/out or read/write operations on memory objects (in contrast to 
systems such as Mach [3]). The Spring file interface provides file read and write operations (but 
not page-in and page-out operations). Separating the memory abstraction from the interface that 
provides the paging operations is a feature of the Spring virtual memory system that we found very 
useful in implementing our file system. This separation enables the server of the memory object to 
be different from the server of the pager object that provides the contents of the memory object. 
We will show uses of this feature in Section 5.

2.3.1	Binding a memory object to a cache object

When a VMM is asked to map a memory object into an address space, the VMM must be able to 
obtain the contents of the memory object, since the memory object itself does not provide 
operations for obtaining this data. Therefore, the VMM contacts the pager domain that implements the 
memory object by invoking the bind operation on the memory object. The objective of the bind 
operation is to point the VMM to a local data cache that provides the contents of the memory 
object and to tell the VMM what rights are encapsulated by the memory object. The details of the 
bind operation are given in [10]; in the rest of this section we will give a brief overview of the bind 
operation.

During the bind operation the VMM and the pager domain exchange two objects: a pager object 
and a cache object. The pager object provides operations to page-in and out memory blocks, and 
the VMM uses it to populate a local cache. The cache object is implemented by the VMM, and the 
pager domain uses it to affect the state of the cache. Tables 1 and 2 list the operations of the cache 
and pager objects, respectively. A given pager object-cache object pair constitutes a two-way 
communication channel between a pager and a VMM. Typically, there are many such channels 
between a given pager domain and a VMM (see Figure 2 for an example). As far as the VMM is 
concerned, each memory object is unique: the VMM relies on the memory object's pager to point 
it to a data cache from which the VMM obtains the contents of the memory object, and it also 
relies on the pager to indicate the encapsulated access rights of the memory object. This extra level 
of indirection allows different memory objects that share the same pages (but perhaps encapsulate 
different access rights) to share the same cache at the VMM instead of flushing the same pages 
back and forth between two separate caches.
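The extra level of indirection that allows cache sharing can be modeled as follows. This is a simplified sketch in which every name (`PagerDomain`, `PagerObject`, `CacheObject`, `VMM.create_cache`, `MemoryObject.bind`) is hypothetical. The pager domain remembers one pager/cache channel per (file, VMM) pair. Two memory objects that share pages but encapsulate different rights are therefore pointed at the same cache.

```python
class CacheObject:
    """Implemented by the VMM; one end of a pager-object/cache-object channel."""
    def __init__(self, pager):
        self.pager = pager
        self.blocks = {}          # block number -> data held by the VMM

class VMM:
    def create_cache(self, pager):
        # The VMM creates a cache object that it will populate via `pager`.
        return CacheObject(pager)

class PagerObject:
    """Implemented by the pager domain; page_in/page_out would live here."""
    def __init__(self, file_id):
        self.file_id = file_id

class PagerDomain:
    """A file server acting as pager; remembers one channel per (file, VMM)."""
    def __init__(self):
        self.channels = {}        # (file_id, vmm) -> CacheObject

    def bind(self, file_id, rights, vmm):
        key = (file_id, vmm)
        if key not in self.channels:
            self.channels[key] = vmm.create_cache(PagerObject(file_id))
        # Point the VMM at the (possibly shared) cache and report the rights.
        return self.channels[key], rights

class MemoryObject:
    """No page-in/out operations: its data is reachable only through bind."""
    def __init__(self, server, file_id, rights):
        self.server, self.file_id, self.rights = server, file_id, rights

    def bind(self, vmm):
        return self.server.bind(self.file_id, self.rights, vmm)
```

Because `MemoryObject` delegates to whatever server it was constructed with, the domain implementing the memory object need not be the one implementing the pager. That is the separation described in Section 2.3.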

Operation      Description
-------------  ---------------------------------------------------------
flush_back     Remove data from the cache and send modified blocks to
               the pager.
deny_writes    Downgrade read-write blocks to read-only and return
               modified blocks to the pager.
write_back     Return modified blocks to the pager. Data is retained in
               the cache in the current mode.
delete_range   Remove data from the cache, return no data.
zero_fill      Indicate to the VMM that the given range of cache is
               zero-filled. Data blocks in the range are held by the VMM
               in read-write mode.
populate       Introduce data blocks into the cache.

TABLE 1.  Cache object operations
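To make the cache object operations in Table 1 concrete, the following is a toy model of a VMM-side cache in which each resident block is held read-only or read-write. The class, the method signatures, the half-open `[start, end)` block ranges, and the dictionary standing in for the pager are all assumptions for illustration.

```python
RO, RW = "read-only", "read-write"

class Block:
    def __init__(self, data, mode):
        self.data, self.mode, self.dirty = data, mode, False

class VMCache:
    """Toy VMM cache implementing the Table 1 operations (hypothetical signatures)."""

    def __init__(self, pager_store):
        self.blocks = {}           # block number -> Block
        self.pager = pager_store   # stands in for the pager: block number -> data

    def _range(self, start, end):
        return [b for b in sorted(self.blocks) if start <= b < end]

    def _return_dirty(self, b):
        blk = self.blocks[b]
        if blk.dirty:
            self.pager[b] = blk.data
            blk.dirty = False

    def populate(self, start, datas, mode):
        for i, data in enumerate(datas):
            self.blocks[start + i] = Block(data, mode)

    def zero_fill(self, start, end, block_size=8):
        # Held read-write without a page-in; the pager already knows they are zero.
        for b in range(start, end):
            self.blocks[b] = Block(bytes(block_size), RW)

    def write_back(self, start, end):
        for b in self._range(start, end):
            self._return_dirty(b)          # data stays cached in its current mode

    def deny_writes(self, start, end):
        for b in self._range(start, end):
            self._return_dirty(b)
            self.blocks[b].mode = RO       # downgrade to read-only

    def flush_back(self, start, end):
        for b in self._range(start, end):
            self._return_dirty(b)
            del self.blocks[b]             # modified data returned, then removed

    def delete_range(self, start, end):
        for b in self._range(start, end):
            del self.blocks[b]             # discarded; no data returned

    def write(self, b, data):
        """Client store into a resident read-write block (marks it modified)."""
        blk = self.blocks[b]
        if blk.mode != RW:
            raise PermissionError("block is read-only")
        blk.data, blk.dirty = data, True
```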

Operation      Description
-------------  ---------------------------------------------------------
page_in        Request data be brought into the cache.
page_out       Write data to the pager and remove data from the cache.
write_out      Write data to the pager and retain data in read-only mode.
sync           Write data to the pager and retain data in the current
               mode.

TABLE 2.  Pager object operations

3.  The File Interface

Spring files contain data and attributes and support authentication. The interface provides access to 
the file's data through two mechanisms. One way is through read and write operations; these 
operations are inherited from the Spring io interface. The other way is by mapping the file object into 
an address space; this ability comes by having a file object inherit the memory object interface. 

Spring files have three attributes: the length of the file, its access time, and its modify time. The file 
interface provides get_length and set_length operations to retrieve and change the file length; these 
operations are inherited from the memory object interface. All three attributes can be retrieved via 
the stat operation; there is no direct way to set the access or modify time.

Spring files support Spring authentication by inheriting the authenticated interface. The 
authenticated class provides support for access control lists, encapsulated rights and principals, and it 
allows new file objects to be created that reference the same underlying file state as the current file, 
yet contain different encapsulated rights.
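The inheritance structure of the file interface can be rendered as abstract classes. This is only a sketch. Spring interfaces are written in IDL, not Python. The text names read/write, get_length/set_length, bind, and stat, but the signatures below are assumptions, and the authenticated operations `get_acl` and `restrict` are hypothetical placeholders.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple

class Stat(NamedTuple):
    length: int
    access_time: float
    modify_time: float

class IO(ABC):
    @abstractmethod
    def read(self, offset: int, count: int) -> bytes: ...
    @abstractmethod
    def write(self, offset: int, data: bytes) -> int: ...

class MemoryObject(ABC):
    @abstractmethod
    def get_length(self) -> int: ...
    @abstractmethod
    def set_length(self, length: int) -> None: ...
    @abstractmethod
    def bind(self, vmm): ...

class Authenticated(ABC):
    @abstractmethod
    def get_acl(self): ...                               # hypothetical name
    @abstractmethod
    def restrict(self, rights) -> "Authenticated": ...   # hypothetical: same state, new rights

class File(IO, MemoryObject, Authenticated):
    @abstractmethod
    def stat(self) -> Stat: ...   # returns all three attributes; no setter for the times
```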

4.  The Storage File Server

In this section we will describe the implementation of the storage file servers. In this description 
we will ignore the issue of the caching file server since the caching file server is merely an 
optimization and is not required for the file system to function properly. In the next section when we 
discuss the caching file server, we will discuss the extra implementation required in the storage file 
servers to support caching by the CFS.

4.1	Naming Files

The Spring file system fits into the overall Spring naming system. Spring files can be accessed via 
contexts implemented by the storage file servers or via contexts implemented by other domains. 
The context objects implemented by the storage file servers are only one of the many types of 
contexts that together compose the Spring naming system.

4.1.1	The File System Contexts

The storage file servers implement a subclass of the context class called fs_context. The fs_context 
class inherits from the authenticated interface and it adds the create_file operation, which creates a 
file and binds it to a name. Thus the fs_context objects implemented by the storage file servers 
contain an encapsulated principal, encapsulated rights, and an ACL. Fs_context objects are 
normally used to retrieve and bind file and fs_context objects, but other types of objects can be bound 
and retrieved as well (see Figure 3 for an example Spring name space).

The storage file servers export their files by binding fs_context objects into a public Spring name 
server. Storage file servers read configuration files that determine where to bind their context 
objects.

Each binding in an fs_context has an ACL. When a name resolution is invoked on an fs_context 
(e.g. someone wants to open a file for read-write), the file system ensures that the encapsulated 
principal of the context doing the lookup is allowed the desired access to the bound object. The 
resulting file or fs_context object will encapsulate the principal of the context doing the lookup 
and will also encapsulate the desired modes. For example, if a client had the root context object in 
Figure 3 authenticated with principal P and the client invoked the operation resolve("B/F/H", 
read-write), the client would get back a file object that encapsulated principal P and read-write 
mode, assuming that P had read access to contexts B and F and read-write access to file H.
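The resolution example above can be traced with a small model. The `Context` and `FileObj` classes and the `resolve` function are hypothetical. As in the example, the sketch assumes that each intermediate context requires only read access for the principal, and that the final binding must grant the rights for the desired modes. For simplicity the last component must be a file.

```python
from dataclasses import dataclass, field

@dataclass
class FileObj:
    name: str
    principal: str = ""
    modes: frozenset = frozenset()

@dataclass
class Context:
    # name -> (bound object, ACL mapping principal -> rights)
    bindings: dict = field(default_factory=dict)

    def bind(self, name, obj, acl):
        self.bindings[name] = (obj, acl)

def resolve(ctx, principal, path, modes):
    """Walk the path, checking each binding's ACL for the looking-up principal."""
    parts = path.split("/")
    for i, comp in enumerate(parts):
        obj, acl = ctx.bindings[comp]
        last = i == len(parts) - 1
        needed = set(modes) if last else {"read"}
        if not needed <= acl.get(principal, set()):
            raise PermissionError(f"{principal} lacks {sorted(needed)} on {comp}")
        if last:
            # The result encapsulates the looking-up principal and desired modes.
            return FileObj(obj.name, principal, frozenset(modes))
        ctx = obj
```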

4.1.2	Naming Separate From File System

File and fs_context objects can be bound into the Spring naming system just like any other object. 
Thus these objects can be bound into contexts that are not implemented by the file system. When a 
client retrieves a file or fs_context object from a non-file-system context (e.g. the file named 
"A/D" in Figure 3), the context must be able to create a copy of the file or fs_context object that 
encapsulates the current principal and the desired modes. This is done using a Spring duplication 
service.

A standard Spring naming server does not know how to change the encapsulated principal or 
modes of an object. Thus any object server that wishes to allow its objects to be stored in name 
servers and to allow the encapsulated access to be changed must implement a Spring duplication 
service object. This object supports the dup operation which takes an object, a principal, and a desired 
set of modes and returns a copy of the object that encapsulates the given principal and modes.

The file servers implement two duplication services: one for files and one for contexts. When a file 
server is asked to duplicate an object it ensures that the caller has the right to produce an object 
with the desired principal and modes, and if so returns a copy of the object that encapsulates the 
given principal and modes.
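A duplication service can be modeled as below. The text does not say how a file server decides whether the caller may produce the requested principal and modes. This sketch assumes, hypothetically, that the decision uses the same ACL test as name resolution. `DupService`, `FileRef`, and the exact `dup` signature are illustrative.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FileRef:
    file_id: str
    principal: str
    modes: frozenset

class DupService:
    def __init__(self, acls):
        self.acls = acls   # file_id -> {principal: set of rights}

    def dup(self, obj, principal, modes):
        """Return a copy of `obj` encapsulating `principal` and `modes`.

        Assumption: allowed iff the file's ACL grants `principal` those rights."""
        granted = self.acls.get(obj.file_id, {}).get(principal, set())
        if not set(modes) <= granted:
            raise PermissionError("caller may not produce this principal/modes")
        # Same underlying file state, different encapsulated principal and modes.
        return replace(obj, principal=principal, modes=frozenset(modes))
```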

4.2	The FS Object

The fs object can be used to create unnamed files. It supports one operation, get_file, which returns 
a new unnamed file object. To give this new file object a name, a client must bind it into some 
context.

4.3	File Implementation

Files are implemented by the storage file servers. In this section we will discuss the interesting 
details of the file implementation. Note that if a file that is being accessed is implemented by a 
remote storage file server, all operations invoked on the object will require a network RPC. The 
CFS that is discussed in Section 5 is able to eliminate most of these network RPCs.

4.3.1	Security

The file objects implemented by the storage file servers are authenticated objects. Therefore they 
have both an encapsulated principal and encapsulated rights. The encapsulated rights are set when 
a file object is created, and the rights are checked on each access to the object. The encapsulated 
principal is not currently used for file objects. If we decide at some point to verify the principal on 
each access then we would use the encapsulated principal.

4.3.2	Mapping Files

As we described in Section 2.3, Spring files can be mapped into address spaces because the Spring 
file class inherits the memory object interface. When a client maps a file object into its address 
space, the virtual memory system and the file system follow the bind protocol described in Section 
2.3. The result is that the cache-pager object connection between the VMM and the file system 
is set up. Figure 4 gives the state of the system after a file object is bound into a client domain's 
address space.

4.3.3	Data Coherency

There is a potential coherency problem when a particular file is mapped into multiple clients' 
address spaces on several machines at the same time. For example, if two clients on different 
machines have the same page of a file mapped into their address spaces both readable and 
writable, then some action must be taken to ensure that both clients see a coherent view of the page. 
One of the goals when building the file system was to give clients a coherent view of files. As a 
result one of the primary jobs of the file system is to keep files coherent.

Since files can be cached a page at a time, coherency is done on a per page level; a file server keeps 
pages coherent by invoking operations on the cache objects that are associated with each file 
object. The storage file servers implement a single-writer, multiple-reader per-page coherency 
algorithm. The file system can guarantee coherency because it gets all page-in requests. Each 
request indicates whether the page is desired in read-only or read-write mode.
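The per-page single-writer, multiple-reader policy can be sketched from the storage file server's side. This is a minimal model under assumptions. `CoherentPager`, its `page_in` signature, and the recording `FakeCache` stub are hypothetical. The coherency actions use the Table 1 operations `deny_writes` and `flush_back` over a one-page `[page, page + 1)` range.

```python
class FakeCache:
    """Records the coherency calls a file server makes on a cache object."""
    def __init__(self, name):
        self.name, self.calls = name, []
    def deny_writes(self, start, end):
        self.calls.append(("deny_writes", start, end))
    def flush_back(self, start, end):
        self.calls.append(("flush_back", start, end))

class CoherentPager:
    """Either one writer or many readers per page, enforced on each page_in."""
    def __init__(self):
        self.readers = {}   # page -> set of caches holding it read-only
        self.writer = {}    # page -> cache holding it read-write

    def page_in(self, cache, page, write):
        w = self.writer.pop(page, None)
        readers = self.readers.setdefault(page, set())
        if write:
            # A writer must be alone: evict every other copy of the page.
            if w is not None and w is not cache:
                w.flush_back(page, page + 1)
            for r in readers - {cache}:
                r.flush_back(page, page + 1)
            readers.clear()
            self.writer[page] = cache
        else:
            # Readers may share, but an existing writer is first downgraded.
            if w is not None and w is not cache:
                w.deny_writes(page, page + 1)
                readers.add(w)
            readers.add(cache)
```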

4.3.4	Read and Write Caching

Read and write operations are cached by mapping the file that is being read or written into the stor-
age file server's address space. Once the file is mapped, then the data is copied to or from the 
mapped region as appropriate. Since file mapping is used, all of the issues of data caching and 
coherency are handled by the vm-pager data coherency protocol.
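The idea of implementing read and write by copying to or from a mapping can be shown with Python's standard `mmap` module on an ordinary host file. This is an analogy, not Spring code. In Spring the storage file server maps a file object into its own address space, and the VM-pager protocol handles paging and coherency. Here the host operating system plays that role. The `MappedFile` class and `demo` function are hypothetical, and writes past the mapped length are simply rejected.

```python
import mmap
import os
import tempfile

class MappedFile:
    """Implements read/write by copying to or from a memory-mapped region."""

    def __init__(self, path):
        self._f = open(path, "r+b")
        self._map = mmap.mmap(self._f.fileno(), 0)   # map the whole file

    def read(self, offset, count):
        # Copy out of the mapping; page faults bring data in as needed.
        end = min(offset + count, len(self._map))
        return self._map[offset:end]

    def write(self, offset, data):
        # Copy into the mapping; modified pages are written back by the VM system.
        if offset + len(data) > len(self._map):
            raise ValueError("write past end of mapped length (not handled here)")
        self._map[offset:offset + len(data)] = data
        return len(data)

    def close(self):
        self._map.close()
        self._f.close()

def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.close(fd)
        mf = MappedFile(path)
        mf.write(0, b"HELLO")
        result = mf.read(0, 11)
        mf.close()
        return result
    finally:
        os.remove(path)
```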

4.3.5	Periodic Data Write Back

In order to reduce the amount of data lost in a machine crash, the storage file servers write back all 
modified data for their files cached at VMMs every 30 seconds. The file servers do this by invok-
ing the write_back operation on the cache objects associated with each file.
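The periodic write-back can be sketched as a loop over every cache object a storage file server knows about. The 30-second interval comes from the text. `write_back_all`, `periodic_write_back`, the table layout, and passing the whole file as the `write_back` range are assumptions. A real server would also need locking around its table of cache objects.

```python
import threading

WRITE_BACK_INTERVAL = 30.0  # seconds, as stated in the text

def write_back_all(files):
    """Ask every cache holding a file's data to return its modified blocks.

    `files` maps a file id to (length_in_blocks, list of cache objects)."""
    calls = 0
    for length, caches in files.values():
        for cache in caches:
            cache.write_back(0, length)   # data stays cached in its current mode
            calls += 1
    return calls

def periodic_write_back(files, stop: threading.Event, interval=WRITE_BACK_INTERVAL):
    """Run write_back_all every `interval` seconds until `stop` is set."""
    while not stop.wait(interval):
        write_back_all(files)
```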

4.3.6	Coherency Impact of the Length

Getting and setting the length may require coherency actions. Getting the length requires that the 
file server retrieve the length from anyone who is caching it writable. Setting the length requires a 
coherency action if the length is decreased. In this case the pages at the end of the file need to be 
eliminated from the file and from all caches of the file. If the pages are not removed from the 
caches, then clients will not see a consistent view of the file because some clients may be able to 
access parts of the file that no longer exist. Pages are deleted from caches by invoking the 
delete_range operation with the appropriate data range on all cache objects that possess deleted 
pages.
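The truncation case comes down to simple page arithmetic. The sketch assumes a fixed page size (the value is illustrative) and half-open page ranges. Shrinking a file from `old_len` to `new_len` bytes drops every page from `ceil(new_len / page_size)` up to `ceil(old_len / page_size)`. The `truncate` helper, the `pages` attribute on caches, and `delete_range` taking page numbers are hypothetical. The partially valid last page is not handled.

```python
PAGE_SIZE = 8192  # illustrative page size in bytes (assumption)

def pages_to_delete(old_len, new_len, page_size=PAGE_SIZE):
    """Half-open page range [first, last) lying wholly beyond the new length."""
    first = -(-new_len // page_size)     # ceil(new_len / page_size)
    last = -(-old_len // page_size)      # ceil(old_len / page_size)
    return (first, last) if first < last else None

def truncate(caches, old_len, new_len, page_size=PAGE_SIZE):
    """Delete dropped pages from every cache that holds any of them.

    Each cache is assumed to expose `pages` (resident page numbers) and
    `delete_range(start, end)`."""
    rng = pages_to_delete(old_len, new_len, page_size)
    if rng is None:
        return []        # growing, or shrinking within a page: no pages dropped
    first, last = rng
    touched = []
    for cache in caches:
        if any(first <= p < last for p in cache.pages):
            cache.delete_range(first, last)
            touched.append(cache)
    return touched
```

For example, shrinking a five-page file to 8193 bytes keeps pages 0 and 1 and deletes pages 2 through 4. Caches that hold none of those pages are not contacted.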

If a file's length is increased, then nothing has to be done in order to ensure coherency. However, 
there is an opportunity for an optimization that can best be done by the caching file server. We will 
discuss this optimization in Section 5.8.

5.  The Caching File Server

In this section we describe the implementation of the Caching File Server (CFS). The CFS caches 
the following things in order to provide high performance: 

•	Attributes to eliminate remote get_length, set_length, and stat calls.

•	Data to eliminate remote read and write calls.

•	VM cache objects to eliminate remote bind calls and allow an additional optimization that 
eliminates most zero-fill page faults.

5.1	Basic Architecture

In order to allow local file caching to be implemented, the file objects used by client domains must 
be implemented by the CFS. In addition, the CFS must have a special caching channel to the storage 
file servers whose data and attributes it is caching, and it must hold a copy of the VMM cache object 
for each file.

The other component of the caching architecture is the virtual memory system. The virtual 
memory system uses the cache and pager objects described in Section 2.3. In order to make page-ins 
and page-outs as efficient as possible, the virtual memory manager should be able to communicate 
directly with the file server that stores the data; in other words, the pager object should be 
implemented by the storage file server, not the file cacher. The desired structure for data caching 
involving the CFS, the storage file server, and the VMM is given in Figure 5.

5.2	The Caching Subcontract

When client domains receive objects from a remote file server, the CFS must somehow be able to 
interpose on these objects so that caching can occur. This is done through the use of the caching 
subcontract. 

Every Spring object has an associated subcontract [11]. Subcontract is responsible for many things 
including marshaling, unmarshaling, and invoking operations on the object. Subcontract also 
defines the representation for each object that appears in a client domain's address space. The 
standard Spring subcontract is called singleton. The representation of a singleton object includes a 
kernel handle that identifies the server domain. When a client invokes an operation on an object that 
uses singleton, this handle is used to send the invocation to the server domain. 

File objects use a different subcontract called the caching subcontract. File objects are only one of 
the users of the caching subcontract. The representation for an object that uses the caching 
subcontract contains:

•	A handle that identifies the server domain (this is the same handle that is in the singleton 
representation).

•	An object, called the cached_object, that is implemented by a domain that caches the original 
object.

•	A name, called cacher_name, that names the cacher to use.

Figure 6 shows the configuration after a file object with the caching subcontract is cached by a 
CFS domain.

The cached_object in the caching subcontract representation is used when an invocation occurs on 
an object that uses the caching subcontract. If the cached_object is non-null, then the invocation is 
done on the cached_object; if the cached_object is null, then the invocation is done on the server's 
handle. The cached_object will be null if there is no cacher or the server is on the local machine.

The cached_object is obtained using the cacher_name when an object is unmarshaled into a client 
domain. Each cacher domain (such as the CFS) implements a cacher object. This object provides 
the operation get_cached_obj that takes an object implemented by a remote server and returns an 
object implemented by the cacher domain. This cacher object is bound in the local machine's 
name space under a name that must be agreed upon by the implementor of the cacheable service 
and the implementors of cacher domains for the service. This is the name that is stored as the 
cacher_name in the subcontract representation. This name is put there by the server domain that 
created the cacheable object.

When an object is unmarshaled into a client domain the unmarshaling code resolves the 
cacher_name to a cacher object implemented by a cacher domain. The unmarshaling code then 
invokes the get_cached_obj operation on the cacher object, passing it a copy of the cacheable 
object. When the cacher domain receives the object, it creates a new object that it implements and 
returns this new object to the client domain. The object returned from the cacher is stored as the 
cached_object in the subcontract representation.
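The two steps of the caching subcontract (unmarshaling and invocation) can be sketched as follows. This is a simplified Python model, not Spring's subcontract code. The `CachingRep` fields mirror the three parts of the representation listed above. `unmarshal`, `invoke`, the dictionary standing in for the local machine's name space, and the callables standing in for handles and cached objects are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class CachingRep:
    handle: Callable[[str], Any]     # stands in for the kernel handle to the server
    cacher_name: Optional[str]       # agreed-upon cacher name, set by the server
    cached_object: Any = None        # object implemented by the local cacher, if any

def unmarshal(rep, local_names, server_is_local):
    """On arrival in a client domain, try to obtain a cached_object."""
    if server_is_local or rep.cacher_name is None:
        return rep                   # no caching; cached_object stays None
    cacher = local_names.get(rep.cacher_name)
    if cacher is not None:
        # Hand the cacher a copy of the cacheable object; keep what it returns.
        rep.cached_object = cacher.get_cached_obj(CachingRep(rep.handle, rep.cacher_name))
    return rep

def invoke(rep, op):
    """Invocations go to the cached_object if present, else to the server handle."""
    if rep.cached_object is not None:
        return rep.cached_object(op)
    return rep.handle(op)
```

If no cacher is bound under the agreed name, the object still works. Invocations fall back to the server's handle, which matches the text's point that the CFS is optional.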

5.3	The CFS Cacher Object

The CFS implements a cacher object that it exports in the machine name space under a known 
name. Whenever a storage file server creates a file object, it sets the cacher_name in the file 
object's representation to that name. Thus when a file object is unmarshaled, the 
CFS's cacher object will be found and the get_cached_obj operation will be invoked on the cacher 
object. The CFS will then return a file object that it implements.

When the CFS receives a file object to cache via the get_cached_obj call, it must determine two 
things. First, it has to determine if it implements the cached_object that is in the file object's 
representation; if so it just returns the cached_object. Second, it has to find the internal cache state for 
the file and the file's encapsulated access rights; this is done by using the same bind protocol that 
the VMM uses to set up the cache object-pager object connection (see below). 
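The two checks the CFS makes in get_cached_obj can be sketched as follows. The text says only that the CFS reuses the bind protocol to find the file's cache state and rights. The internal table keyed by file identity, the `bind` call returning `(file_id, rights)`, and the `CFSFile` class are assumptions for illustration.

```python
class CFSFile:
    """File object implemented by the CFS: shared cache state plus its own rights."""
    def __init__(self, state, rights):
        self.state, self.rights = state, rights

class CFS:
    def __init__(self):
        self.files = {}         # underlying file identity -> internal cache state
        self.exported = set()   # CFSFile objects this CFS implements

    def get_cached_obj(self, remote_file):
        # First check: does this CFS already implement the cached_object?
        cached = getattr(remote_file, "cached_object", None)
        if cached in self.exported:
            return cached
        # Second check: bind to learn the underlying file and its encapsulated rights.
        file_id, rights = remote_file.bind(self)
        state = self.files.setdefault(file_id, {"attrs": None, "vm_cache": None})
        obj = CFSFile(state, rights)
        self.exported.add(obj)
        return obj
```

Two file objects for the same underlying file but with different rights get distinct CFS objects that share one internal cache state. This parallels how the VMM shares a cache across memory objects.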

5.4	CFS to Remote File Server Connection

The CFS and remote file servers need a connection similar to the connection between the VMM 
and pagers. The CFS needs to be able to get cached information for files and the remote file server 
needs to perform callbacks for cache coherency. This connection consists of two objects: an 
fs_cache object and an fs_pager object. The fs_cache object is a subclass of the VM cache object 
and is implemented by the CFS. The fs_pager object is a subclass of the VM pager object and is 
implemented by the storage file servers (see Tables 3 and 4 respectively for the extra operations 
added by the fs_pager and fs_cache objects). These objects are subclasses of the VM objects for 
two reasons:

•	It allows the normal bind operation on a file object to be used to set up the connection and 
discover whether a file is already cached.

•	It allows the storage file servers to keep data coherent while being ignorant of whether they are 
dealing with a VM system or a CFS; the file servers just use the VM cache object operations 
for data coherency.

The CFS - remote file server connection is set up using the same bind protocol described in Sec-
tion 2.3 - it just involves different objects. 
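The subclassing argument can be illustrated with Python abstract base classes. This is a sketch under stated assumptions. Only the fs_cache side is modelled. The VM coherency operation is given the hypothetical name flush_back. The concrete classes are invented for illustration. The storage-server routine uses only base cache-object operations, so it works unchanged whether it is handed a VMM cache object or a CFS fs_cache object. It also shows the extra local indirection through the CFS.

```python
from abc import ABC, abstractmethod


class CacheObject(ABC):
    """VM cache object; only one (hypothetically named) op is modelled."""

    @abstractmethod
    def flush_back(self, offset: int) -> bytes:
        """Return the page at `offset` and drop it from the cache."""


class FsCache(CacheObject):
    """fs_cache: the VM cache interface plus the Table 4 operations."""

    @abstractmethod
    def get_back_times(self): ...

    @abstractmethod
    def get_back_length(self, still_cacheable: bool) -> int: ...

    @abstractmethod
    def dont_cache_time(self): ...

    @abstractmethod
    def delete_cache(self): ...


class VmmCache(CacheObject):
    """Cache object implemented directly by a VMM."""

    def __init__(self, pages):
        self.pages = pages  # page offset -> page contents

    def flush_back(self, offset):
        return self.pages.pop(offset, b"")


class CfsCache(FsCache):
    """fs_cache implemented by the CFS; VM ops indirect to the local VMM."""

    def __init__(self, vmm_cache, length, times):
        self.vmm_cache, self.length, self.times = vmm_cache, length, times
        self.time_cached = True

    def flush_back(self, offset):
        return self.vmm_cache.flush_back(offset)  # one extra local call

    def get_back_times(self):
        return self.times

    def get_back_length(self, still_cacheable):
        return self.length

    def dont_cache_time(self):
        self.time_cached = False

    def delete_cache(self):
        self.vmm_cache.pages.clear()


def server_revoke(cache: CacheObject, offset: int) -> bytes:
    # Storage-server coherency code uses only CacheObject operations, so it
    # cannot tell a VmmCache from a CFS-implemented fs_cache.
    return cache.flush_back(offset)
```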

  Operation            Description
  -------------------  ------------------------------------------
  cached_bind          Tell server file is cached at VMM.
  cached_stat          Get cached attributes (writable if desired).
                       Result indicates which attributes are
                       cacheable.
  set_length           Set the length.
  release_cache_info   Release cached information.

TABLE 3. 	Fs_pager object operations

5.5	Caching Binds

One of the important jobs of the CFS is to cache the results of VM binds, since a bind occurs on 
every map call. When a bind occurs, the CFS checks permissions and then checks whether it already 
has a VM cache object for the file. If not, it obtains one as follows. The CFS first invokes the 
cached_bind operation on the appropriate fs_pager object, so that the remote file server knows that 
the file's data is being cached by the VMM. The CFS then calls the local VMM with the fs_pager 
object implemented by the remote storage file server to create a VM cache object. Once the CFS 
has a cache object, it keeps a copy of it and returns the cache object to the caller of bind (i.e., 
the local VMM).
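The bind-caching steps can be sketched as follows. This is a minimal model with hypothetical stand-ins: RemotePager and LocalVmm only count calls, and create_cache_object is an invented name for the VMM call. It shows that only the first bind of a file costs a remote call; later binds are answered from the CFS's table.

```python
# Hypothetical sketch of the CFS bind cache (Section 5.5). RemotePager and
# LocalVmm are counting stand-ins, not Spring interfaces.


class RemotePager:
    """fs_pager object implemented by a remote storage file server."""

    def __init__(self):
        self.remote_calls = 0

    def cached_bind(self):
        self.remote_calls += 1   # tells the server the VMM caches the data


class LocalVmm:
    def __init__(self):
        self.created = 0

    def create_cache_object(self, pager):
        self.created += 1
        return ("vm_cache", id(pager))   # opaque cache-object stand-in


class CfsBindCache:
    def __init__(self, vmm):
        self.vmm = vmm
        self.vm_caches = {}      # file id -> VM cache object

    def bind(self, file_id, pager, rights, wanted):
        if wanted not in rights:                        # permission check
            raise PermissionError(wanted)
        cache = self.vm_caches.get(file_id)
        if cache is None:                               # first bind only
            pager.cached_bind()                         # one remote call
            cache = self.vmm.create_cache_object(pager) # local call
            self.vm_caches[file_id] = cache             # keep a copy
        return cache                                    # to the local VMM
```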

  Operation         Description
  ----------------  ------------------------------------------
  get_back_times    Return access and modify times.
  get_back_length   Return the length. A parameter indicates
                    whether the length can still be cached.
  dont_cache_time   Don't cache the time anymore.
  delete_cache      The VM cache is no longer valid.

TABLE 4. 	Fs_cache object operations

Figure 7 shows the configuration after a successful cached bind operation. Note that the VMM has 
a direct pager connection to the remote file server and the remote file server's cache object is actu-
ally implemented by the CFS. Thus all cache coherency operations on the cache object will indi-
rect through the CFS. This does not significantly degrade performance since we are just adding 
one extra local call to two remote calls (the coherency call and the page-out operation) and all of 
the data is being transferred using the direct pager object connection.

5.6	Caching Reads and Writes

The CFS caches data for reads and writes on files by mapping the file that is being read or written 
into the CFS's own address space. Once the file is mapped, then the data can be copied to or from 
the mapped region as appropriate. Since file mapping is used, all of the issues of data caching and 
coherency are handled by the virtual memory system and the remote file servers.

In order to implement the read and write operations, the file length must be available locally. In 
particular, for writes that append data to a file, the CFS must be able to modify the length locally.
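A minimal model of read and write over a mapped file follows. It is a sketch under an assumption: a bytearray stands in for the region the CFS has mapped, which in Spring the VM system and storage servers keep coherent. Reads are clipped at the locally cached length, and appending writes extend that length without any remote call.

```python
# Minimal model of CFS read/write over a mapped file (Sections 5.6-5.7).
# The bytearray is a stand-in for the mapped region; class name is hypothetical.


class MappedFileIO:
    def __init__(self, length=0):
        self.mapping = bytearray(length)  # stand-in for the mapped file
        self.length = length              # length cached locally by the CFS

    def read(self, offset, count):
        end = min(offset + count, self.length)   # never read past EOF
        return bytes(self.mapping[offset:end]) if offset < end else b""

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.mapping):               # grow the simulated region
            self.mapping.extend(bytes(end - len(self.mapping)))
        self.mapping[offset:end] = data           # copy into the mapping
        self.length = max(self.length, end)       # appends update length locally
        return len(data)
```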

5.7	Caching Length 

Caching the length is important because it allows read, write, get_length, and some set_length 
operations to happen locally. In order to let set_length and write operations happen locally, a CFS 
must have the ability to modify the length locally. As a result a length coherency algorithm is nec-
essary. This coherency algorithm is a simple single-writer, multiple-reader algorithm: a storage file 
server will allow multiple CFS domains to cache the length readable, but only one to cache it writ-
able. A CFS retrieves the length by invoking the cached_stat operation on the appropriate 
fs_pager object, and a storage file server keeps the length coherent by invoking the 
get_back_length operation on the appropriate fs_cache objects. 
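The single-writer, multiple-reader length protocol can be sketched as below. The class names are hypothetical. get_back_length mirrors the Table 4 operation, and cached_stat models only the length part of the Table 3 operation. Granting a writable copy revokes every other copy. Granting a readable copy while someone else holds it writable fetches the authoritative length back and downgrades the writer to readable.

```python
# Hypothetical model of the single-writer, multiple-reader length protocol
# (Section 5.7). Class names are illustrative, not Spring interfaces.


class CfsLengthCache:
    """CFS-side copy of one file's length."""

    def __init__(self):
        self.length = None
        self.mode = None            # None, "r" (readable) or "w" (writable)

    def get_back_length(self, still_cacheable):
        length = self.length
        if still_cacheable:
            self.mode = "r"         # downgrade to a readable copy
        else:
            self.length, self.mode = None, None
        return length


class LengthServer:
    """Storage-server side: many readable copies or exactly one writable."""

    def __init__(self, length):
        self.length = length
        self.holders = {}           # CfsLengthCache -> "r" or "w"

    def cached_stat(self, cfs, writable):
        writer = next((c for c, m in self.holders.items() if m == "w"), None)
        if writer is not None:
            # The writer holds the authoritative length: fetch it back.
            keep = writer is cfs or not writable
            self.length = writer.get_back_length(still_cacheable=keep)
            if keep:
                self.holders[writer] = "r"
            else:
                del self.holders[writer]
        if writable:
            # A writable copy excludes every other cached copy.
            for other in [c for c in self.holders if c is not cfs]:
                other.get_back_length(still_cacheable=False)
                del self.holders[other]
        self.holders[cfs] = "w" if writable else "r"
        cfs.length, cfs.mode = self.length, self.holders[cfs]
        return self.length
```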

The file length has to be retrieved by the storage file servers on page faults because the file server 
must know the current length of the file to determine if the page fault is legal. Thus, if on a page 
fault the length is being cached read-write, the file server will fetch the length back from the CFS 
that is caching the length and revoke write permission.

Having the length cached read-write allows a CFS to increase the length without informing the 
storage file server, but not to decrease it. A CFS still has to call through to the file server when a 
file is truncated so that the file server can take the necessary coherency actions.
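The page-fault and truncation rules can be modelled together. This is a sketch with hypothetical class names. On a page fault, the server fetches the length back from a read-write holder, revokes its write permission, and only then checks whether the fault lies inside the file. On the CFS side, growth stays local, while truncation always calls through to the server.

```python
# Hypothetical model of Section 5.7's page-fault and truncation rules.


class StorageServer:
    def __init__(self, length):
        self.length = length
        self.writer = None            # CFS caching the length read-write
        self.set_length_calls = 0     # remote calls received

    def set_length(self, new_length):
        self.set_length_calls += 1    # server can take coherency actions here
        self.length = new_length

    def page_fault(self, offset):
        if self.writer is not None:
            # Fetch the length back (get_back_length) and revoke write access.
            self.length = self.writer.length
            self.writer.writable = False
            self.writer = None
        return offset < self.length   # is the fault within the file?


class CfsLength:
    def __init__(self, server, length, writable):
        self.server, self.length, self.writable = server, length, writable

    def set_length(self, new_length):
        if self.writable and new_length >= self.length:
            self.length = new_length  # growth: handled locally, no remote call
            return "local"
        self.server.set_length(new_length)  # truncation: call through
        self.length = new_length
        return "remote"
```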

5.8	Zero-filling Cache Objects

When a file is lengthened, all of the new pages between the old length and the new length will be 
read as zeros until the pages are modified. Instead of having the remote file server zero-fill these 
pages on page faults, it would be much more efficient for the virtual memory system to zero-fill 
these pages itself, thus avoiding a cross-machine call and a data transfer. This optimization is 
implemented by the CFS. If the CFS has the length cached writable and the length is increased, the 
CFS invokes the zero_fill operation on the VM cache object. If the file object hasn't been bound 
yet, then the CFS does the zero-fill after the object is bound.
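The CFS side of the zero-fill optimization can be sketched as follows. The sketch assumes 4-Kbyte pages, the same assumption Table 5 makes. The zero_fill signature (a range of page indices) and the class names are hypothetical. Only pages lying wholly beyond the old end-of-file are zero-filled, because the page holding the old last byte still has real data at the server. Zero-fills requested before the bind are deferred.

```python
PAGE_SIZE = 4096  # assumed page size (Table 5 makes the same assumption)


def pages_to_zero_fill(old_length, new_length, page_size=PAGE_SIZE):
    """Page indices lying wholly beyond the old end-of-file, up to and
    including the page that holds the new last byte."""
    first = -(-old_length // page_size)   # ceiling division
    last = -(-new_length // page_size)
    return range(first, last)


class RecordingCache:
    """Stand-in VM cache object; the zero_fill signature is an assumption."""

    def __init__(self):
        self.zeroed = []

    def zero_fill(self, pages):
        self.zeroed.extend(pages)


class CfsZeroFill:
    def __init__(self, length):
        self.length = length
        self.cache = None      # VM cache object; None until the file is bound
        self.pending = []      # zero-fills deferred until bind

    def extend(self, new_length):
        # Only called when this CFS has the length cached writable.
        pages = pages_to_zero_fill(self.length, new_length)
        self.length = new_length
        if self.cache is None:
            self.pending.append(pages)
        else:
            self.cache.zero_fill(pages)

    def bound(self, cache):
        self.cache = cache
        for pages in self.pending:
            cache.zero_fill(pages)
        self.pending.clear()
```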

The storage file servers have to keep track of pages that are being zero-filled by virtual memory 
managers. Whenever a storage file server discovers that the length of the file has been extended by 
a CFS, it assumes that all new pages between the old length and the new length are being zero-
filled by the VMM on the CFS's machine. A storage file server can discover that a CFS has length-
ened a file in three ways:

- The length is retrieved for coherency purposes.

- The CFS gives the length back because it is no longer caching it.

- A page-out past the end-of-file occurs from a machine that has the length cached writable. In 
this case the length is set to contain the last byte of the page.
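The server-side bookkeeping can be sketched as follows, assuming 4-Kbyte pages. The class and method names are hypothetical. All three discovery paths funnel into one routine that attributes the newly covered pages to the lengthening machine. A page-out past end-of-file sets the length to the end of that page. Dropping the paged-out page from the zero-fill table is an added assumption, on the grounds that the server then holds real data for it.

```python
PAGE_SIZE = 4096  # assumed page size


class ServerFile:
    """Storage-server view of one file (hypothetical model of Section 5.8)."""

    def __init__(self, length):
        self.length = length
        self.zero_filled = {}   # page index -> machine assumed to zero-fill it

    def learn_length(self, new_length, machine):
        # Used for all three discovery paths: a coherency fetch, a length
        # given back, or a page-out past end-of-file.
        if new_length > self.length:
            first = -(-self.length // PAGE_SIZE)   # first page wholly past EOF
            last = -(-new_length // PAGE_SIZE)
            for page in range(first, last):
                self.zero_filled[page] = machine
            self.length = new_length

    def page_out_past_eof(self, offset, machine):
        page = offset // PAGE_SIZE
        # The length is set to contain the last byte of the paged-out page.
        self.learn_length((page + 1) * PAGE_SIZE, machine)
        # Assumption: once paged out, the server holds real data for the page.
        self.zero_filled.pop(page, None)
```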

5.9	Caching Time

Both the access time and the modify time can be cached by a CFS. Both times can be cached 
writable, but we make no attempt to keep the access time coherent, because it is impossible to keep 
a cached access time coherent if the file is mapped in multiple caches. Insisting on a coherent 
access time would therefore require that stat operations on all shared mapped files, even read-only 
ones, be remote. We do not know of any important application programs that require a coherent 
access time.

The modify time is kept coherent so that programs such as make can function properly. A CFS is 
allowed to cache the modify time if no one has the file cached read-write or the CFS is the only 
CFS that has the file cached read-write. In the second case, the CFS is allowed to change the mod-
ify time. A CFS retrieves the access and modify times by invoking the cached_stat operation on 
the appropriate fs_pager object and a storage file server keeps the modify time coherent by invok-
ing the dont_cache_time operation on the appropriate fs_cache objects.
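The modify-time rule reduces to a small predicate, sketched below. CFSs are identified by plain strings purely for illustration. open_read_write is a hypothetical helper that returns the CFSs to which a storage server would send dont_cache_time.

```python
# Hypothetical sketch of the Section 5.9 modify-time rule. CFSs are
# identified by plain strings for illustration.


def mtime_policy(cfs, rw_cachers):
    """Return (may_cache_mtime, may_change_mtime) for `cfs`, given the set
    of CFSs that have the file cached read-write."""
    if not rw_cachers:
        return True, False          # nobody writing: cache it read-only
    if rw_cachers == {cfs}:
        return True, True           # sole writer: may also change it
    return False, False             # another writer exists


def open_read_write(new_cfs, rw_cachers, mtime_cachers):
    """Server side: record a new read-write cacher and return the CFSs that
    must be sent dont_cache_time."""
    rw_cachers.add(new_cfs)
    revoked = [c for c in mtime_cachers if not mtime_policy(c, rw_cachers)[0]]
    for c in revoked:
        mtime_cachers.discard(c)
    return revoked
```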

5.10	Data and Length Write Back Policy

Modified data is cached by the VMM for files that are cached by the CFS. If the machine that the 
data is cached on crashes, this data will be lost. As mentioned before, the storage file servers 
employ a 30 second write back policy for writing back this cached data. In order to make the data 
even more secure, the CFS employs its own write back policy: when the last reference to a cached 
file object is gone, the CFS will write back all modified data for the file. Data is not written back 
for temporary or anonymous files (see Section 6).

Writing back the data is not sufficient - the length must be written back as well. As we discussed 
in Section 5.8, the storage file servers implicitly lengthen the file when page-outs past the end-of-
file occur. Since page-outs are in page-size quantities, the file length is set to include the whole 
page. Thus the length has to be written back after the data is written back so the file server can 
know the true length of the file. When a storage file server gets the length from a CFS that is cach-
ing the length read-write, it will truncate the file to that length.
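The ordering argument (data first, then length) can be checked with a small model. It assumes 4-Kbyte pages and uses hypothetical names throughout. Page-outs implicitly lengthen the file to a page boundary, and the length given back afterwards truncates it to its true size. Temporary files are skipped.

```python
PAGE_SIZE = 4096  # assumed page size


class Server:
    def __init__(self):
        self.data = bytearray()
        self.length = 0

    def page_out(self, page, contents):
        # `contents` must be exactly PAGE_SIZE bytes.
        start, end = page * PAGE_SIZE, (page + 1) * PAGE_SIZE
        if len(self.data) < end:
            self.data.extend(bytes(end - len(self.data)))
        self.data[start:end] = contents
        self.length = max(self.length, end)   # implicit lengthening to page end

    def give_back_length(self, length):
        del self.data[length:]                # truncate to the true length
        self.length = length


class CachedFile:
    def __init__(self, server, temporary=False):
        self.server, self.temporary = server, temporary
        self.refs = 0
        self.dirty = {}        # page index -> PAGE_SIZE bytes of modified data
        self.length = 0        # length cached read-write by the CFS

    def release(self):
        self.refs -= 1
        if self.refs == 0 and not self.temporary:
            for page, contents in sorted(self.dirty.items()):
                self.server.page_out(page, contents)     # data first...
            self.dirty.clear()
            self.server.give_back_length(self.length)    # ...then the length
```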

5.11	Security

The CFS is trusted by client domains to cache their files. The CFS needs to ensure that it does 
not accidentally allow some client to attain greater access to some cached file than the client is 
allowed. This is guaranteed by using the access rights obtained from the secure bind protocol 
described in Section 2.3.1. These access rights are checked on every operation on file objects to 
ensure that the client is allowed the desired access.
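A per-operation rights check might look like the sketch below. The mapping from operations to required rights is an illustrative assumption, not Spring's access-rights model. The key point is that each CFS-implemented file object carries only the rights obtained from the bind protocol, never rights asserted by the client.

```python
# Hypothetical rights check (Section 5.11). The operation-to-right table
# is an illustrative assumption, not Spring's actual access-rights model.
REQUIRED_RIGHT = {
    "read": "read",
    "get_length": "read",
    "write": "write",
    "set_length": "write",
}


class CfsFileObject:
    def __init__(self, rights):
        # Rights come from the secure bind protocol, not from the client.
        self.rights = frozenset(rights)

    def check(self, operation):
        needed = REQUIRED_RIGHT[operation]
        if needed not in self.rights:
            raise PermissionError(f"{operation} requires {needed!r} access")
```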

6.  Additional Functionality

There are other pieces of functionality in the file system that we have not discussed. First, the file 
system is the source of anonymous memory objects. These memory objects are used by the system 
for things such as stacks and heap memory. These objects are acquired by the VMM via objects 
implemented by storage file servers and the CFS. The details of the anonymous memory object 
implementation are given in [12].

The other piece of functionality that we have not discussed is cache reclamation. The VMM, the 
CFS, and the storage file servers all cache information. When any of these services get too many 
objects in their caches, they need to reclaim some of them. Reclaiming can be complicated since it 
involves multiple domains. Details of cache reclamation are given in [6, 12].

7.  Current Status and Performance

We have implemented the file system described in this paper. The file system that we have imple-
mented consists of three file servers:

- a storage file server that provides coherent access to files stored on the local disk, 

- a CFS that runs on each machine,

- and a storage file server that runs on the SunOS system and provides access to SunOS files. 

The Spring File System that we have implemented uses caching extensively to provide high per-
formance. In the rest of this section we will examine just how effective this caching could be and 
how effective it really is.

7.1	Potential Improvements

Caching by the CFS makes substantial performance increases possible. Table 5 gives two examples 
of sequences of operations that clients can do on files and shows how caching dramatically 
reduces network accesses. The first example is the use of a 1 Mbyte temporary file accessed via the 
read-write interface. This shows the effect of the data and length caching done by the CFS. In this 
example, when caching is used there is virtually no network activity; this file can be read and writ-
ten as fast as the local file system can copy data. The second example shows the use of a 1 Mbyte 
file accessed via memory mapping. This shows the effect of the zero-fill optimization. With the 
zero-fill optimization and length caching, there are virtually no network operations.
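The totals in Table 5 follow from simple counting, which can be checked mechanically. The per-row counts are taken from Table 5 and its note (4-Kbyte pages and 4-Kbyte transfers); only the variable names are ours.

```python
# Arithmetic behind Table 5, assuming 4-Kbyte pages and 4-Kbyte transfers.
MBYTE = 1 << 20
PAGE = 4 << 10
transfers = MBYTE // PAGE                 # reads, writes, or page faults

# Example 1: create, write 1 Mbyte, read 1 Mbyte, remove.
ex1_without = 2 + transfers + transfers + 1
ex1_with = 2 + 1 + 0 + 1                  # the only write-side call is the bind

# Example 2: create, map, set length, touch every page.
ex2_without = 2 + 1 + 1 + transfers
ex2_with = 2 + 1 + 0 + 0                  # length cached, pages zero-filled

print(transfers, ex1_without, ex1_with, ex2_without, ex2_with)
```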

7.2	Measured Improvements

In the previous section we discussed the potential benefits from caching. Table 6 gives 
measurements of some common file operations. The client machine is a SPARCstation 2 running Spring. 
The operations without caching go to a storage file server that is running on the SunOS system on 
a SPARCstation 2. These measurements show that caching allows the operations to be executed at 
least 5 times faster than without caching. 

  Operation     Without Caching   With Caching
  ------------  ---------------   ------------
  read 4K       11 ms             1.9 ms
  write 4K      51 ms             2.1 ms
  set_offset    3.4 ms            0.11 ms
  map/unmap     10.5 ms           2.1 ms

TABLE 6. 	Measured Performance
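The "at least 5 times faster" claim can be checked directly against Table 6. The sketch below computes the per-operation speedup from the measured times. The smallest gain is for map/unmap, at roughly 5x.

```python
# Speedups implied by Table 6 (times in milliseconds).
measured_ms = {
    "read 4K": (11, 1.9),
    "write 4K": (51, 2.1),
    "set_offset": (3.4, 0.11),
    "map/unmap": (10.5, 2.1),
}
speedup = {op: without / with_ for op, (without, with_) in measured_ms.items()}
for op, factor in speedup.items():
    print(f"{op:<10} {factor:5.1f}x")
slowest = min(speedup, key=speedup.get)   # smallest gain bounds the claim
```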

We need to do a much more extensive performance evaluation of the Spring File System, including 
comparing its performance to that of other systems. Perhaps the most interesting comparison would 
be with non-modular systems such as the SunOS system. However, for now it is encouraging that 
caching is very effective for these simple measurements.

8.  Related Work 

There have been many instances of file systems that employ caching. Examples are NFS [13], the 
Sprite File System [1], and the Andrew File System [2]. All three of these file systems provide 
some level of caching of both data and attributes and some level of coherency. However, none of 
them provide distributed shared memory (DSM), and they were all built as part of or on top of 
monolithic operating systems. As a result many of the issues addressed by the Spring File System, 
such as dealing with external pagers, the separation of naming from the file system, and dynami-
cally locating a per-machine cacher, were not addressed by these file systems.

There have also been several instances of systems that provide DSM including [14], [15], and 
[16]. However, these systems also did not address the issues involved in a system like Spring.

To our knowledge the only system that has addressed the caching problems in a distributed modu-
lar system besides Spring is CHORUS [4]. The CHORUS system implements distributed shared 
memory by having one global coherency manager that interacts with a per-machine cache man-
ager. Each access to a file object is indirected through the local cache manager by using a coherent 
capability. When a file object is created it contains the known port of the local cache manager.

The special coherent capability in CHORUS provides functionality similar to the subcontract 
mechanism in Spring. However, the Spring subcontract mechanism is more general since it works 
even when the local cacher does not exist, and the cacher is identified by a name instead of a spe-
cific port number.

The notions of length and attributes coherency are not mentioned in [4]. Other issues, such as 
binding to caches, naming, and cache reclamation, are not mentioned either. Thus although CHO-
RUS has implemented something similar to the Spring File System, it is unclear if they have 
solved all of the hard problems solved by the Spring File System.

9.  Lessons Learned

While building the Spring file system we learned several things about designing a file system for a 
modular system such as Spring:

- Splitting the memory object into a memory object and a pager object adds power. We used 
this feature to allow file operations such as getting attributes to go through the CFS while 
having all data transfers go directly to the storage file server. We would not have been able to 
implement our caching architecture as efficiently if the data had to be paged in via the memory 
object as was done in Mach [3]. 

- Using the VM system for data caching greatly simplifies things. This is a much better 
approach than requiring the file system to implement its own buffer cache for data as was done 
in older systems such as Sprite [1].

- Building file systems at user level is a good thing. We found it much easier to build a file 
system at user level than inside the kernel. We were able to try out new versions of the file 
system without rebooting the kernel, and we were able to debug the file system using normal 
user-level debugging tools.

- Strong interfaces with subclassing are the right way to build systems. Once we developed the 
interfaces to our objects, we were able to produce many different implementations (including 
adding caching) without having to change any client code. In addition, we were able to use 
interface subclassing to add functionality to the cache and pager object interfaces while still 
using the standard VM bind protocol. 

- Control over the object invocation mechanism is powerful. The Spring notion of subcontract 
was very useful in allowing us to implement caching transparently. We were able to change the 
marshaling, unmarshaling, and invocation mechanisms for file objects so that they could be 
cached without programmers of client applications having to do anything. 

- When a system is split into components, work is required to preserve good performance. We 
worked very hard when we developed the interfaces between the VM system and the file system 
to make performance optimizations such as zero-filling possible. More work is still necessary 
in this area so that we can support other optimizations such as input and output clustering [18].

- A general naming system is good. In Spring, the file system fits into the overall Spring naming 
system instead of trying to wedge naming for all objects into the file system as was done in 
other systems like Plan 9 [17]. This made the implementation of naming easier and cleaner.

10.  Conclusions

File caching is crucial to good system performance in a distributed environment. The Spring File 
System provides effective caching in an environment different from those in which caching was 
previously implemented. The Spring file data and attribute caches not only provide good 
performance but they are fully coherent as well. The Spring File System demonstrates that caching 
can be as effective in a highly modular distributed system as it is in monolithic systems such as the 
UNIX and Sprite operating systems.

The one open question about building a file system on a modular system such as Spring is how 
performance compares to that on monolithic systems. We are currently beginning the process of 
performance analysis and tuning of Spring, and we believe that with the proper amount of tuning, 
we can attain performance comparable to monolithic systems.

11.  References

[1] Nelson, M. N., Welch, B. B., and Ousterhout, J. K., "Caching in the Sprite Network File 
System," ACM Transactions on Computer Systems 6, 1 (Feb. 1988), pp. 134-154.

[2] Howard, J. H., et al., "Scale and Performance in a Distributed File System," ACM 
Transactions on Computer Systems 6, 1 (Feb. 1988), pp. 51-81.

[3] Accetta, M., et al., "Mach: A New Kernel Foundation for UNIX Development," Proceedings 
of the USENIX 1986 Summer Conference, June 1986.

[4] Abrosimov, V., Armand, F., and Ortega, M. I., "A Distributed Consistency Server for the 
CHORUS System," Proceedings of the Third Symposium on Experiences with Distributed and 
Multiprocessor Systems, March 1992, pp. 129-148.

[5] Hamilton, K. G. and Kougiouris, P., "The Spring Nucleus: A Microkernel for Objects," 
Proceedings of the 1993 Summer USENIX Conference, June 1993, pp. 147-160.

[6] Khalidi, Y. A. and Nelson, M. N., "The Spring Virtual Memory System," Sun Microsystems 
Laboratories, Technical Report SMLI-92-388, Sept. 1992.

[7] Nelson, M. N. and Hamilton, G., "High Performance Dynamic Linking Through Caching," 
Proceedings of the 1993 Summer USENIX Conference, June 1993, pp. 253-266.

[8] Khalidi, Y. A. and Nelson, M. N., "An Implementation of UNIX on an Object-oriented 
Operating System," Proceedings of the 1993 Winter USENIX Conference, Jan. 1993, pp. 469-480.

[9] Radia, S. R., Nelson, M. N., and Powell, M. L., "The Spring Name Service," Sun 
Microsystems Laboratories, Technical Report.

[10] Khalidi, Y. A. and Nelson, M. N., "A Flexible External Pager Interface," Proceedings of 
the Second Symposium on Microkernels & Other Kernel Architectures, Sept. 1993.

[11] Hamilton, G., Powell, M. L., and Mitchell, J. G., "Subcontract: A Flexible Base for 
Distributed Programming," Proceedings of the Fourteenth ACM Symposium on Operating System 
Principles, to appear Dec. 1993.

[12] Nelson, M. N., Khalidi, Y. A., and Madany, P. W., "The Spring File System," Sun 
Microsystems Laboratories, Technical Report SMLI 93-10, Feb. 1993.

[13] Sandberg, R., et al., "Design and Implementation of the Sun Network Filesystem," 
Proceedings of the 1985 Summer USENIX Conference, June 1985, pp. 119-130.

[14] Li, K., Shared Virtual Memory on a Loosely Coupled Multiprocessor, Ph.D. Thesis, Yale 
University, 1986.

[15] Leach, P., Levine, P., Hamilton, J., and Stumpf, B., "The File System of an Integrated 
Local Network," Proceedings of the 1985 ACM Computer Science Conference, March 1985, 
pp. 309-324.

[16] Ramachandran, U. and Khalidi, Y. A., "An Implementation of Distributed Shared Memory," 
Software-Practice & Experience 21, 5 (May 1991), pp. 443-464. 

[17] Pike, R., Presotto, D., Thompson, K., and Trickey, H., "Plan 9 from Bell Labs," 
Proceedings of the 1990 UKUUG Conference, July 1990.

[18] McVoy, L. W. and Kleiman, S. R., "Extent-like Performance from a UNIX File System," 
Proceedings of the 1991 Winter USENIX Conference, Jan. 1991.

Information on Spring

To get technical reports and other information on the Spring project send email to Corrine Dreis-
bach at corrine@eng.sun.com.

Trademarks

Sun, Sun Microsystems, SunOS, and SPARCstation are trademarks or registered trademarks of 
Sun Microsystems, Inc. UNIX is a registered trademark of UNIX System Laboratories, Inc.
FIGURE 5.	Desired Caching Structure
[Diagram labels: Storage File Server, VMM, file object, pager object, cache object.]
The client has a file object that is implemented by the CFS. The CFS has a private caching 
communication channel with the storage file server. If the contents of the file object are 
cached by the VMM, then the VMM has a pager object implemented by the storage file 
server and the CFS has a copy of the VMM cache object.
The page size is assumed to be 4 Kbytes. The first example involves reading and writing 
a 1 Mbyte file in its entirety where each read and write transfers 4 Kbytes of data. Without 
caching, 256 network reads and 256 network writes are required. The second example 
involves accessing a 1 Mbyte file through the mapping interface. Without the zero-fill 
optimization, 256 network page faults are required if all the pages are touched.

                    Network operations   Network operations
  Operation         without caching      with caching
  ----------------  ------------------   ------------------
  Example 1 (read-write interface)
  Create file               2                    2
  Write 1 Mbyte           256                    1 (bind)
  Read 1 Mbyte            256                    0
  Remove                    1                    1
  Total                   515                    4

  Example 2 (mapping interface)
  Create file               2                    2
  Map file                  1                    1
  Set length                1                    0
  Modify pages            256                    0
  Total                   260                    3

TABLE 5. 	Possible improvements with caching

FIGURE 1.	Spring Object
The client domain has an object that is implemented by a server domain. The client has 
a representation for the object that allows the invocations on the object to get to the 
server domain. The server keeps some state for the object.

FIGURE 2.	Pager-cache object example
A VMM and a pager have one or more two-way cache-pager object connections. In 
this example Pager 1 is the pager for two distinct memory objects cached by VMM 1, 
so there are two pager-cache object connections, one for each memory object. Pager 
2 is the pager for a single memory object cached at both VMM 1 and VMM 2, so 
there is a pager-cache object connection between Pager 2 and each of the VMMs.

FIGURE 3.	Sample Name Space
A sample Spring name space that consists of fs_contexts and files implemented by 
the file system and other objects implemented by other domains. Files and 
fs_context objects can be bound and retrieved from fs_context objects or from 
other context objects. Although fs_context objects are normally used to store files 
and fs_context objects, other types of objects can be stored in fs_contexts as well.

FIGURE 4.	State after a file has been mapped.
The file server implements the file object. When the file is mapped into the client's 
address space, a pager object is created at the file server and a cache object is 
created at the client's VMM.

FIGURE 6.	State after object is cached by CFS.
The representation of a file object consists of a cached file object that is 
implemented by a CFS domain, a cacher name that names the CFS, and a handle to 
the storage file server domain.

FIGURE 7.	State after a cached bind
The client has a file object that is implemented by the CFS. The private caching 
channel between the CFS and the storage file server shown in Figure 5 is actually 
the fs_cache object-fs_pager object pair plus the file object. The storage file 
server's pager object actually has an fs_cache object implemented by the CFS instead of 
the VM cache object shown in Figure 5. However, the VMM still has a direct pager 
connection to the storage file server.