The following paper was originally published in the Proceedings of the USENIX Microkernels and Other Kernel Architectures Symposium, San Diego, California, September 20-23, 1993.

Design and Implementation of an Object-Orientated 64-bit Single Address Space Microkernel

Kevin Murray, Tim Wilkinson (tim@cs.city.ac.uk), Peter Osmon - SARC, City University
Ashley Saulsbury - Swedish Institute of Computer Science
Tom Stiemerling, Paul Kelly - Imperial College

Abstract

In the mid eighties, the System Architecture Research Centre at City University developed a message-passing, UNIX-compliant microkernel (Meshix) for our own scalable distributed memory architecture (Topsy). Over the last two years we have been engaged in a research programme aimed at learning from this experience and developing a new operating system based on these lessons. The result is the Angel microkernel. This paper sets out the lessons we have learnt from Meshix, describes how they have influenced Angel, and outlines Angel's current design and its C++ implementation. We also describe our future plans and hopes for Angel, and the lessons we have learnt from the design and implementation process.

1 Introduction

Almost all modern operating systems are being designed using microkernels [1, 2]. The microkernel architecture can be said to encompass good software engineering practice: small code "units" that are insulated from each other, together with a minimal amount of critical code (the microkernel). Microkernels also introduce an "open system architecture" through the ease with which additional services can be provided and used.

Virtually without exception, however, microkernel architectures use message passing as the basis of communication, implementing the client-server paradigm upon this using remote procedure call (RPC) techniques. Message passing offers an apparently ideal structuring mechanism -- it isolates one "unit" from another, requires only a minimal microkernel (message passing and process control), and allows extra services to be provided simply by registering the service, which then receives and processes the messages.

Meshix is typical of such microkernel based, message passing operating systems and was developed several years ago [3]. Over the last few years we have been examining its structure and performance in a very critical manner to decide how to improve upon it -- in essence, trying to evaluate whether or not the message passing microkernel is as good as it seems. This has shown that there are a number of issues that have not yet been addressed by most current message passing microkernel architectures, or which have only been addressed with limited success or at the cost of complex restructuring of the system. It is to tackle these issues that Angel has been designed. This paper will outline the issues behind the design of Angel, in addition to its actual design and implementation.
Although Angel moves away from a message passing structure, we will show how its most important use -- RPC -- can be very efficiently implemented in Angel using LRPC techniques, and how Angel maintains the essential isolation of the system into protected "units". We will also mention some of the other work that we are applying to Angel, principally in the areas of fault tolerance and scalable I/O systems.

2 Shortcomings of Meshix

The original goal of Meshix and the Topsy architecture was to produce a scalable, parallel multiprocessor. To help achieve this goal, a dedicated point-to-point network with custom virtual cut-through routing chips was developed, supporting a raw bandwidth of 10 Mbytes/sec. To a reasonable extent the scalability goal has been achieved. Unfortunately the communications performance is only about 100 Kbytes/sec as seen by a user process, and there is limited support for parallel programming. This is largely due to two factors: the nature of microkernels, and the adoption of UNIX. The adoption of System V Release 3 UNIX as the primary interface to Meshix means there is no support for parallel programs, forces the use of UNIX heavyweight processes, and limits the IPC mechanisms.

2.1 Microkernel

In a microkernel architecture, there is an inherent performance loss caused by information exchange between services and clients. Typically a client collects the information it needs in its own private address space, independently of the server. When it wishes to exchange information with a server (probably to obtain some service), it first creates a message containing this information and then requests the microkernel to convey this to the server. Usually this involves several context switches and some data copying, remapping or cross-machine transferral, all of which are known to be costly actions.

The Chorus group [4], among others, has done much work to overcome this. The methods used include: replacing context-dependent addresses with unique addresses, speeding up message delivery whilst reducing security; combining mutually trusted servers into a single address space (and hence protection domain), reducing context switches; and placing all of the IPC management into the microkernel. In addition they use the lightweight RPC optimisation developed for the DEC Firefly system [5] to improve the speed of RPCs. All of these modifications have required non-trivial alterations to the operating system's structure and have increased the complexity of the system.

It is our belief that communication (or more generally co-operation), despite the above optimisations, is still slower than desirable and a more complex operation than it need be. We performed a set of detailed measurements of the speed of the Meshix communication system [6, 7] to help identify the causes of communications costs, to better understand these costs, to find ways to reduce or eliminate them, and to help develop a simpler mechanism. This study concluded that many, though not all, of the costs are an inherent consequence of using message passing across multiple protection domains, with the numerous context switches and data copying or remapping that this causes. During this study, it also became apparent that much, often unmeasured, time was spent preparing the data for transfer. As a comparison, we modelled the behaviour of a very simple distributed shared memory (DSM [8]) scheme with an amount of hardware assistance comparable with the current Meshix message passing system.
The conclusion was that this would easily outperform the message passing system currently used in Meshix. This led us to believe that the shared memory paradigm should be at the base of future parallel operating systems, replacing the message passing that is currently in use. This is in agreement with several other researchers and manufacturers [9, 10, 11].

2.2 UNIX

The UNIX process model provides every process with the illusion of a complete computer to itself. But whenever the process tries to access anything other than the processor or the data currently residing in memory, it may suffer a context switch as another process is given the chance to run. A context switch involves the exchange of a large amount of information beyond the processor's context, including the memory map and extensive operating system information. This is termed a heavyweight process model. Since Meshix provides a UNIX programming model, the process model it implements is that of UNIX: distinct heavyweight processes.

The heavyweight process model is costly [12] due to the extensive amount of state involved, and this reduces the benefits of writing parallel programs, although it may be overcome to some extent using threads packages. Unfortunately, these threads are not real first class objects in the operating system, and certain system operations for one thread affect others (eg. blocking). Even then, it is nearly impossible to share these threads between processes, unlike systems such as Psyche [13], to achieve the required flexibility (such as cost-effective load balancing). Efficient parallel programming needs a lightweight, flexible and extensible process model: one where changing from one thread to another is exceptionally cheap, where the actions of one thread do not necessarily affect another, and where exchanging processes is also cheap.

2.3 Support for Parallel Programs

Meshix provides no synchronisation primitives other than those implicit in message passing; any others are built using messages. This means that if co-operating processes need synchronisation other than messaging, it is slow and limited by the characteristics of the messaging system. In a scalable parallel machine, real parallel programs will require efficient and varied synchronisation mechanisms (e.g. barrier synchronisation), and better support for them must be provided.

Additionally, much work has been done on load balancing, and support for it is important to parallel programs since they must distribute their work over the machine efficiently. In a general purpose computer, the load on various parts of the system changes, and to maintain the efficiency of the applications running on such a system it is necessary to re-allocate work between the available processors. This is at the heart of load balancing, and in a scalable parallel system it is important that, at the very least, there is support to allow this to be done.

3 The Angel Design and Single Address Space Architectures

From our experience with Meshix, and as a result of our studies both of Meshix and other systems, it was decided that Angel should have the following characteristics:

* It should not support message passing, but use shared memory to support a single address space.
This decision was taken to tackle two problems: the lack of speed of the message passing model, as outlined above, and the cost of context switches, which is reduced by removing the need to flush various caches -- noted as the most costly part of the context switch operation.

* It should provide a protection mechanism which is not part of the process. This decision was taken to allow far greater flexibility in the protection scheme, and to allow more than one process to operate within the same domain to increase speed when necessary. It is also a logical step following the above point, in which we divested address translation from the process.

* It should allow processes to be informed of the actions of the operating system on their behalf. This decision is aimed at allowing threads within a process to become first class citizens of the operating system, and at allowing the process to partake in scheduling decisions that may affect it.

* It should use a minimal microkernel. None of our studies of Meshix showed a flaw in the microkernel design; in fact many of our experiences with Meshix have shown how vital the microkernel design is. The problems identified have been tackled by the above alterations to the architecture, so Angel remains a microkernel. However, as the implementation section will show, there is even less in the Angel kernel than in many other microkernels.

The following sections will outline the main characteristics of the Angel design.

3.1 SASA

Most importantly, Angel is a Single Address Space Architecture (SASA), like such systems as Multics [14], Psyche [15], and Opal [16]. A SASA is one in which there is only one address space shared by the entire system (all the processes, servers and the kernel). This is in contrast to the UNIX approach, whereby every process has its own unique address space. This has several benefits: it improves and simplifies data sharing, helps cache performance, and blurs the distinction between shared memory and distributed memory machines. The SASA is maintained between multiple processors using shared memory techniques.

The SASA has become feasible with the appearance of large address space processors [17], enabling many processes to consume addresses from the same range without exhausting the supply. This address space is managed as persistent objects (contiguous groups of pages). Not only does this remove the need for an explicit "file system" interface (with a different namespace and explicit system calls), but it greatly simplifies the storage of complex structures, databases, etc.
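Persistent objects give the programming model of data structures that simply live at addresses, with no separate file namespace and no read/write calls. As a rough analogy only -- Angel's actual interface is not shown in this paper, and the file-backed POSIX mapping below is our stand-in for a persistent object -- the following C++ sketch builds a counter in place and finds it again, already initialised, on the next run:

    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    // A structure stored directly inside a "persistent object".
    struct Counter {
        char magic[8];   // marks an initialised object
        long value;      // application data, persists across runs
    };

    int main() {
        // Stand-in for an Angel persistent object: a file-backed mapping.
        int fd = open("counter.obj", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, sizeof(Counter)) != 0) return 1;
        void* mem = mmap(nullptr, sizeof(Counter), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) return 1;
        Counter* c = static_cast<Counter*>(mem);

        // First run: initialise in place. Later runs: the data is there.
        if (memcmp(c->magic, "COUNTER", 8) != 0) {
            memcpy(c->magic, "COUNTER", 8);
            c->value = 0;
        }
        printf("count = %ld\n", ++c->value);   // no read()/write() needed

        munmap(mem, sizeof(Counter));
        close(fd);
        return 0;
    }

Under a true SASA even the mapping step disappears: the object is always at its address, and protection (next section) decides who may touch it.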
3.2 Protection Issues

In UNIX one process is protected from another by the use of separate address spaces. In a SASA all processes share the same address space, separating protection from address translation; hence a new scheme is needed to provide protection. This has also led some researchers to propose alterations to the traditional memory and protection hardware, adding new hardware support [18]. The protection scheme must define two things: the unit of protection, and the method used to specify and meet access requirements.

Figure 1: The structure of ACDs and biscuits

In Angel, protection is provided on objects, which consist of one or more pages. Objects cannot overlap, nor may they be contained within other objects. A critical server in Angel is the object manager, which is responsible for allocating addresses to objects and for validating access to objects.

For every object, the object manager associates one or more Access Control Descriptors (ACDs) which describe the other objects that must be accessible before this object may be accessed. An example of this structure is shown in figure 1, in which one object has three ACDs associated with it; one of them has part of its permissions tree shown. In this example, to gain write access to the object, the process must already have read and write access to objects A, B and C.

When an object is created, or when a new ACD is associated with an object, the object manager gives out a biscuit from which it can reliably identify the valid corresponding ACD. Conceptually there is only one biscuit per ACD, despite processes being free to duplicate it as frequently as they like. When a process wishes to access an object, it presents this biscuit to the object manager. The biscuit is then used to determine whether the process possesses the necessary objects to resolve the requested object. Consequently, the system does not have the concept of user identifiers. However, it is trivial to implement such a system by creating an object whose sole purpose is to act as a "user id" for access checking.
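A minimal sketch of these structures follows. The class and field names are our own invention, as the paper does not give the object manager's interface, and a real biscuit would be an unforgeable (eg. cryptographic) token rather than a counter. Validation checks that the caller already holds the objects named in the ACD's permissions tree:

    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    using ObjectId = uint64_t;
    using Biscuit  = uint64_t;    // conceptually one per ACD, freely copied

    enum Access { READ = 1, WRITE = 2 };

    // One ACD: the access it grants on its object, and the objects
    // (with accesses) the caller must already hold -- figure 1's tree.
    struct ACD {
        ObjectId object;                                // protected object
        int      grants;                                // eg. READ | WRITE
        std::vector<std::pair<ObjectId, int>> needs;    // prerequisites
    };

    class ObjectManager {
        std::map<Biscuit, ACD> acds;    // biscuit -> its one valid ACD
        Biscuit next = 1;
    public:
        Biscuit attach(const ACD& acd) {        // new ACD => new biscuit
            Biscuit b = next++;                 // sketch only; see above
            acds[b] = acd;
            return b;
        }
        // 'held' describes what the requesting process can already access.
        bool validate(Biscuit b, int wanted,
                      const std::map<ObjectId, int>& held) const {
            auto it = acds.find(b);
            if (it == acds.end()) return false;          // unknown biscuit
            const ACD& acd = it->second;
            if ((acd.grants & wanted) != wanted) return false;
            for (const auto& [obj, access] : acd.needs)  // must hold A, B, C
                if (auto h = held.find(obj);
                    h == held.end() || (h->second & access) != access)
                    return false;
            return true;
        }
    };

In figure 1's example, the write ACD's needs list would contain A, B and C with both READ and WRITE bits set, and a "user id" is simply one more ObjectId appearing in such lists.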
3.3 Support for Parallel Programs

Angel supports first class threads and uses upcalls for inter-process and kernel-process signalling (see section 4.2). Their purpose is to allow processes to be informed of external events in which they have declared an interest, eg. the release of locks, the arrival of new work, a page fault or a pending time slice. By passing such information on to the process, the process is able to make its own decisions on what to run, and to take remedial action (e.g. release a lock) when decisions are imposed upon it. The DSM supported by Angel allows the construction of locks such as spin locks with reasonable efficiency. When combined with the upcall mechanism, it is simple for a thread to "sleep" and be "woken" at some later date. This provides asynchronous behaviour not possible with shared memory alone.

The SASA that lies at the heart of Angel makes implementing load balancing trivial. As all processes and threads on all processors exist within a single address space that also contains all the necessary kernel information, moving a process or thread from one physical processor to another simply involves loading the processor context for the thread into the new processor. The DSM that implements the SASA will then move any necessary data as it is accessed. The design of Angel as it stands will not automatically load balance work for a process, but this can easily be provided through library routines.

3.4 The Angel Model

Figure 2: The Angel Process Model

Figure 2 shows how the above points are combined into the process model that Angel supports. Within any process there may be one or more threads. Threads may run in their own domain, as is the case with thread 1 in the diagram. This allows several threads in the same process to be protected from each other. Alternatively, several threads may share a protection domain, potentially between different processes, as is the case with threads 2 to 4. Where threads do not wish to share a protection domain for security or trust reasons, they may still have some mutually shared objects, as is the case with the remaining threads.

3.5 Fault tolerance

It is possible to build a scalable, efficient fault tolerance scheme in a SASA based operating system. This relies on the unification of resources to simplify the implementation, and on the augmentation of the DSM system to capture the data interactions necessary to make distributed checkpoints. Unlike other schemes [19, 20], where excessive DSM activity can result in a large number of checkpoints being made, we allow general data sharing without checkpoints, instead using the DSM state information to determine which data depends on which. This allows distributed checkpoints to be made which affect only those processes which are interacting, and also allows the DSM mechanism to be reused for checkpointing data to other machines' memories. Experiments indicate this costs only an additional 10% of an application's execution time. A full description can be found in [21].

4 The Angel implementation

Figure 3: Angel structure

Figure 3 depicts the general structure of the Angel operating system. This structure has few differences from more conventional, message passing microkernel designs. However, the use of a single address space and shared memory for communications has significantly simplified the microkernel. Currently, the implementation consists of 2,500 lines of C++ code and 1,000 lines of include files. This constitutes the virtual memory, the distributed shared memory and the device management systems, but not the device drivers themselves. At the time of writing, we have completed initial work on the microkernel and the client/server communication system. The microkernel provides two major services:

1. Persistent virtual memory, and
2. Virtual processor management.

The client/server communications are implemented using "lightweight" RPCs.

4.1 Persistent virtual memory

Figure 4: Object orientated VM system

The virtual memory (VM) system is the heart of Angel, since it supports the persistent single address space. The single address space nature of the VM enables some simplifications of the structure to be made, but the persistence introduces other complications. Figure 4 demonstrates the events in the VM system initiated by a page fault. Page faults are generated by the mmuDevice, a processor dependent object responsible for collecting all necessary information regarding the fault, and passed into the main, processor independent code, vmFault. This determines whether the fault is legitimate (user attempts to access supervisor data are caught here) and requests the relevant page from the tlbCache. The tlbCache first determines whether the access was to an object accessible by the virtual processor (using the ddl, which describes this relationship). If it was not, a fault condition is returned. If it was, the accessed address is used to form a pageID, a unique identifier for data in time and space. These pageIDs are used to support the data aliasing necessary for the copy-on-write mechanism. The pageID is then used by the coreMap to locate the relevant data. The local coreMap memory is first searched for data corresponding to this ID. If found, the page is returned for installation by the mmuDevice. If not found, the coreMap allocates an empty core page and requests the dataManager to find the data and install it. The dataManager does this by consulting both the network (which provides the DSM system) and the disk.
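This fault path can be summarised in code. The sketch below is structural only: the component names (vmFault, tlbCache, coreMap, dataManager) follow the paper, but every signature is our invention and the stub bodies stand in for the real mechanisms:

    #include <cstdint>
    #include <optional>

    using PageId = uint64_t;        // unique in time and space
    struct Page {};                 // a core page frame

    struct FaultInfo {              // collected by the mmuDevice
        uint64_t addr;
        bool     write;
        bool     user;              // fault taken in user mode?
    };

    inline bool supervisorData(uint64_t addr) {          // stub check
        return addr >= 0xF000000000000000ull;
    }

    struct TlbCache {               // consults the ddl for this VP
        bool accessible(uint64_t, bool) const { return true; }       // stub
        PageId pageIdFor(uint64_t addr) const { return addr >> 12; } // stub:
        // the real pageID also encodes the aliasing for copy-on-write
    };

    struct CoreMap {
        std::optional<Page*> lookup(PageId) { return std::nullopt; } // stub
        Page* allocEmpty() { return new Page; }                      // stub
    };

    struct DataManager {            // consults the DSM network, then disk
        void fetch(PageId, Page*) {}                                 // stub
    };

    enum class FaultResult { Installed, Invalid, MustWait };

    // The processor independent part of the fault path of figure 4.
    FaultResult vmFault(const FaultInfo& f, const TlbCache& tlb,
                        CoreMap& coreMap, DataManager& dataManager) {
        if (f.user && supervisorData(f.addr))
            return FaultResult::Invalid;        // illegitimate fault
        if (!tlb.accessible(f.addr, f.write))   // ddl says: not your object
            return FaultResult::Invalid;        // -> upcall, fatal error

        PageId id = tlb.pageIdFor(f.addr);
        if (coreMap.lookup(id))                 // already in local memory:
            return FaultResult::Installed;      // mmuDevice installs it
        Page* frame = coreMap.allocEmpty();
        dataManager.fetch(id, frame);           // DSM or disk; may be slow
        return FaultResult::MustWait;           // -> upcall, VP reschedules
    }

The two failure outcomes (Invalid, MustWait) correspond to the fatal and temporary error reports described at the end of this section.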
Several points in this VM system are worth special mention. First, the ddl is held in the user environment, allowing it to be treated like any other object, sharable via the DSM and swappable onto disk. This prevents consumption of valuable kernel resources and allows the user to easily determine attributes of their environment without the microkernel's assistance. Second, the devices (network and disk) are accessed through an LRPC interface (see section 4.3). This allows them to be installed externally from the microkernel if desired, although the LRPC mechanism will automatically optimise this interface when this is not the case. Currently, these devices are contained within the kernel, but we are planning to make them loadable kernel-level device drivers in order to improve modularity and flexibility without compromising performance. Third, at various stages the VM system may reach a point where it cannot continue immediately. This may be the result of a fatal error (eg. an access is made to an object not available to the user) or a temporary error (eg. the requested data must be fetched from disk). In these cases, the error is reported back to the virtual processor by means of an upcall. This enables the virtual processor to reschedule another thread.

4.2 Virtual processor management

The microkernel attempts to impose little process structure on the application or programmer. Unlike POSIX, therefore, it does not implicitly provide such services as file descriptors, "death of child" signals or other heavyweight features. Consequently the process structure, termed a virtual processor (VP), leaves much of the general management work to the application. This presents no additional problem, since it can be encapsulated in libraries.

A virtual processor operates around two general data structures: its domain descriptor list (ddl) and its upcall list. The ddl holds information about all objects the virtual processor has access to. As already mentioned, this object is used by the virtual memory system to determine the validity of memory accesses. However, it also holds information for processor management, such as which objects may be signalled using upcalls, and which object was initially executed. The upcall list is the virtual processor's interrupt mechanism and is used by both the kernel and other VPs for preempting each other when important events occur. These events include:

* Alarms,
* Invalid memory accesses,
* Temporarily invalid memory accesses, and
* Lock releases.

The first three of these events are microkernel generated; the fourth is generated by user level code associated with the release of mutual exclusion locks or condition variables. Upcalls are a fixed sized structure, convey little information, and will not be delivered if the recipient has insufficient resources to receive them. Each one identifies its sender, its type and two further type specific pieces of information (eg. invalid memory accesses report the failed address and the reason for the failure; lock releases report the address of the locking structure). The VP can precisely control the effect each upcall has when it is delivered, determining whether a handler is invoked immediately, whether the upcall is queued for later attention, or whether the upcall is ignored completely. By default, all upcalls are ignored unless the VP specifies otherwise. This generally means that upcalls are simply discarded without effect, although "invalid memory accesses" will terminate the VP.
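The following sketch captures this dispatch policy; the record layout and policy names are ours, not Angel's, and resource exhaustion is reduced to a fixed-length queue:

    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <functional>
    #include <map>

    enum class UpcallType { Alarm, InvalidAccess, TempInvalidAccess,
                            LockRelease };
    enum class Policy { Immediate, Queue, Ignore };

    struct Upcall {            // fixed size; conveys little information
        uint64_t   sender;     // the sending VP (or the kernel)
        UpcallType type;
        uint64_t   info[2];    // eg. failed address + reason, or the
    };                         // address of a locking structure

    class VirtualProcessor {
        std::map<UpcallType, Policy> policy;    // unset => Ignore (default)
        std::map<UpcallType, std::function<void(const Upcall&)>> handler;
        std::deque<Upcall> queued;
        static constexpr size_t kMaxQueued = 64;  // "insufficient resources"
    public:
        void setPolicy(UpcallType t, Policy p,
                       std::function<void(const Upcall&)> h = {}) {
            policy[t] = p;
            handler[t] = std::move(h);
        }
        // Never blocks the sender; undeliverable upcalls are lost.
        bool deliver(const Upcall& u) {
            auto it = policy.find(u.type);
            Policy p = (it == policy.end()) ? Policy::Ignore : it->second;
            switch (p) {
            case Policy::Immediate:
                if (auto& h = handler[u.type]) h(u);
                return true;
            case Policy::Queue:
                if (queued.size() >= kMaxQueued) return false;
                queued.push_back(u);
                return true;
            default:
                return true;    // ignored: discarded without effect
            }
        }
    };

(The special case that an ignored "invalid memory access" terminates the VP is omitted above for brevity.)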
4.2.1 Threaded virtual processes

Angel does not explicitly support threaded processes, leaving this to user level code. However, through the use of kernel and user level upcalls, it still provides the facilities for a "first class citizen" thread model. For example, in the kernel, whenever a situation occurs where it would block, the VP is upcalled to allow another thread to be scheduled. Similarly, user level locks can use this facility in parallel programs or client/server relationships (we use this heavily in the LRPC mechanism).

At the user level, a POSIX thread model [22] is provided. The operation of POSIX threads is well documented, but it is worth noting how this model interfaces to Angel's upcall system in order to provide "first class citizens". All locks are implemented in shared objects. For mutual exclusion locks, if a lock is not obtained, the failed thread inserts itself into the lock's pending queue. The thread scheduler is then called to dispatch another, the VP blocking if there are no others ready to run. When the lock is released, the releasing thread examines the head of the pending queue and releases the top thread. If this thread is within the same protection domain, the operation can be accomplished locally. If not, a lock release upcall is dispatched to the appropriate VP; on receiving this, the thread is released locally. The mechanism used for condition variables is similar, except that the thread release is delayed until the associated lock is released. By placing locks in shared memory, the operations of obtaining and releasing locks are greatly simplified, and the need to consider whether a thread is local or remote is hidden.
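A condensed sketch of that protocol follows (the types and helper names are ours; in a real implementation the pending queue itself needs protection, which is elided here):

    #include <atomic>
    #include <cstdint>
    #include <deque>

    struct Thread {
        uint64_t vp;                       // the VP this thread runs on
        bool runnable = false;
        void wake() { runnable = true; }   // stub: local release
    };

    inline uint64_t currentVp()         { return 0; }   // stub
    inline void scheduleAnotherThread() {}              // stub scheduler
    inline void sendLockReleaseUpcall(uint64_t /*vp*/, void* /*lock*/) {}
    // stub: would deliver a LockRelease upcall carrying the lock's address

    // Lives in a shared object, so every protection domain sees the same
    // lock word and the same pending queue.
    struct Mutex {
        std::atomic<bool> locked{false};
        std::deque<Thread*> pending;

        void acquire(Thread* self) {
            while (locked.exchange(true)) {    // failed to obtain the lock:
                pending.push_back(self);       // join the pending queue and
                scheduleAnotherThread();       // let the VP run another
            }                                  // (VP blocks if none ready)
        }
        void release() {
            if (!pending.empty()) {
                Thread* next = pending.front();
                pending.pop_front();
                if (next->vp == currentVp())
                    next->wake();                            // same domain
                else
                    sendLockReleaseUpcall(next->vp, this);   // remote VP
            }
            locked.store(false);
        }
    };

Because the Mutex lives in shared memory, neither side of acquire/release needs to know whether the waiting thread is local or remote; only the final wake-up step differs.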
4.3 Client/Server Communications

Like many commercial and research operating systems, Angel uses the notion of clients and servers in order to improve the functional modularity of the system. However, unlike many of its predecessors, message passing is not used to implement RPC communication; instead this is done through shared memory regions. This approach enables a more "lightweight" RPC mechanism to be implemented (based on work by Bershad et al [5]).

Angel's LRPC mechanism operates by sharing C++ objects in sections of shared memory. These objects are passed between client and server by the manipulation of shared lists and the release of the associated locks. However, optimisations in this mechanism are possible if both client and server operate in the same protection domain. In such cases a direct subroutine call can be made from client to server, avoiding the need for locking altogether. This optimisation can be determined when the LRPC channel is established, rather than at compile time, so providing greater flexibility.

4.3.1 LRPC example

Figure 5 illustrates a simple client/server interaction using a shared memory object for communication. This object constitutes a private channel between the parties, available in their protection domains only (although one-to-many channels are no more difficult to arrange).

Figure 5: Lightweight RPC object shared between a client and server

In conventional RPC, a client makes a request of the server by packaging the data to be transferred and then informing the server of its intentions. The server then unpackages the request, performs the work, and replies to the client using a similar RPC mechanism. LRPC in Angel benefits over such a system in two ways: first, the use of shared memory reduces the need to package data, in some cases removing it altogether; and second, the implicit encapsulation of the client/server relationship in C++ classes simplifies and hides access to the interface.

For example, the server in figure 5 maintains the private database holding users' information. A client wishing to search this database (such as /bin/ls -l) must make requests via an LRPC channel. However, rather than constructing and copying requests to the server, a passwdEntry object can be allocated which is already shared with the server, using the C++ placement operators (eg. overloading of operator new()). This object can then be used as normal within the client, the interaction with the server happening transparently and without extra copying by either party.
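A sketch of how the placement-operator trick might look. Only passwdEntry and the use of operator new() come from the paper; the arena, the field names and the single-process "server" below are ours, standing in for a real shared channel:

    #include <cstddef>
    #include <cstring>
    #include <new>

    // Stands in for a region of the single address space that the client
    // shares with the server.
    struct SharedArena {
        char        buf[4096];
        std::size_t used = 0;
        void* alloc(std::size_t n) {              // trivial bump allocator
            void* p = buf + used;
            used += (n + 15) & ~std::size_t(15);  // keep 16-byte alignment
            return p;
        }
    };

    struct passwdEntry {
        char name[32] = {};
        int  uid = -1;                            // filled in by the server
        // Placement operators: construct the entry inside the shared
        // arena, so the server reads it in place -- no marshalling.
        static void* operator new(std::size_t n, SharedArena& a) {
            return a.alloc(n);
        }
        static void operator delete(void*, SharedArena&) {}
    };

    // Server side of the channel: works on the entry where it lies.
    void serverLookup(passwdEntry* e) {
        if (std::strcmp(e->name, "root") == 0) e->uid = 0;  // stub database
    }

    int main() {
        SharedArena arena;                        // the shared channel object
        passwdEntry* e = new (arena) passwdEntry; // already visible to server
        std::strcpy(e->name, "root");
        serverLookup(e);   // real system: link onto shared list, release lock
        return e->uid == 0 ? 0 : 1;               // answered without copying
    }

In the real mechanism the direct call to the server is replaced by queueing the object on a shared list and releasing a lock (or, within one protection domain, by the direct subroutine call optimisation described above).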
4.4 Current status

The majority of the development work has been done by operating the microkernel as an emulation under SunOS UNIX. However, in order to validate the system and determine whether our efforts to keep the dependent and independent code separate have been successful, we recently ported the kernel to a Tadpole M88K system. This work took a week to complete, despite the need to write a new two-level MMU system, and although some restructuring resulted, no major problems were encountered. However, neither of these systems is appropriate to Angel's needs, due to the restricted address space. Currently we are investigating a port to either an SGI Indigo or a DEC Alpha PC, either of which is more appropriate.

5 Lessons and Further Work

The most "politically difficult" decision to make regarding Angel was to forego UNIX compatibility. It is acknowledged that if an SASA style operating system is to be accepted, then it must provide support for UNIX and its existing software base. As a first step we have investigated modifying compilers to generate code that gives the appearance of UNIX memory semantics. This resulted in a performance penalty of only a few percent [23]. We are now investigating a full UNIX service under Angel. It appears that a reasonable degree of compatibility can be provided at low cost, without altering the SASA to provide a region of memory addresses with UNIX characteristics.

The fault tolerance mechanism described above has been designed, implemented and analysed on a simulator, rather than in the current Angel implementation. One relatively simple task is therefore to implement this scheme in the current microkernel. Once this has been done we hope to study the performance of the system and see if it can be further improved.

The main area of future work lies in dealing with the projected large I/O requirements that a parallel computer will generate. Many current parallel computers are badly I/O limited, and overcoming this bottleneck is extremely important in opening up new markets for parallel machines. There are several schemes we are currently investigating to achieve this; the most promising is to make use of the algorithms from the fault tolerance scheme, which generate a distributed log stream of data for storage on disk.

6 Conclusions

This research was conceived as an exercise in learning from Meshix (and other message passing microkernels); the result is the Angel operating system, which is still a microkernel, but is based around a SASA supported by DSM, and not around message passing. The current implementation is small and has been easy to write, which leads us to believe that we have constructed a good design, and that a SASA is the way to build systems. There are other benefits from this approach which are important to scalability, for example in the areas of fault tolerance, data sharing and load balancing. Although we have not developed the system with UNIX support in mind, it appears that we can provide a simple version of this at very low overheads. All these points lead us to believe that SASAs are an important way of constructing operating systems, especially for scalable, parallel machines.

7 Authors Information

Dr Kevin Murray's thesis work at the University of York concerned the development of Wisdom, an operating system designed to support a high-level programming environment on a DMMP conforming to a subset of the ANSA transparency model. In addition, he contributed to the development of Wisdom's filesystem. He then worked at Imperial College, in collaboration with the Systems Architecture Research Centre, on the Angel operating system, concentrating on its scheduling and inter-process communication aspects, before being appointed lecturer at City University, where he has remained heavily involved in the Angel work.

Tim Wilkinson has worked extensively on the Topsy project, including work on the Meshix OS and Meshnet communications chips. His PhD work, now nearly completed, centres around the design of a reliable 64-bit distributed operating system using data dependent checkpoints. He is currently employed on the Angel operating system project.

Prof. Peter Osmon is head of the Systems Architecture Research Centre at City University. He was Principal Investigator on the Alvey-funded Cobweb project. He conceived and directed the Topsy Unix multicomputer project. He has a current IED grant with Phoenix VLSI and Texas Instruments concerned with the design of an interface device to support shared memory over a serial interconnect (ICTVS, reference number GR/F99618), and he is Principal Investigator of the SERC-funded project developing the Angel kernel (GR/G28277).

Dr. Tom Stiemerling has worked on implementing DVSM on Topsy, and on the specification and implementation of the Angel kernel, and is supported by SERC research grant GR/G28277. His doctoral work, carried out at Edinburgh University, involved the performance analysis by simulation of a shared memory multiprocessor architecture.

Dr. Paul Kelly is a lecturer in the Department of Computing at Imperial College. He was a researcher on the Alvey-funded Cobweb project. His doctoral work led in part to IED projects on functional programming of transputer networks, and on the exploitation of more general parallel hardware using functional languages and program transformation. More recently he has collaborated in the development of Paragon, an object-oriented graph-rewriting language, and is also an investigator on the related SERC-funded project developing the Angel kernel at Imperial (GR/G23562).

Bibliography

[1] Open Software Foundation, "The OSF/1 operating system," in Spring 1991 EurOpen Conference, pp. 33-41, 1991.
[2] M. Rozier, "Overview of the CHORUS Distributed Operating Systems," Tech. Rep. CS-TR-90-25, Chorus Systemes, 1990.
[3] P. Winterbottom and P. Osmon, "Topsy: An Extensible Unix Multicomputer," in UK IT90 Conference, Southampton University, 1990.
[4] A. Bricker, "A new look at micro-kernel-based UNIX operating systems: Lessons in performance and compatibility," in EurOpen Spring '91 Conference, Tromsoe, Norway, May 1991.
[5] B. Bershad, T. Anderson, E. Lazowska, and H. Levy, "Lightweight remote procedure call," ACM Operating Systems Review, vol. 23, pp. 102-113, December 1989.
[6] P. Osmon, T. Stiemerling, A. Whitcroft, T. Wilkinson, and N. Williams, "Evaluating Meshix -- a Unix compatible micro-kernel Operating System," in OpenForum '92, November 1992.
[7] A. Whitcroft and P. Osmon, "The CBIC: Architectural Support for Message Passing or Shared Memory?," in U.K. Performance Engineering Workshop, September 1992.
[8] K. Li, Shared Virtual Memory on Loosely Coupled Multiprocessors. PhD thesis, Yale University, Department of Computer Science, 1986.
[9] U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy, "Scalability study of the KSR-1," Tech. Rep. GIT-CC93/03, College of Computing, Georgia Institute of Technology, Atlanta, Georgia, 1993.
[10] E. Hagersten, A. Landin, and S. Haridi, "DDM -- A Cache-Only Memory Architecture," Research Report R91:19, SICS, Sweden, November 1991.
[11] M. Hill, J. Larus, S. Reinhardt, and D. Wood, "Cooperative shared memory: software and hardware for scalable multiprocessors," in ASPLOS V, pp. 262-273, September 1992.
[12] J. C. Mogul and A. Borg, "The Effect of Context Switches on Cache Performance," in ASPLOS, International Conf. on Architectural Support for Programming Languages and Operating Systems, Santa Clara, CA (USA), pp. 75-85, April 1991.
[13] B. Marsh, M. Scott, T. LeBlanc, and E. Markatos, "First-Class User-Level Threads," Tech. Rep., Computer Science Department, University of Rochester, NY, 1991.
[14] E. Organick, The Multics system: an examination of its structure. M.I.T. Press, 1972.
[15] M. Scott, T. LeBlanc, B. Marsh, T. Becker, C. Dubnicki, E. Markatos, and N. Smithline, "Implementation Issues for the Psyche Operating System," Tech. Rep., University of Rochester, Department of Computer Science, 1988.
[16] J. Chase, H. Levy, M. Baker-Harvey, and E. Lazowska, "How to Use a 64-Bit Virtual Address Space," Tech. Rep. 92-03-02, Department of Computer Science and Engineering, University of Washington, March 1992.
[17] Dobberpuhl et al., "A 200MHz 64-bit Dual Issue CMOS Microprocessor," in International Solid-State Circuits Conference, February 1992.
[18] E. Koldinger, J. Chase, and S. Eggers, "Architectural support for single address space operating systems," in ASPLOS V, pp. 175-186, September 1992.
[19] K.-L. Wu and W. Fuchs, "Recoverable distributed shared virtual memory," IEEE Transactions on Computers, vol. 39, pp. 460-469, April 1990.
[20] B. Fleisch, "Reliable distributed shared memory," in IEEE Workshop on Experimental Distributed Systems, pp. 102-105, 1990.
[21] T. Wilkinson, "Implementing Fault Tolerance in a 64-bit Distributed Operating System," Tech. Rep., City University, 1993.
[22] POSIX 1003.4a, "Threads Extension," IEEE Draft.
[23] T. Wilkinson et al., "Compiling for a 64-Bit Single Address Space Architecture," Tech. Rep. TCU/SARC/1993/1, SARC, City University Computer Science Department, March 1993.