Filesystem Daemons as a Unifying Mechanism
for Network Information Access*
Steve Summit
Consultant, Seattle, Washington
Abstract
As the Net burgeons, new tools and protocols are being
introduced to permit some orderly use to be made of the wealth of
information available. These new protocols, however, often
presuppose the use of new, nonstandard, highly interactive user
interfaces. This paper presents a mechanism for unifying access
to diverse network services through filesystem daemons, which
allow network information services to be treated as if they were
conventional files and directories, residing in the local
namespace, and accessed transparently with standard tools.
Besides normal filesystem operations ("open", "read", "write",
etc.), the daemons may introduce extended operations, which
provide generic access to such features as network database
lookup operations.
1. Introduction
As the Internet grows, and as more and more information
becomes available, it becomes more difficult to locate
information of interest. Numerous protocols, such as Gopher
[Anklesaria 93], the World-Wide Web [WWW 93], and WAIS [Kahle 89,
Davis 90], have been introduced in an attempt to make network
browsing and information retrieval more convenient and productive
[Krol 92]. These protocols, and their associated user interface
tools, represent important advances, and they make the resource
discovery problem much more tractable. However, since they are
intended as browsing tools, their default user interfaces tend to
be highly interactive [1]. Furthermore, as the new protocols are
still somewhat experimental, their user interfaces are rather
idiosyncratic. Though perfectly adequate for casual browsing,
they do not lend themselves immediately to integration into
larger toolsets.
It is natural to wonder whether that venerable old UNIX
notion that "everything's a file" can be extended to these newer
network information sources. (Conventional network files have of
course been so available for some time; see section 9.) Under
this extended paradigm, the filesystem interface becomes a
rendezvous point between network services and local utilities.
It is possible to explore gopherspace, or the World-Wide Web, or
__________
* Copyright 1994 USENIX Association.
1. Even ftp, which the new protocols are intended to supersede,
has this problem.
USENIX -- Winter '94 1
Summit Filesystem Daemons as a Unifying Mechanism...
other information spaces, using one's own preferred shell, moving
around with cd, viewing and searching "files" with standard tools
such as more and grep, copying them to one's home directory with
cp, and otherwise making use of a large, open, familiar, rich,
programmable environment.
This paper describes a prototype implementation which
provides such transparent access. The implementation is built up
in several stages, and that structure will be reflected in the
body of the paper. Section 2 introduces the idea of filesystem
daemons, which permit a process to be invoked to handle any or
all of the I/O associated with any name in the filesystem.
Section 3 reviews the straightforward application of such daemons
in implementing transparent remote filesystem access (e.g. via
anonymous ftp). Section 4 extends the remote access paradigm to
the realm of non-filesystem-oriented data such as gopher and WWW,
and proposes relaxing the distinction between files and
directories (it is possible to view some pieces of structured or
indexed information as both). Section 5 suggests extensions to
the basic set of I/O operations, in order to permit advantageous
use of such services as remote database search facilities.
Section 6 describes details of the implementation, and section 7
discusses performance. Section 8 outlines open issues and future
work; section 9 compares related work.
The implementation described runs entirely in user mode;
echoing a refrain of USENIX contributions past, "no kernel
changes are required."
2. Filesystem Daemons
The underlying mechanism on which the ideas in this paper are
built is an extended filesystem tentatively named the Object-
Oriented Filesystem, or OOFS [2]. The salient feature of this
filesystem is that any file may enjoy customized I/O handling via
the services of an associated process, or filesystem daemon. In
effect, each file may be treated as an object to which I/O
messages are passed. (A similar scheme is described in
[Bershad 88].)
Under OOFS, each UNIX system call which accepts a pathname is
intercepted and the pathname inspected for the presence of a
daemon attached to the referenced file, or to one of its parent
directories. If a daemon is found, it is invoked, and a
communication path is set up along which are passed messages
which request further lookup and I/O operations. OOFS also
intercepts system calls which accept file descriptors; I/O on
descriptors which refer to daemon-opened files again results in
messages being passed to the daemon for processing, rather than
direct invocation of the conventional kernel routine. Daemons
may respond to I/O requests in arbitrary ways, constructing the
__________
2. An admittedly obvious name.
file's data on the fly, if they so wish [3].
The protocol between a daemon and its OOFS-aware client is
simple and -- very importantly -- extensible. (Further rationale
behind, and ramifications of, this extensibility are discussed in
section 5; details of the daemon interface protocol can be found
in section 6.) For the purposes of the following discussion, it
suffices to know that daemon requests are specified by text
strings; most are named after the UNIX system calls they
implement.
The OOFS mechanisms support conventional, slash-separated,
UNIX-style pathname syntax. The semantics can, however, be
arbitrarily tangled; as we shall see, various path components may
end up representing access methods, machine names, etc.
Furthermore, if a daemon requires any special parameters, these
must typically be passed to it using daemon-specific syntax
grafted on to the pathname. (Similar gyrations are discussed in
[Roome 92].)
Filesystem daemons are obviously useful for purposes other
than networking. They can be used to implement access control
lists, self-decompressing files, self-extracting archives,
versioning systems [Korn 90, Roome 92], mailbox directories, and
NNTP-based /usr/spool/news directories, among other things, but
those applications are outside the scope of this paper.
3. Remote Filesystem Access
Once things are set up so that daemons can intercept and
provide special processing for certain pathnames, the groundwork
is obviously laid for transparent network file access. For
example, I have implemented a sort of "poor man's NFS" by
attaching a daemon to the pseudodirectory /ftp. (This technique
is very similar to an approach used by the Alex system
[Cate 92].) This daemon handles all pathname components beneath
the /ftp attachment point, such that pathnames of the form
/ftp/machine/path
are interpreted as a path accessed via anonymous ftp on a certain
machine. The daemon acts as an ftp client, connecting to the
specified machine's ftp server and performing appropriate ftp
protocol transactions as necessary to satisfy the I/O requests of
its clients (that is, the daemon's clients). Under this scheme, the
ftp user command becomes superfluous; files can be copied by
anonymous ftp with the cp command (or moved with mv). For
example, the command
cp /ftp/ftp.nisc.sri.com/rfc/rfc1149.txt .
__________
3. Eggert and Parker discuss such "intensionalized" files
extensively in [Eggert 93].
retrieves a file via anonymous ftp from site ftp.nisc.sri.com; no
manual interaction with an ftp client is required.
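As a rough illustration of the /ftp/machine/path convention, the following Python sketch splits such a pathname into the machine to contact and the path to request from it. The function name, the mount argument, and the return shape are assumptions made for this example; they are not taken from the OOFS ftp daemon itself.

```python
def split_ftp_path(path, mount="/ftp"):
    """Split a pathname of the form /ftp/machine/path.

    Returns (machine, remote_path). The function name, the mount
    argument, and the return shape are assumptions for illustration.
    """
    if path != mount and not path.startswith(mount + "/"):
        raise ValueError("not beneath " + mount)
    rest = path[len(mount):].lstrip("/")
    if not rest:
        return None, "/"              # the attachment point itself
    machine, _, remote = rest.partition("/")
    return machine, "/" + remote


machine, remote = split_ftp_path("/ftp/ftp.nisc.sri.com/rfc/rfc1149.txt")
print(machine, remote)
```

A daemon built along these lines would then open an anonymous ftp session to the machine and request the remote path on the client's behalf.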
4. Remote Non-Filesystem Access
The preceding two sections are by way of prelude; they do not
represent anything that has not been done several times before.
This paper's principal contribution is the idea of accessing,
transparently and as if part of the local filesystem, non-
filesystem-structured network information services.
4.1. Gopher
The Internet Gopher protocol [Anklesaria 93] presents a
hierarchical tree of information nodes, residing potentially on
many machines. Nodes are typed: some are simple pieces of text,
others are "menus" (analogous to directories) pointing at other
nodes, still others are simple search engines. A simple,
connectionless protocol allows a node's contents to be fetched;
nodes which are directories return information about each subnode
contained: its type, the machine on which it resides, and the
identifying tag by which it can be fetched.
The conventional user interfaces to the Internet Gopher are
menu based: each time a selection is made from a gopher menu, the
user is presented either with another menu, or with a piece of
text or other information. The invocation style is highly
interactive; if there is a way to save oneself a copy of an
interesting node, it is not the way that one saves interesting
mail messages, news articles, or files discovered while exploring
the net using ftp or some tool other than gopher.
Gopherspace can be presented as a filesystem, however, by an
OOFS daemon which maps gopher menu nodes to directories and other
nodes to files. This daemon interprets pathnames of the form
/gopher/machine-name
as the root of the gopherspace tree on the host named machine-
name, and it is therefore possible to explore gopherspace using
one's favorite shell, using ls to see the contents of a "menu,"
cd to move among menus, and cat, more, or cp to view or save
nodes of interest.
Figure 1 shows a sample interaction using the OOFS gopher
daemon.
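The mapping from menu nodes to directory entries can be sketched in a few lines of Python. A gopher menu reply is a series of tab-separated lines (a type character followed by the title, then the selector, host, and port), ended by a line holding a single period. The choice to report menus as "d", search engines as "?", and everything else as plain files is an assumption modelled on the listing in Figure 1, not a description of the prototype's code; the host name in the sample is a placeholder.

```python
def parse_gopher_menu(text):
    """Turn a gopher menu reply into (kind, name, selector, host, port) tuples."""
    entries = []
    for line in text.splitlines():
        if line == ".":               # a lone period ends the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:
            continue                  # skip malformed lines
        name, selector, host, port = fields[:4]
        # Assumed mapping: type 1 (menu) -> directory, type 7 (search) -> "?",
        # anything else -> ordinary file.
        kind = {"1": "d", "7": "?"}.get(line[0], "-")
        entries.append((kind, name, selector, host, int(port)))
    return entries


sample = ("0About Gopher\t0/About\tgopher.example.org\t70\r\n"
          "7Search Gopher News\tnews-index\tgopher.example.org\t70\r\n"
          "1Gopher News Archive\t1/News\tgopher.example.org\t70\r\n"
          ".\r\n")
for kind, name, *_ in parse_gopher_menu(sample):
    print(kind + "r--r--r--", name)
```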
It must not be suggested here that the mapping from
gopherspace to a simulated filesystem is perfect. The entries in
gopher menus tend to be multiword titles, not the shorter, single-
________________________________________________________________
| $ cd /gopher/gopher.micro.umn.edu |
| $ ls -l |
| dr--r--r-- 1 0 0 Dec 31 1969 Information About Gopher |
| dr--r--r-- 1 0 0 Dec 31 1969 Computer Information |
| dr--r--r-- 1 0 0 Dec 31 1969 Discussion Groups |
| dr--r--r-- 1 0 0 Dec 31 1969 Fun & Games |
| ... |
| |
| $ cd Inf* |
| $ pwd |
| /gopher/gopher.micro.umn.edu/Information About Gopher |
| $ ls -l |
| -r--r--r-- 1 0 0 Dec 31 1969 About Gopher |
| ?r--r--r-- 1 0 0 Dec 31 1969 Search Gopher News |
| dr--r--r-- 1 0 0 Dec 31 1969 Gopher News Archive |
| dr--r--r-- 1 0 0 Dec 31 1969 comp.infosystems.gopher |
| ... |
| |
| $ cat Abou* |
| This is the University of Minnesota Computer & Information |
| Services Gopher Consultant service. |
| ... |
| |
| Figure 1 |
| Sample Gopher Session |
|________________________________________________________________|
word names one expects to see in directories [4]. As the example
in Figure 1 shows, gopher nodes have no meaningful owners, sizes,
or modification times [5].
The mapping is not without compensating merits, however: to a
user who prefers a conventional shell, and dislikes using
different interfaces in different contexts, the aberrations
visible when trying to treat gopherspace as a filesystem are no
more jarring than the differences between a conventional gopher
client and that preferred shell. More importantly, access to
gopherspace via the filesystem and a programmable shell leaves
open the possibility of building new interaction or processing
utilities using a toolkit approach.
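The implicit wildcard rule that the prototype gopher daemon applies to long menu titles (a name not otherwise matched will match an entry of which it is an initial prefix, if unique) can be sketched as follows. The function name and the use of a plain list of titles are assumptions for illustration.

```python
def match_entry(name, entries):
    """Return the menu entry selected by name, or None if there is none."""
    if name in entries:                       # an exact match always wins
        return name
    candidates = [e for e in entries if e.startswith(name)]
    # A prefix counts only when it identifies exactly one entry.
    return candidates[0] if len(candidates) == 1 else None


menu = ["Information About Gopher", "Computer Information",
        "Discussion Groups", "Fun & Games"]
print(match_entry("Inf", menu))
```

An ambiguous prefix simply fails to match, which leaves explicit shell globbing available as the fallback.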
__________
4. To ease the resulting burden slightly, the prototype gopher
OOFS daemon implements a simple implicit wildcard scheme when
matching pathnames against gopher menus: a name not otherwise
matched will match a name of which it is an initial prefix,
if unique. (Purists will note that this twist is entirely
unnecessary: users could otherwise make frequent use of shell
globbing when cd'ing through gopherspace.)
5. Some nodes, such as gopher search engines, may not even map
to a proper UNIX "file type;" as the example shows, the ls
command is unable to interpret the S_IFMT bits, and displays
an unusual leading '?'.
4.2. World-Wide Web
The World-Wide Web [WWW 93] is based on two functionally
distinct ideas. The first is a connectionless information
retrieval protocol, much like gopher's, by which a named text
entity is retrieved from a server. The second, and more
significant, aspect of WWW is its hypertext model: in general, all
text entities contain hypertext links to related entities.
Interaction with WWW consists entirely of chasing links to narrow
in upon information of interest (or just to explore).
The existing WWW interaction tools, even the lowest-common-
denominator text-only version, are excellently written. Besides
formatting WWW's embedded HTML (Hypertext Markup Language)
constructs appropriately, they make it very easy to chase links
of interest and wander around in the Web. It is not my intent to
criticize the WWW interfaces; they are superior. But they are
not the shell, and one cannot transparently use arbitrary UNIX
commands from within them [6].
Once again, however, a suitable daemon allows the Web to be
accessed using familiar tools. The OOFS WWW daemon presents WWW
text entities as conventional files which can be opened and read
by any standard tool. At the same time, each text may also
function as a directory: its subfiles or subdirectories (that is,
the items it contains) are simply the text entities pointed to by
its embedded links.
The OOFS WWW daemon, then, provides an example of the utility
of relaxing the file/directory distinction. Each text entity is
simultaneously a file and a directory. A program such as cat,
which opens an entity conventionally, reads text, while a program
such as ls, which opens it as a directory, reads directory
entries.
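The dual nature of a text entity can be illustrated with a small Python sketch: read as a file, the entity yields its text unchanged; read as a directory, it yields one entry per embedded link. Naming the entries 1, 2, 3, ... in order of appearance is an assumption suggested by the "cd 1" step in Figure 2, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkLister(HTMLParser):
    """Collect the HREF targets of anchor tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":                        # tag names arrive lowercased
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def read_as_file(html):
    """What a program such as cat would see: the raw text."""
    return html


def read_as_directory(html):
    """What a program such as ls would see: one entry per link."""
    parser = LinkLister()
    parser.feed(html)
    parser.close()
    return {str(i): href for i, href in enumerate(parser.links, 1)}


page = '<TITLE>t</TITLE><A HREF="a.html">A</A> and <a href="b.html">B</a>'
print(read_as_directory(page))
```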
When exploring the Web using a shell and the OOFS WWW daemon,
traversing a link to a new node is done with cd, and viewing the
current node can be done with the formerly meaningless invocation
cat .
Returning to a previously-visited node is of course accomplished
with
cd ..
Figure 2 shows a sample foray into the Web using a
conventional shell and the OOFS WWW daemon. This example shows
another issue of interest when presenting WWW text entities as
files: the text contains embedded HTML formatting requests, which
__________
6. To be sure, the WWW interfaces do provide shell escapes, and
mechanisms for piping node text to arbitrary commands.
should usually be processed before display. It is an intriguing
question whether an OOFS daemon should implicitly perform such
formatting; for now, it does not, and in these examples the
existing WWW line-mode browsing tool, www, is used as a filter
only (indicated by its - option) to format the HTML for display.
_________________________________________________________________________
| $ cd /www/info.cern.ch/default.html |
| $ cat . |
| Overview of the Web |
| General Overview |
| There is no "top" to the World-Wide Web. |
| You can look at it from many points of view. |
| Here are some places to start: |
| |
| - The Virtual Library |
| - A classification by information by subject. |
| ... |
| |
| $ cat . | www - |
| Overview of the Web |
| GENERAL OVERVIEW |
| There is no "top" to the World-Wide Web. You can look at it from |
| many points of view. Here are some places to start: |
| The Virtual Library[1] |
| A classification by information by subject. |
| ... |
| |
| $ cd 1 |
| $ cat . | www - |
| The World-Wide Web Virtual Library: Subject Catalogue |
| THE WWW VIRTUAL LIBRARY |
| This is a distributed subject catalogue. See also arrangement by |
| service type[1] ., and other subject catalogues[2] . |
| ... |
| |
| Figure 2 |
| Sample WWW Session |
|_________________________________________________________________________|
Accessing the Web using only a shell, cd, and cat may seem to
mock the more sophisticated text formatting and linking features
which the Web provides, and definitely skirts the edge of the
mappings which are meaningfully appropriate under the
"everything's a file" model. However, the ability to use a
familiar, general-purpose shell and toolkit at least partially
compensates for the lack of full hypertext awareness.
Furthermore, it is intriguing to contemplate the possibility of a
general-purpose "hypertext shell" which could be used to explore
both the World-Wide Web and other hypertext systems. It would be
easy to write such a shell if the details of various hypertext
schemes were hidden behind a common, filesystem-like interface;
in fact, a few simple shell aliases can go a long way towards
providing hypertext-like features within a conventional, but OOFS-
aware, shell.
5. Extended I/O Operations
One of the more important services provided by network
information retrieval protocols is searching or lookup. No
matter how partial one is to one's own favorite grep variant or
other tools, it is not practical to pull many megabytes of data
over the network only to selectively discard most of it. A
search or lookup operation provided by a network protocol allows
the searching to be done on the machine which has local access
both to the data and to any precomputed indices or inverted
files.
The existence of these specialized search and lookup
operations is a more compelling force which tends to lock users
into the idiosyncratic user interfaces which support and are
supported by the protocol. One's own existing tools are
inevitably based on simple reads of chunks of data, and no amount
of intelligent operation mapping by a daemon sitting at the level
of a conventional I/O interface is going to be able to make use
of a predefined search or lookup operation.
It is for this reason that the OOFS daemon protocol has been
left very open and extensible. Both gopher and WWW, for example,
provide simple keyword searches (as does WAIS; keyword searching
is its whole purpose). It is difficult to see how a filesystem
daemon can make use of these features: no existing I/O call can
meaningfully be mapped to a search or lookup operation; no
existing general-purpose tools presuppose the existence of such a
facility. If, however, the filesystem is to be the rendezvous
point between applications and information servers, we may
contemplate the invention of a new, filesystem-level call which
will map to these search and lookup operations.
In gopher and WWW, a search is performed against an existing
object and yields a menu (under gopher) or a text entity full of
links (under WWW) pointing at the objects which were found.
Therefore, for gopher and WWW, we may devise a lookup operation
which functions rather like mkdir: this new operation takes as
parameters the name of an existing object, a search pattern, and
a new name. If the search succeeds, a new directory entry, with
the specified new name, is created in the searched-upon object,
and points at the result of the search (which is actually a
directory of search results).
This new operation, called "mdlookup" (for "make-directory
lookup") is implemented by both the gopher and WWW daemons. A
simple program, lookup, provides a shell-invokable interface to
the new operation; lookup can be used to perform searches either
in gopherspace or the Web. (A simple shell alias could
encapsulate the lookup invocation, new name selection, and cd
into the created directory, if successful.)
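The mkdir-like semantics of "mdlookup" can be modelled with a small in-memory sketch in Python. Here a local substring match stands in for the search that a gopher or WWW server would actually perform remotely, and the Node class and all names are hypothetical; only the three parameters (existing object, search pattern, new name) and the "new entry appears on success" behaviour come from the description above.

```python
class Node:
    """A toy object that is both text and a container of named children."""

    def __init__(self, text="", children=None):
        self.text = text
        self.children = dict(children or {})


def mdlookup(obj, pattern, new_name):
    """Search obj; on success, attach the results as obj.children[new_name]."""
    if new_name in obj.children:
        raise FileExistsError(new_name)
    # Stand-in for the remote search: case-insensitive substring match.
    hits = {name: child for name, child in obj.children.items()
            if pattern.lower() in name.lower()}
    if not hits:
        return False                  # nothing found, nothing created
    obj.children[new_name] = Node("matches for " + pattern, hits)
    return True


jargon = Node("Hacker's Jargon",
              {"kluge": Node(), "kluge around": Node(), "hack": Node()})
if mdlookup(jargon, "kluge", "search1"):
    print(sorted(jargon.children["search1"].children))
```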
Figure 3 shows an example of the lookup command being used
along with the WWW daemon.
_________________________________________________________________________
| $ cd /www/info.cern.ch/default.html/1/9 |
| $ cat . | www - |
| The World-Wide Web Virtual Library: Computing |
| COMPUTING |
| Information categorised by subject. See also other subjects[1] . |
| ... |
| Jargon[7] Computer hacker's jargon index |
| ... |
| |
| $ cd 7 |
| $ cat . | www - |
| Collection `Hacker's Jargon' |
| HACKER'S JARGON |
| A[1] |
| B[2] |
| C[3] |
| ... |
| |
| $ lookup . kluge search1 |
| $ cd search1 |
| $ cat . | www - |
| jargon?kluge |
| THE FOLLOWING OBJECTS MATCH 'KLUGE' IN COLLECTION 'HACKER'S JARGON' |
| kluge[1] |
| kluge around[2] |
| kluge up[3] |
| |
| $ cd 1 |
| $ cat . | www - |
| kluge |
| KLUGE |
| kluge: /klooj/ [from the German `klug', clever] 1. n. A Rube |
| Goldberg (or Heath Robinson) device, whether in hardware or |
| software. |
| ... |
| |
| Figure 3 |
| Sample lookup Operation |
|_________________________________________________________________________|
The role of these extended operations -- such as "mdlookup" --
should be carefully understood. As they are neither part of the
standard UNIX I/O interface nor utilized by standard tools, they
may seem to be as idiosyncratic as the special-purpose user
interfaces which they are attempting to replace. Their advantage
is that they sit at the level of, and augment, an existing
interface (namely the filesystem). They can therefore be used to
build upon and extend existing, more standard operations,
permitting efficiency and synergy without having to discard
existing interfaces and toolsets.
6. Implementation Details
The decision to go with a user-mode implementation was made
for several reasons. The first was pragmatic: machines with kernel
sources and tolerant users were not available. Second, user
mode code can be markedly easier to develop and debug than kernel
code [Warnock 84]. Finally, there is an undeniable challenge in
implementing things in user mode which classically "belong" in
the kernel. The choice is not without its disadvantages, of
course: it is somewhat difficult to preserve fork and exec
semantics of open files, and unless a system supports dynamic
linking or run-time interception of library routines and system
calls (as many modern systems in fact do), it can be a nuisance
to have to relink large numbers of programs.
Since this is a user-mode implementation, eschewing kernel
modifications, it does not rely on any modifications to the on-
disk filesystem structure. Instead, the presence of a daemon
attached to any file is recorded in a hidden file in the same
directory. A central "fallback" daemon attachment file may also
be used; this file allows users to attach daemons to files or
directories (e.g. $MAIL or /) for which the parent directories
are not writable.
The implementation of the OOFS library and the various
daemons which make it useful is relatively straightforward.
Calls which take pathnames pass them to a central routine which
examines a pathname component by component checking for daemon
attachments [7]. If a daemon is found, it is invoked (if it is
not already running). Communication with the daemon is by
default with a pair of conventional pipes, one for reading and
one for writing, but it is also possible to connect to an already-
running daemon at a named UNIX-domain socket. (Allowing daemons
to persist across client invocations eliminates multiple time-
consuming remote server connection interactions, and can also
simplify caching.)
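The component-by-component check for daemon attachments might look roughly like the following Python sketch. A dictionary stands in for the hidden per-directory attachment files, whose format is not described here; the function name, the daemon names, and the returned triple are likewise assumptions.

```python
def find_daemon(path, attachments):
    """Walk an absolute path one component at a time.

    Returns (daemon, attach_point, remainder) for the first component
    that has a daemon attached, or None if no daemon is involved.
    """
    parts = [p for p in path.split("/") if p]
    prefix = ""
    for i, part in enumerate(parts):
        prefix += "/" + part
        if prefix in attachments:
            return attachments[prefix], prefix, "/".join(parts[i + 1:])
    return None


table = {"/ftp": "ftp-daemon", "/gopher": "gopher-daemon"}
print(find_daemon("/ftp/ftp.nisc.sri.com/rfc/rfc1149.txt", table))
print(find_daemon("/usr/bin/ls", table))
```

The remainder is what would be handed to the daemon for further lookup; a None result means the call falls through to the kernel.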
Most calls (stat, rename, unlink, etc.) pass a single request
to the daemon and return its response to the caller. The open
call, however, allocates an OOFS open file structure containing a
pointer to the daemon, and returns an integer file descriptor
which, when passed in again by the caller in a read, write, or
other I/O call, will be recognized as an OOFS-handled file
descriptor and will instigate a daemon transaction.
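The descriptor dispatch just described can be sketched in Python as a table of OOFS-handled descriptors consulted before falling through to the kernel. How the real library tells its descriptors apart from kernel ones is not stated; the numbering base used here, the EchoDaemon stand-in, and the function names are all assumptions.

```python
import os


class EchoDaemon:
    """Hypothetical stand-in for a daemon: serves a fixed byte string."""

    def __init__(self, data):
        self.data, self.pos = data, 0

    def read(self, n):
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk


OOFS_FD_BASE = 10000      # assumed: keeps OOFS descriptors apart from kernel ones
_open_files = {}          # descriptor -> daemon handling that open file


def oofs_open(path, flags, daemons):
    daemon = daemons.get(path)
    if daemon is None:
        return os.open(path, flags)           # conventional kernel open
    fd = OOFS_FD_BASE + len(_open_files)
    _open_files[fd] = daemon
    return fd


def oofs_read(fd, n):
    daemon = _open_files.get(fd)
    if daemon is None:
        return os.read(fd, n)                 # conventional kernel read
    return daemon.read(n)                     # daemon transaction


fd = oofs_open("/www/x", os.O_RDONLY, {"/www/x": EchoDaemon(b"hello")})
print(oofs_read(fd, 3))
```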
Calls involving pathnames and file descriptors not associated
with OOFS daemons are of course passed on to the corresponding
UNIX kernel system calls for conventional interpretation.
__________
7. And symbolic links, which must be specially handled.
In order to support a shell linked against the OOFS
libraries, and to permit shell redirection to and from daemon-
handled files, special handling is necessary during fork and exec
calls, which the OOFS library also intercepts. Before an exec,
current directory and open file state information is saved in the
environment variable OOFSCONTEXT so that the copy of the OOFS
library in the invoked program can recover it. Negotiations are
performed before and after a fork so that the parent and child
(which might otherwise share communication paths to a single
daemon) will not interfere with each other, in particular to
ensure that one will not be able to close a file which should
remain open in the other. (Tichy describes an alternate solution
to this problem in [Tichy 84]; the problem is intriguingly
similar to one uncovered when an early version of UNIX first
implemented multitasking [Ritchie 84a].)
The OOFS library (the part which is linked in with client
applications) is currently written in approximately 4800 lines
[8] of C, which compiles to 27 KB of object code. The three
daemons discussed in this paper (ftp, gopher, and WWW) are
written in approximately 4000, 2000, and 3000 lines of C,
respectively, and have executables of size 57, 40, and 48 KB.
(These sizes are all for a Sun 4; object sizes for non-RISC
processors are somewhat smaller, while executable sizes for
systems without shared libraries are somewhat larger.)
6.1. Daemon Interface Protocol
The communication protocol between an application --
specifically, the OOFS library linked in with an application --
and an OOFS daemon is based on simple text lines, for simplicity
and ease of debugging. Each request is a line of the form
op modifiers wrsize rdsize [args]
where op is a string representing the operation requested,
modifiers is a string requesting optional kinds of behavior (none
are yet defined), wrsize is the number (represented as a string)
of bytes of data which accompany the request, rdsize is the
number of bytes of returned data the caller is prepared to
accept, and args is a list of zero or more operands specific to
the particular operation. An escape mechanism permits operands
which are pathnames to contain spaces or other special
characters, if necessary.
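A request line of this form could be built and parsed along the following lines in Python. Two details are assumptions, since they are not specified above: the escape mechanism is taken to be percent-encoding, and an empty modifiers field is written as "-".

```python
from urllib.parse import quote, unquote


def encode_request(op, args=(), data=b"", rdsize=0, modifiers="-"):
    """Build 'op modifiers wrsize rdsize [args]' plus wrsize bytes of data."""
    fields = [op, modifiers, str(len(data)), str(rdsize)]
    # Assumed escape mechanism: percent-encode spaces and special characters.
    fields += [quote(a, safe="/") for a in args]
    return " ".join(fields).encode("ascii") + b"\n" + data


def decode_request_line(line):
    """Parse one request line (without its newline) back into its fields."""
    op, modifiers, wrsize, rdsize, *args = line.decode("ascii").split()
    return op, modifiers, int(wrsize), int(rdsize), [unquote(a) for a in args]


req = encode_request("open", ["/gopher/host/Information About Gopher"])
print(req)
print(decode_request_line(req.rstrip(b"\n")))
```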
Each operation results in a return line of the form
status retval rdcount [string]
where status is a number (again represented as a string)
indicating the success or failure of the operation (including
__________
8. As reported by wc; these are not SLOC.
conditions such as "operation not supported"), retval is the
value to be returned to the caller (or, for unsuccessful calls,
the value to be placed in errno), rdcount is the number of bytes
of data which follow, and string is an (optional) string encoding
miscellaneous, possibly operation-specific, information.
Currently the string is used to encode the return value of "seek"
operations, since retval is limited to int-sized values; it will
eventually also be used to encode error information at a higher
level of detail than can be expressed in errno values.
When a request must send some data (i.e. a write-like
request), and when data must be returned (from a read-like
request), the data immediately follows the request or response
line; the counts which appear in the request and response lines
inform the receiving end how many data bytes it should read from
the pipe.
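The receiving side of a transaction can be sketched in Python as follows: read the response line, then read exactly rdcount bytes of data. The function name and the returned tuple are assumptions, as are the use of 0 for a successful status and the "offset=..." encoding of the optional string in the second example.

```python
import io


def read_response(stream):
    """Read 'status retval rdcount [string]' and then rdcount data bytes."""
    line = stream.readline().decode("ascii").rstrip("\n")
    status, retval, rdcount, *rest = line.split(" ", 3)
    data = stream.read(int(rdcount))          # the count frames the data
    return int(status), int(retval), data, (rest[0] if rest else "")


# A reply to a read-like request, followed by the start of the next reply.
pipe = io.BytesIO(b"0 5 5\nhello0 0 0 offset=4294967296\n")
print(read_response(pipe))
print(read_response(pipe))
```

Because the byte count travels in the response line, the data itself needs no escaping and the next response line can be found without scanning.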
The basic list of operations, most of which are supported by
most daemons, includes the following requests:
     chdir      open/dir          seek
     chmod      quit              start
     chown      read              stat
     close      read/dir          unlink
     fork       read/dir/stat     utime
     mkdir      rename            write
     open       rmdir
The "start" request is the first operation sent to a newly-
invoked daemon; it verifies successful daemon startup and
performs protocol version negotiation. "quit" is similarly used
to shut down a daemon. The other operations have functions
suggested by the UNIX system calls after which they are named; a
few have unusual behavior:
The "stat" operation accepts either a pathname or an open
file descriptor; it thus supports both the stat and fstat
system calls.
"open/dir" announces intent to read a file as a directory;
"read/dir" reads an open file as a directory and returns
filenames; "read/dir/stat" reads a directory and returns
filename and selected stat information at the same time.
(The data returned by the two "read/dir" variants is of
course in a filesystem-independent format.)
"fork" notifies the daemon that the client has forked and
that it must be more careful about honoring "close"
requests.
It is not immediately fatal for a daemon not to support an
operation; when a daemon cannot meaningfully perform some
operation, the OOFS library simply returns -1 to the calling
program, with errno set to EIO or EOPNOTSUPP.
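The handling of failed or unsupported operations can be sketched in Python as a small mapping from a daemon's reply to the usual UNIX convention of a return value plus errno. The numeric status codes are hypothetical, since the actual values are not given here; the choice of EOPNOTSUPP for unsupported operations and EIO as the fallback follows the description above.

```python
import errno

# Hypothetical status codes; the real protocol's values are not given.
STATUS_OK, STATUS_ERROR, STATUS_UNSUPPORTED = 0, 1, 2


def finish_call(status, retval):
    """Map a daemon reply to (return value, errno) for the calling program."""
    if status == STATUS_OK:
        return retval, 0
    if status == STATUS_UNSUPPORTED:
        return -1, errno.EOPNOTSUPP
    # On failure, retval carries the errno value; fall back to EIO.
    return -1, retval or errno.EIO


print(finish_call(STATUS_OK, 12))
print(finish_call(STATUS_UNSUPPORTED, 0))
```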
It will be noted that the basic protocol is synchronous; it
is also fairly stateful. Extensions to the protocol are planned
in order to support asynchronous operation; it is also intended
that all descriptor-based operations ("read", "write", etc.)
alternatively accept pathnames and offsets, to better support
stateless operation.
6.2. System Call Interception
A user-mode library such as OOFS which intercepts system
calls faces a mildly tricky problem at link time: it wishes to
provide entry points with names such as _open, _read, _write,
etc., while also calling actual UNIX system calls with the same
names. (This is, of course, a simple inheritance problem.)
Numerous solutions to this problem can be imagined; the OOFS
library as currently implemented uses one of two.
The library contains entry points named oofsopen, oofsread,
etc. (i.e. the conventional system call names, prefixed with
"oofs"). The actual linking strategy depends on the I/O calls
being made by the application:
1. An application that uses the stdio package exclusively is
linked against a reimplementation of the stdio library
[Summit 89] which is based on the OOFS routines rather than
the standard system calls.
2. An application that uses system calls directly is
recompiled with invocation-line preprocessor defines of the
form -Dopen=oofsopen (etc.) in effect.
It would also be quite possible (and preferable) to provide a
variant version of a dynamically-linked libc, or to use a UNIX
kernel which provides well-defined support for system call
interception, in either case eliminating tedious recompilation
and/or relinking. (Jones discusses several relevant aspects of
system call interception in [Jones 93]. Other investigators have
demonstrated the feasibility of "intercepting" filesystem-related
calls by implementing specialized NFS daemons, or using the
automounter interface.)
6.3. Daemon Implementation
Writing a simple, read-only daemon (i.e. one which can
support, say, the cat program) is surprisingly easy; arranging
for a daemon to map or simulate all UNIX filesystem semantics
expected by any program is of course arbitrarily hard. Without
going into too much detail, this section lists some of the
difficulties (and surprises) encountered while implementing the
daemons mentioned in this paper.
A daemon must decide whether it will read or write the remote
item on-the-fly as the client issues "read" and "write" requests,
or whether it will perform I/O to and from a local temporary
file, copying the entire file at once when the file is opened
(for reading) or closed (after writing). On-the-fly I/O can both
reduce overhead and provide lower startup latency: the "open" and
the first "read"s may return almost immediately. I/O to and from
a local temporary file, on the other hand, allows random access
and simultaneous access of multiple files. (If files are to be
cached, the use of a local temporary file is implied in any
case.) A hybrid scheme, which builds a temporary file
incrementally while performing on-the-fly I/O, is also possible.
However, it is not easy for a daemon to decide which of these
transfer models to use, particularly because it does not have all
the information it could use in making the decision. For
example, UNIX I/O semantics do not provide an indication at open
time of whether I/O will be sequential or random access.
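The hybrid scheme can be sketched as follows. This is a minimal
illustration, not the OOFS implementation: struct hybrid,
hybrid_init, and hybrid_pread are hypothetical names, the remote
item is represented by any sequential-only descriptor (such as a
socket or pipe), and error handling is reduced to the bare
minimum. Data is pulled from the source only as far as the
current request requires, and everything pulled so far remains
available for random access in the temporary file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

struct hybrid {
    int   src;      /* sequential-only source, e.g. a network socket */
    int   tmp;      /* local temporary file holding what has been read */
    off_t spooled;  /* number of bytes copied into tmp so far */
};

/* Attach a source descriptor and create the (anonymous) spool file. */
int hybrid_init(struct hybrid *h, int src, const char *tmp_path)
{
    assert(h != NULL && tmp_path != NULL);
    h->src = src;
    h->spooled = 0;
    h->tmp = open(tmp_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (h->tmp < 0)
        return -1;
    unlink(tmp_path);   /* file disappears when the descriptor is closed */
    return 0;
}

/* Read len bytes at offset off, fetching more from the source only
 * when the request reaches past what has already been spooled. */
ssize_t hybrid_pread(struct hybrid *h, void *buf, size_t len, off_t off)
{
    char chunk[4096];

    assert(h != NULL && buf != NULL);
    while (h->spooled < off + (off_t)len) {
        ssize_t n = read(h->src, chunk, sizeof chunk);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                       /* end of the remote item */
        if (pwrite(h->tmp, chunk, (size_t)n, h->spooled) != n)
            return -1;
        h->spooled += n;
    }
    return pread(h->tmp, buf, len, off); /* served from the local copy */
}
```

A sequential reader sees on-the-fly latency, while a backward
seek costs nothing extra; a forward seek, however, still forces
everything before the target offset to be transferred.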
Some protocols (notably ftp, and also of course many foreign
filesystems) distinguish between text and binary files. Again,
it is difficult for the daemon to decide which mode to use
without information which UNIX programs are -- quite happily --
not accustomed to providing. The OOFS ftp daemon addresses this
problem by interpreting special syntax in the pathname; Cate
describes another solution in [Cate 92].
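As an illustration of the pathname approach, the sketch below
strips a transfer-type marker from the end of a pathname. The
";type=a" and ";type=i" suffixes are hypothetical, chosen here
only to echo FTP's TYPE A (ASCII) and TYPE I (image) commands;
they are not necessarily the syntax the OOFS ftp daemon accepts,
and split_type_suffix is an invented name.

```c
#include <assert.h>
#include <string.h>

enum ftp_type { FTP_TYPE_DEFAULT, FTP_TYPE_ASCII, FTP_TYPE_IMAGE };

/* Strip a hypothetical ";type=a" or ";type=i" suffix from path
 * (modifying it in place) and report which type was requested.
 * A path without a recognized suffix is left untouched. */
enum ftp_type split_type_suffix(char *path)
{
    static const char marker[] = ";type=";
    const size_t mlen = sizeof marker - 1;
    size_t len;
    char *suffix;
    enum ftp_type type;

    assert(path != NULL);
    len = strlen(path);
    if (len < mlen + 1)
        return FTP_TYPE_DEFAULT;
    suffix = path + len - (mlen + 1);
    if (strncmp(suffix, marker, mlen) != 0)
        return FTP_TYPE_DEFAULT;
    switch (suffix[mlen]) {
    case 'a': type = FTP_TYPE_ASCII; break;
    case 'i': type = FTP_TYPE_IMAGE; break;
    default:  return FTP_TYPE_DEFAULT;  /* unknown code: leave path alone */
    }
    *suffix = '\0';                     /* hand the bare name to the server */
    return type;
}
```

The cost of any such convention is that the marker occupies part
of the namespace: a remote file whose real name ends in the
marker can no longer be named directly.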
It is notoriously difficult to map error conditions from any
new device or protocol onto the relatively small, fixed set of
UNIX errno values.
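The difficulty shows up even in a small case. The sketch below
maps FTP reply codes onto errno values. The reply codes are the
standard ones; the choice of errno for each is an assumption of
this sketch (ftp_reply_to_errno is a hypothetical function, not
the OOFS daemon's actual table), and several of the choices are
debatable.

```c
#include <assert.h>
#include <errno.h>

/* Map an FTP reply code onto an errno value; 0 means "no error".
 * The mapping is one plausible choice, not a definitive one. */
int ftp_reply_to_errno(int reply)
{
    assert(reply >= 100 && reply <= 599);
    if (reply < 400)
        return 0;               /* 1xx, 2xx, 3xx: not failures */
    switch (reply) {
    case 450: return EBUSY;     /* file unavailable, e.g. busy */
    case 452:
    case 552: return ENOSPC;    /* insufficient or exceeded storage */
    case 500:
    case 501:
    case 553: return EINVAL;    /* syntax error, name not allowed */
    case 502: return ENOSYS;    /* command not implemented */
    case 530:
    case 532: return EACCES;    /* not logged in, account needed */
    case 550: return ENOENT;    /* "not found" OR "no access": a guess */
    default:  return EIO;       /* 421, 425, 426, 451, anything else */
    }
}
```

Reply 550 is the instructive case: the protocol uses one code
for both a missing file and a permission failure, so the daemon
must guess between ENOENT and EACCES.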
It is in general difficult or impossible to fill in all of
the fields in the stat structure when a daemon performs a "stat"
request on a piece of data which is not really a file. (The
st_mtime and st_ino fields are particularly troublesome.) When
these fields are not or cannot be filled in appropriately, some
applications may misbehave [9]. Even when troublesome fields can
be filled in, deriving values for them may be expensive, which is
unfortunate if an expensively-derived field is not actually
needed by the caller.
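One way a daemon might fabricate plausible values is sketched
below. It is an illustration under stated assumptions, not the
method used by the OOFS daemons: synth_stat and path_hash are
hypothetical names, the inode number is simply a hash of the
pathname, and an unknown size or date is reported as zero.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* 32-bit FNV-1a hash of the pathname, used as a stand-in inode number. */
static unsigned long path_hash(const char *s)
{
    unsigned long h = 2166136261UL;

    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h = (h * 16777619UL) & 0xffffffffUL;
    }
    return h;
}

/* Fill in a stat structure for an item that is not really a file.
 * size is -1 and mtime is 0 when the protocol cannot supply them. */
void synth_stat(const char *path, long size, time_t mtime, struct stat *st)
{
    assert(path != NULL && st != NULL);
    memset(st, 0, sizeof *st);
    st->st_mode  = S_IFREG | 0444;          /* regular, read-only */
    st->st_nlink = 1;
    st->st_size  = (size >= 0) ? size : 0;  /* unknown size reads as 0 */
    st->st_mtime = mtime;                   /* 0 if no date is available */
    st->st_ino   = (ino_t)path_hash(path);  /* stable across calls */
}
```

A hashed inode number is at least stable, so repeated stat calls
on one name agree, but two different names can collide, in which
case a tool that compares st_ino would wrongly conclude that two
items are the same file.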
Eventual extensions to OOFS may provide workarounds for some
of these difficulties, such as: extra, optional tuning parameters
to be specified at open time [10]; an extended perror mechanism;
and an indication at the time of a stat call of which fields are
needed by the caller and which are being reliably returned. Any
use of these extensions by applications, however, would obviously
be nonstandard, nontransparent, and not universal.
__________
9. find(1) has perhaps the most pressing requirements for
accurate stat values, but even such lowly tools as mv, cp,
and diff typically inspect st_dev and st_ino to determine
whether two files are identical.
10. Such parameters would not represent abandonment of UNIX's
typeless filesystem, but rather acknowledgement that foreign
systems, with which interoperation is desired, are not always
so enlightened.
7. Performance
It is difficult to provide precise measurements of the
performance of this system in a meaningful way. There are
certainly three areas in which the use of the OOFS scheme, and
its associated daemons, potentially degrades performance. First,
all pathname lookups are complicated by the necessity of checking
for the auxiliary files which record daemon attachment. Second,
invoking a daemon necessarily involves fork and exec overhead.
Finally, in general, all data read and written by the client
process passes through the pipe to the daemon, increasing
overhead.
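The second and third costs can be seen in a minimal sketch of a
pass-through arrangement. It is not the OOFS daemon protocol:
read_via_helper is a hypothetical function, the helper is created
with fork alone (a real daemon would also be exec'ed), and the
"protocol" is nothing more than raw file contents written down a
pipe.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read up to len bytes of path by way of a forked helper process
 * that relays the file's contents through a pipe.  Returns the
 * number of bytes read, or -1 on failure. */
ssize_t read_via_helper(const char *path, char *buf, size_t len)
{
    int p[2], status = 0;
    pid_t pid;
    size_t got = 0;
    ssize_t n = 0;

    assert(path != NULL && buf != NULL);
    if (pipe(p) < 0)
        return -1;
    pid = fork();                          /* per-open process creation */
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                        /* child: the "daemon" */
        char chunk[4096];
        int fd = open(path, O_RDONLY);

        close(p[0]);
        if (fd < 0)
            _exit(1);
        while ((n = read(fd, chunk, sizeof chunk)) > 0)
            if (write(p[1], chunk, (size_t)n) != n)
                _exit(1);
        _exit(n < 0 ? 1 : 0);
    }
    close(p[1]);                           /* parent: the client */
    while (got < len && (n = read(p[0], buf + got, len - got)) > 0)
        got += (size_t)n;                  /* every byte crosses the pipe */
    close(p[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (got == 0 && !(WIFEXITED(status) && WEXITSTATUS(status) == 0))
        return -1;                         /* helper failed before sending */
    return (ssize_t)got;
}
```

Each byte is copied at least twice more than in a direct read
(into the pipe and out of it), and each open pays for a process
creation, which is consistent with where the overhead is
attributed above.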
There are, however, compensating advantages of the scheme.
The separate daemon process, though it necessitates extra IPC, is
nevertheless a second process: if the I/O is at all complicated,
the client may benefit by having the I/O offloaded to a second
process. In any case, when a remote network service is involved,
any extra IPC overhead on the local machine is likely to be
swamped by the unavoidable network I/O.
To assess at least a few performance aspects quantitatively,
three tests were performed, both with and without using OOFS.
The first test measures the time to stat() a file;
differences reflect the extra pathname processing,
auxiliary file lookup, and daemon invocation. This test
was performed in three ways: without using the OOFS library
at all; with OOFS but on a file without an attached daemon;
and with OOFS on a file with a "pass-through" daemon, which
passes requests back to the local filesystem. The second
case pays the pathname processing and auxiliary file lookup
penalty; the third case additionally suffers daemon
invocation overhead.
The second test measures the time to cat a 1 MB file, both
with a conventional cat and an OOFS-aware cat plus the pass-
through daemon mentioned above. Differences reflect both
name lookup and IPC overhead.
The third test compares the time required to ftp a 1.5 MB
file as opposed to copying it with an OOFS-aware cp and the
OOFS ftp daemon. (The OOFS-aware cp is at an additional
disadvantage since it performs several extra stat
operations, necessitating additional remote ftp server
interactions.)
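The paper does not record how the individual timings were taken.
The sketch below shows one conventional way to obtain a per-call
figure like those of the first test: repeat the call many times
and divide the elapsed time by the count. The function name
time_stat and the use of clock_gettime are assumptions of this
sketch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Average wall-clock seconds per stat() call on path, taken over
 * the given number of iterations; returns -1.0 on any failure. */
double time_stat(const char *path, int iterations)
{
    struct timespec t0, t1;
    struct stat st;
    int i;

    assert(path != NULL && iterations > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (i = 0; i < iterations; i++)
        if (stat(path, &st) != 0)
            return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;
    return ((double)(t1.tv_sec - t0.tv_sec)
            + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9) / iterations;
}
```

Averaging over many iterations matters for the cases without a
daemon, where a single call is far shorter than the resolution of
most clocks; it also means that any caching of the result across
calls flatters the average.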
The tests were performed on a lightly loaded Sun 4/280 running
SunOS 4.1.3. For the tests involving OOFS, approximate user and
system times for the daemon process are also presented. The data
appear in Table 1.
________________________________________________________________________
                                                  daemon   daemon
                       real     user     system   user     system
Test 1 (stat)
  no OOFS              0.0003   0.00002  0.0003   -        -
  OOFS, no daemon      0.0037   0.0008   0.0016   -        -
  OOFS with daemon     0.25     0.041    0.12     0.0019   0.0009

Test 2 (cat 1 MB file)
  no OOFS              13.7     5.1      0.7      -        -
  OOFS                 23.5     6.8      4.5      1.4      2.0

Test 3 (ftp 1.5 MB file)
  ftp (no OOFS)        90.1     0.2      2.5      -        -
  cp plus OOFS         110.3    4.5      11.3     2.7      8.0

                               Table 1
            Performance Comparisons (all times in seconds)
________________________________________________________________________
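There is an obvious performance degradation when the OOFS
library is in use (in Table 1, real time rises by a factor of
roughly 1.7 for the cat test and roughly 1.2 for the ftp test),
although it should be noted that this is a prototype
implementation against which no serious optimization attempts
have yet been made.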
No attempt has been made to measure performance of the more
interactive operations involving the gopher and WWW daemons.
8. Open Issues and Future Work
The core OOFS mechanisms could use improvements in several
areas. The library needs to be merged with an established,
kernel-resident system call or I/O interception scheme, or at
least made to work as a dynamically-linkable run-time library. A
well-defined inheritance mechanism would cement its reputation as
an object-oriented filesystem and provide support for such
situations as remotely-mapped self-uncompressing files. Ideally,
daemon attachment would be recorded in the inode; to do so would
obviously require both kernel modifications and extensions to the
on-disk inode structure.
This project is one of those many that presume openness and
trust; concerns of security and authentication have been
secondary. Although the daemon interface protocol has a few
authentication hooks, they are not really implemented in practice
by the existing daemons. (Obviously, when a daemon is providing
access to a public resource, authentication is a non-issue; the
daemon doesn't really care who its client is.)
The filesystem daemon scheme provides a potentially fertile
bed for the introduction of Trojan Horses and other mayhem. (Having
a program fire up simply because a file is accessed is a
cracker's dream come true.) On a timesharing system, if many
users are using OOFS-aware applications, it may be appropriate to
ensure that daemons run as their authors, and not as their
invokers.
When daemons are attached in relatively few places (the
applications discussed in this paper involve only top-level
pseudodirectories such as /ftp and /gopher), it is practical to
manually maintain the auxiliary files which record daemon
attachment. If heavier use of the scheme were to be made, it
would be important to implement locking or other automated
handling of the auxiliary files, to prevent conflicts, and to
support automated updating of daemon attachments when files are
renamed or deleted.
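As one possible form such locking could take, the sketch below
takes an exclusive advisory lock on an auxiliary file before it
is updated. The function names are hypothetical, and nothing is
assumed about the name or contents of the auxiliary file itself;
only the standard fcntl record-locking interface is used.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open an auxiliary file (creating it if necessary) and take an
 * exclusive advisory lock on the whole file, waiting if another
 * process holds it.  Returns a descriptor, or -1 on failure. */
int aux_open_locked(const char *aux_path)
{
    struct flock fl;
    int fd;

    assert(aux_path != NULL);
    fd = open(aux_path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    memset(&fl, 0, sizeof fl);
    fl.l_type   = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;            /* zero length means "to end of file" */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Closing the descriptor releases the lock. */
int aux_close_unlock(int fd)
{
    return close(fd);
}
```

Advisory locks protect only against other programs that also ask
for the lock, so every tool that edits attachments would have to
follow the same convention.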
The attempt to map non-filesystem-oriented information
services such as gopher and WWW onto a simulated filesystem has
revealed a few things that the designers of such protocols could
do to make the mapping task easier and more meaningful.
Specifically, it would be beneficial (for any automated use, not
just for OOFS daemons) if a protocol could provide:
1. Useful distinctions between error conditions (e.g. "mal-
formed request" vs. "item not found" vs. "permission
denied" vs. "unexpected I/O error");
2. A means of checking for the presence of an item, or
returning status information about it, without retrieving
its contents;
3. Short names for items (in addition to any implicit indices
or links);
4. Dates and/or modification times for items; and
5. A well-defined way to retrieve partial items, for example
to read the second 1 KB block of a 100 KB item.
The preceding wish list is in increasing order of difficulty, and
decreasing order of likelihood. OOFS is able to proceed without
any of these features, and many protocols may be utterly unable
to provide them, but where possible their availability would make
filesystem emulation considerably more seamless.
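The way a daemon can proceed without these features, at a cost,
can be sketched for the second item on the list. The operation
table below is hypothetical (struct proto_ops and item_exists are
invented for this sketch), and the "toy" protocol exists only to
exercise it: when a protocol offers no existence check, the
daemon falls back to retrieving the item.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-protocol operation table; NULL = not offered. */
struct proto_ops {
    /* Retrieve an item; returns bytes stored in buf, or -1. */
    long (*fetch)(const char *item, char *buf, size_t len);
    /* Wish-list item 2: returns 1 if the item exists, else 0. */
    int  (*exists)(const char *item);
};

/* Check for an item, using the cheap operation when the protocol
 * has one and an actual retrieval when it does not. */
int item_exists(const struct proto_ops *ops, const char *item)
{
    char scratch[1];

    assert(ops != NULL && ops->fetch != NULL && item != NULL);
    if (ops->exists != NULL)
        return ops->exists(item);
    return ops->fetch(item, scratch, sizeof scratch) >= 0;
}

/* Toy in-memory protocol holding one item, for demonstration only. */
static int fetch_calls;

static long toy_fetch(const char *item, char *buf, size_t len)
{
    static const char body[] = "hello";
    size_t n = sizeof body - 1;

    fetch_calls++;                      /* count the expensive path */
    if (strcmp(item, "readme") != 0)
        return -1;
    if (n > len)
        n = len;
    memcpy(buf, body, n);
    return (long)n;
}

static int toy_exists(const char *item)
{
    return strcmp(item, "readme") == 0;
}
```

With a table of { toy_fetch, toy_exists } the check costs no
retrieval at all; with { toy_fetch, NULL } every existence check
(and hence every stat) triggers a transfer from the server.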
9. Comparison to Other Work
Remote/networked/distributed filesystems have been
implemented and described many times; examples are the Newcastle
Connection [Brownbridge 82], IBIS [Tichy 84], RFS [Rifkin 86],
the Andrew File System [Howard 88], and NFS [Sun 89]. OOFS is
more general, intended to allow arbitrary processing during file
access; accessing remote filesystems over a network is but one
obvious application.
Several systems have supported heterogeneous filesystems (in
part in support of networked file systems, as above) by inserting
a level of indirection at the filesystem interface: examples are
Sun's Vnodes [Kleiman 86] and the Version 8 typed file system
[Weinberger 84, Rago 90].
Inserting extra functionality at the system call interface is
a process which appears in several guises; many of the systems
described in this section implement their extensions in this way.
Jones discusses several aspects of system call extension and
provides an excellent bibliography in [Jones 93]; another
implementation is described in [Krell 92].
The idea of attaching arbitrary processing modules to certain
I/O streams is not new; it is central to Research UNIX's streams
[Ritchie 84a] and to Apollo's DOMAIN system [Rees 86]. What this
paper calls "filesystem daemons" have been described and
implemented several times: one implementation is Bershad and
Pinkerton's "Watchdogs" [Bershad 88]; a related idea is
implemented by Eggert and Parker's IFS in [Eggert 93].
OOFS, then, is not terribly unique: it shares with Watchdogs
and IFS the ability to do arbitrary processing, with per-file (as
opposed to per-filesystem) granularity. It shares with the
Newcastle Connection, IBIS, and IFS a user-mode implementation
which requires no kernel modifications. One significant feature
of the OOFS scheme is that it is designed to work with daemons
which provide both more and less than the canonical level of
processing: some daemons are barely able to provide a minimal
simulation of filesystem semantics, but some daemons are able to
support extended operations unheard of in conventional
filesystems.
Current networking issues of broad interest include the
problems of resource discovery, resource naming, resource
organization, and resource access. Resource discovery is the
focus of archie [Emtage 92], gopher, and WWW. One attempt at a
systematic approach to uniform resource naming in the face of
heterogeneous protocols is the Uniform Resource Locator (URL)
scheme [Berners-Lee 93]. Much ongoing research attacks the
resource organization problem; the Prospero system [Neuman 89]
provides an excellent example. The applications described in
this paper are primarily directed at the resource access problem,
although the abilities to browse heterogeneous resources using
standard tools and target them with symbolic links provide some
support for the discovery and organization problems.
(Appropriate OOFS daemons, including those described in this
paper, could also provide a tidy implementation of the URL
scheme, in the form of a /url pseudodirectory containing
subdirectories ftp, http, etc.)
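The naming half of such an arrangement is simple enough to
sketch. The layout assumed here, in which "scheme://host/path"
becomes "/url/scheme/host/path", is only one plausible reading of
the suggestion above, and url_to_path is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite "scheme://host/path" as "/url/scheme/host/path".
 * Returns 0 on success, or -1 if url contains no "://" after a
 * non-empty scheme or if the result does not fit in out. */
int url_to_path(const char *url, char *out, size_t outsize)
{
    const char *sep;
    int n;

    assert(url != NULL && out != NULL);
    sep = strstr(url, "://");
    if (sep == NULL || sep == url)
        return -1;
    n = snprintf(out, outsize, "/url/%.*s/%s",
                 (int)(sep - url), url, sep + 3);
    if (n < 0 || (size_t)n >= outsize)
        return -1;                      /* output would be truncated */
    return 0;
}
```

Under this layout the URL ftp://info.cern.ch/pub/ietf/url4.txt
would be reachable as /url/ftp/info.cern.ch/pub/ietf/url4.txt,
with the ftp subdirectory served by the ftp daemon.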
10. Conclusions
This paper has described a mechanism whereby heterogeneous
network information services can be integrated more-or-less
transparently into a local filesystem, such that they can be
accessed and manipulated using standard, familiar tools. The
emulation is not perfect (not all information sources can be made
to mimic all filesystem semantics), but the limitations of the
emulation are definitely balanced by the advantages of being able
to use standard tools. When tools and protocols share standard
interfaces, they can be combined in arbitrarily powerful ways to
solve problems not originally envisioned.
The integration depends, in this case, on making the
filesystem interface (as embodied in the operating system calls
open, read, etc.) a rendezvous point between network services and
local utilities, such that services and utilities written at
different times, under different sets of assumptions, with
different immediate goals in mind, by different people, and
without advance knowledge of each other, can nevertheless
interoperate.
Acknowledgements
Thanks to Mark Brader, Stan Brown, Paul Eggert, Michael
Jones, Jeff Mogul, and Melanie Summit for their comments on
earlier drafts of this paper. Thanks to Robert Dinse at Eskimo
North for providing the system on which most of this work was
performed.
References
[Anklesaria 93] F. Anklesaria et al., "F.Y.I. on the Internet
Gopher Protocol," March, 1993,
URL=ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol/DRAFT_Gopher_FYI_RFC.txt
(also RFC-1436).
[Berners-Lee 93] Tim Berners-Lee, "Uniform Resource Locators,"
Internet Draft, March, 1993,
URL=ftp://info.cern.ch/pub/ietf/url4.txt .
[Bershad 88] Brian N. Bershad and C. Brian Pinkerton,
"Watchdogs -- Extending the UNIX File System,"
Computing Systems, 1:2, 1988, pp. 169-188.
[Brownbridge 82] D.R. Brownbridge, L.F. Marshall, and B. Randell,
"The Newcastle Connection or UNIXes of the World
Unite!," Software -- Practice and Experience,
12, 1982, pp. 1147-1162.
[Cate 92] Vincent Cate, "Alex -- a Global Filesystem," in
Proceedings of the USENIX File Systems Workshop,
Ann Arbor, MI, 1992, pp. 1-11.
[Davis 90] Franklin Davis et al., "WAIS Interface Protocol
Prototype Functional Specification," April 23, 1990,
URL=ftp://quake.think.com/pub/wais/doc/protspec.txt .
[Eggert 93] Paul R. Eggert and D. Stott Parker, "File
Systems in User Space," in Proceedings of the
Winter 1993 USENIX Conference, San Diego.
[Emtage 92] Alan Emtage and Peter Deutsch, "archie -- An
Electronic Directory Service for the Internet,"
in Proceedings of the Winter 1992 USENIX
Conference, San Francisco, pp. 93-110.
[Howard 88] J.H. Howard, "An Overview of the Andrew File
System," in Winter 1988 USENIX Conference
Proceedings, Dallas, February, 1988.
[Jones 93] Michael B. Jones, "Interposition Agents:
Transparently Interposing User Code at the
System Interface," in Proceedings of the 14th
ACM Symposium on Operating Systems Principles,
Asheville, NC, December, 1993.
[Kahle 89] Brewster Kahle, "Wide Area Information Server
Concepts," November 3, 1989,
URL=ftp://quake.think.com/pub/wais/doc/wais-concepts.txt .
[Kleiman 86] S.R. Kleiman, "Vnodes: An Architecture for
Multiple File System Types in Sun UNIX," in
USENIX Association, Summer Conference
Proceedings, Atlanta, 1986, pp. 238-247.
[Korn 90] David G. Korn and Eduardo Krell, "A New
Dimension for the Unix File System," Software --
Practice and Experience, 20:S1, June, 1990,
pp. 19-34.
[Krell 92] Eduardo Krell and Balachander Krishnamurthy,
"COLA: Customized Overlaying," in Proceedings of
the Winter 1992 USENIX Conference, San
Francisco, pp. 3-7.
[Krol 92] Ed Krol, The Whole Internet User's Guide &
Catalog, O'Reilly & Associates, 1992, ISBN
1-56592-025-2.
[Neuman 89] B. Clifford Neuman, "The Virtual System Model
for Large Distributed Operating Systems,"
Technical Report 89-01-07, April, 1989,
Department of Computer Science, University of
Washington, Seattle.
[Rago 90] Stephen Rago, "A Look at the Ninth Edition
Network File System," in UNIX Research System
Papers, Volume II, Saunders College Publishing,
1990, ISBN 0-03-047529-5.
[Rees 86] Jim Rees, Paul H. Levine, Nathaniel Mishkin, and
Paul J. Leach, "An Extensible I/O System," in
USENIX Association, Summer Conference
Proceedings, Atlanta, 1986, pp. 114-125.
[Rifkin 86] Andrew P. Rifkin et al., "RFS Architectural
Overview," in USENIX Conference Proceedings,
Atlanta, GA, June, 1986.
[Ritchie 84a] Dennis M. Ritchie, "A Stream Input-Output
System," AT&T Bell Laboratories Technical
Journal, 63:8, October, 1984, pp. 1897-1910.
[Ritchie 84b] Dennis M. Ritchie, "The Evolution of the UNIX
Time-sharing System," AT&T Bell Laboratories
Technical Journal, 63:8, October, 1984,
pp. 1577-1593.
[Roome 92] W.D. Roome, "3DFS: A Time-Oriented File Server,"
in Proceedings of the Winter 1992 USENIX
Conference, San Francisco, pp. 405-418.
[Summit 89] Steve Summit, "A Reimplementation of the
Standard I/O Package," unpublished.
[Sun 89] Sun Microsystems, Inc., "NFS: Network File
System Protocol Specification," Internet
RFC-1094, March, 1989.
[Tichy 84] Walter F. Tichy and Zuwang Ruan, "Towards a
Distributed File System," in USENIX
Association/Software Tools Users Group, Summer
Conference Proceedings, Salt Lake City, 1984,
pp. 87-97.
[Warnock 84] Robert P. Warnock III, "User-Mode Development of
Hardware and Kernel Software," in USENIX
Association/Software Tools Users Group, Summer
Conference Proceedings, Salt Lake City, 1984,
pp. 224-226.
[Weinberger 84] P.J. Weinberger, "The Version 8 Network File
System" (abstract), in USENIX Association/
Software Tools Users Group, Summer Conference
Proceedings, Salt Lake City, 1984, p. 86.
[WWW 93] "Protocol for the Retrieval and Manipulation of
Textual and Hypermedia Information," June, 1993,
URL=ftp://info.cern.ch/pub/www/doc/http-spec.ps .
Author Information
Steve Summit attended the Massachusetts Institute of
Technology from 1979 to 1983, receiving a Bachelor of Science
degree in Electrical Engineering and Computer Science. He has
worked as a software engineer since then, presently as an
independent consultant. His interests and specialties are too
undifferentiated, and his mistrust of buzzwords too complete, to
attempt to list them here; this paper is however not
unrepresentative. He can be reached at scs@eskimo.com .