Filesystem Daemons as a Unifying Mechanism for Network Information Access*

Steve Summit
Consultant, Seattle, Washington

Abstract

As the Net burgeons, new tools and protocols are being introduced to permit some orderly use to be made of the wealth of information available. These new protocols, however, often presuppose the use of new, nonstandard, highly interactive user interfaces. This paper presents a mechanism for unifying access to diverse network services through filesystem daemons, which allow network information services to be treated as if they were conventional files and directories, residing in the local namespace, and accessed transparently with standard tools. Besides normal filesystem operations ("open", "read", "write", etc.), the daemons may introduce extended operations, which provide generic access to such features as network database lookup operations.

1. Introduction

As the Internet grows, and as more and more information becomes available, it becomes more difficult to locate information of interest. Numerous protocols, such as Gopher [Anklesaria 93], the World-Wide Web [WWW 93], and WAIS [Kahle 89, Davis 90], have been introduced in an attempt to make network browsing and information retrieval more convenient and productive [Krol 92].

These protocols, and their associated user interface tools, represent important advances, and they make the resource discovery problem much more tractable. However, since they are intended as browsing tools, their default user interfaces tend to be highly interactive [1]. Furthermore, as the new protocols are still somewhat experimental, their user interfaces are rather idiosyncratic. Though perfectly adequate for casual browsing, they do not lend themselves immediately to integration into larger toolsets.

It is natural to wonder whether that venerable old UNIX notion that "everything's a file" can be extended to these newer network information sources.
(Conventional network files have of course been so available for some time; see section 9.) Under this extended paradigm, the filesystem interface becomes a rendezvous point between network services and local utilities. It is possible to explore gopherspace, or the World-Wide Web, or other information spaces, using one's own preferred shell, moving around with cd, viewing and searching "files" with standard tools such as more and grep, copying them to one's home directory with cp, and otherwise making use of a large, open, familiar, rich, programmable environment.

__________
* Copyright 1994 USENIX Association.
1. Even ftp, which the new protocols are intended to supersede, has this problem.

USENIX -- Winter '94

This paper describes a prototype implementation which provides such transparent access. The implementation is built up in several stages, and that structure will be reflected in the body of the paper. Section 2 introduces the idea of filesystem daemons, which permit a process to be invoked to handle any or all of the I/O associated with any name in the filesystem. Section 3 reviews the straightforward application of such daemons in implementing transparent remote filesystem access (e.g. via anonymous ftp). Section 4 extends the remote access paradigm to the realm of non-filesystem-oriented data such as gopher and WWW, and proposes relaxing the distinction between files and directories (it is possible to view some pieces of structured or indexed information as both). Section 5 suggests extensions to the basic set of I/O operations, in order to permit advantageous use of such services as remote database search facilities. Section 6 describes details of the implementation, and section 7 discusses performance. Section 8 outlines open issues and future work; section 9 compares related work.
The implementation described runs entirely in user mode; echoing a refrain of USENIX contributions past, "no kernel changes are required."

2. Filesystem Daemons

The underlying mechanism on which the ideas in this paper are built is an extended filesystem tentatively named the Object-Oriented Filesystem, or OOFS [2]. The salient feature of this filesystem is that any file may enjoy customized I/O handling via the services of an associated process, or filesystem daemon. In effect, each file may be treated as an object to which I/O messages are passed. (A similar scheme is described in [Bershad 88].)

__________
2. An admittedly obvious name.

Under OOFS, each UNIX system call which accepts a pathname is intercepted and the pathname inspected for the presence of a daemon attached to the referenced file, or to one of its parent directories. If a daemon is found, it is invoked, and a communication path is set up along which are passed messages which request further lookup and I/O operations. OOFS also intercepts system calls which accept file descriptors; I/O on descriptors which refer to daemon-opened files again results in messages being passed to the daemon for processing, rather than direct invocation of the conventional kernel routine. Daemons may respond to I/O requests in arbitrary ways, constructing the file's data on the fly, if they so wish [3].

The protocol between a daemon and its OOFS-aware client is simple and -- very importantly -- extensible. (Further rationale behind, and ramifications of, this extensibility are discussed in section 5; details of the daemon interface protocol can be found in section 6.) For the purposes of the following discussion, it suffices to know that daemon requests are specified by text strings; most are named after the UNIX system calls they implement.

The OOFS mechanisms support conventional, slash-separated, UNIX-style pathname syntax.
The semantics can however be arbitrarily tangled; as we shall see, various path components may end up representing access methods, machine names, etc. Furthermore, if a daemon requires any special parameters, these must typically be passed to it using daemon-specific syntax grafted on to the pathname. (Similar gyrations are discussed in [Roome 92].)

__________
3. Eggert and Parker discuss such "intensionalized" files extensively in [Eggert 93].

Filesystem daemons are obviously useful for purposes other than networking. They can be used to implement access control lists, self-decompressing files, self-extracting archives, versioning systems [Korn 90, Roome 92], mailbox directories, and NNTP-based /usr/spool/news directories, among other things, but those applications are outside the scope of this paper.

3. Remote Filesystem Access

Once things are set up so that daemons can intercept and provide special processing for certain pathnames, the groundwork is obviously laid for transparent network file access. For example, I have implemented a sort of "poor man's NFS" by attaching a daemon to the pseudodirectory /ftp. (This technique is very similar to an approach used by the Alex system [Cate 92].) This daemon handles all pathname components beneath the /ftp attachment point, such that pathnames of the form /ftp/machine/path are interpreted as a path accessed via anonymous ftp on a certain machine. The daemon acts as an ftp client, connecting to the specified machine's ftp server and performing appropriate ftp protocol transactions as necessary to satisfy the I/O requests of its own clients (that is, of the programs using the daemon).

Under this scheme, the ftp user command becomes superfluous; files can be copied by anonymous ftp with the cp command (or moved with mv). For example, the command

    cp /ftp/ftp.nisc.sri.com/rfc/rfc1149.txt .

retrieves a file via anonymous ftp from site ftp.nisc.sri.com; no manual interaction with an ftp client is required.

4. Remote Non-Filesystem Access

The preceding two sections are by way of prelude; they do not represent anything that has not been done several times before. This paper's principal contribution is the idea of accessing, transparently and as if part of the local filesystem, non-filesystem-structured network information services.

4.1. Gopher

The Internet Gopher protocol [Anklesaria 93] presents a hierarchical tree of information nodes, residing potentially on many machines. Nodes are typed: some are simple pieces of text, others are "menus" (analogous to directories) pointing at other nodes, still others are simple search engines. A simple, connectionless protocol allows a node's contents to be fetched; nodes which are directories return information about each subnode contained: its type, the machine on which it resides, and the identifying tag by which it can be fetched.

The conventional user interfaces to the Internet Gopher are menu based: each time a selection is made from a gopher menu, the user is presented either with another menu, or with a piece of text or other information. The invocation style is highly interactive; if there is a way to save oneself a copy of an interesting node, it is not the way that one saves interesting mail messages, news articles, or files discovered while exploring the net using ftp or some tool other than gopher.

Gopherspace can be presented as a filesystem, however, by an OOFS daemon which maps gopher menu nodes to directories and other nodes to files. This daemon interprets pathnames of the form /gopher/machine-name as the root of the gopherspace tree on the host named machine-name, and it is therefore possible to explore gopherspace using one's favorite shell, using ls to see the contents of a "menu," cd to move among menus, and cat, more, or cp to view or save nodes of interest.
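The pathname convention shared by the ftp and gopher daemons (/ftp/machine/path and /gopher/machine-name) can be illustrated with a minimal sketch. The function and type names below are hypothetical, and the sketch assumes the daemon simply receives the components beneath its attachment point; the paper does not show the actual parsing code, and any daemon-specific syntax grafted on to the pathname is not modelled.

```python
from typing import NamedTuple


class RemoteTarget(NamedTuple):
    service: str  # name of the attachment point, e.g. "ftp" or "gopher"
    host: str     # machine named by the first component below it
    path: str     # remainder, as seen on the remote side


def split_daemon_path(pathname: str) -> RemoteTarget:
    """Split /service/machine/rest... into its three parts (illustrative only)."""
    parts = [p for p in pathname.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected /service/machine[/path]: %r" % pathname)
    service, host, *rest = parts
    # An empty remainder denotes the root of the remote tree.
    return RemoteTarget(service, host, "/" + "/".join(rest))


if __name__ == "__main__":
    print(split_daemon_path("/ftp/ftp.nisc.sri.com/rfc/rfc1149.txt"))
    print(split_daemon_path("/gopher/gopher.micro.umn.edu"))
```

Under gopher the remainder is a sequence of menu titles rather than a true remote path, so a real daemon would resolve each component against the menu fetched for its parent.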
Figure 1 shows a sample interaction using the OOFS gopher daemon.

    $ cd /gopher/gopher.micro.umn.edu
    $ ls -l
    dr--r--r-- 1 0 0 Dec 31 1969 Information About Gopher
    dr--r--r-- 1 0 0 Dec 31 1969 Computer Information
    dr--r--r-- 1 0 0 Dec 31 1969 Discussion Groups
    dr--r--r-- 1 0 0 Dec 31 1969 Fun & Games
    ...

    $ cd Inf*
    $ pwd
    /gopher/gopher.micro.umn.edu/Information About Gopher
    $ ls -l
    -r--r--r-- 1 0 0 Dec 31 1969 About Gopher
    ?r--r--r-- 1 0 0 Dec 31 1969 Search Gopher News
    dr--r--r-- 1 0 0 Dec 31 1969 Gopher News Archive
    dr--r--r-- 1 0 0 Dec 31 1969 comp.infosystems.gopher
    ...

    $ cat Abou*
    This is the University of Minnesota Computer & Information
    Services Gopher Consultant service.
    ...

                      Figure 1
                Sample Gopher Session

It must not be suggested here that the mapping from gopherspace to a simulated filesystem is perfect. The entries in gopher menus tend to be multiword titles, not the shorter, single-word names one expects to see in directories [4]. As the example in Figure 1 shows, gopher nodes have no meaningful owners, sizes, or modification times [5]. The mapping is not without compensating merits, however: to a user who prefers a conventional shell, and dislikes using different interfaces in different contexts, the aberrations visible when trying to treat gopherspace as a filesystem are no more jarring than the differences between a conventional gopher client and that preferred shell. More importantly, access to gopherspace via the filesystem and a programmable shell leaves open the possibility of building new interaction or processing utilities using a toolkit approach.

__________
4. To ease the resulting burden slightly, the prototype gopher OOFS daemon implements a simple implicit wildcard scheme when matching pathnames against gopher menus: a name not otherwise matched will match a name of which it is an initial prefix, if unique. (Purists will note that this twist is entirely unnecessary; users could otherwise make frequent use of shell globbing when cd'ing through gopherspace.)
5. Some nodes, such as gopher search engines, may not even map to a proper UNIX "file type;" as the example shows, the ls command is unable to interpret the S_IFMT bits, and displays an unusual leading '?'.

4.2. World-Wide Web

The World-Wide Web [WWW 93] is based on two functionally distinct ideas. The first is a connectionless information retrieval protocol, much like gopher's, by which a named text entity is retrieved from a server. The second, and more significant, aspect of WWW is its hypertext model: in general, all text entities contain hypertext links to related entities. Interaction with WWW consists entirely of chasing links to narrow in upon information of interest (or just to explore).

The existing WWW interaction tools, even the lowest-common-denominator text-only version, are excellently written. Besides formatting WWW's embedded HTML (Hypertext Markup Language) constructs appropriately, they make it very easy to chase links of interest and wander around in the Web. It is not my intent to criticize the WWW interfaces; they are superior. But they are not the shell, and one cannot transparently use arbitrary UNIX commands from within them [6].

Once again, however, a suitable daemon allows the Web to be accessed using familiar tools. The OOFS WWW daemon presents WWW text entities as conventional files which can be opened and read by any standard tool.
At the same time, each text may also function as a directory: its subfiles or subdirectories (that is, the items it contains) are simply the text entities pointed to by its embedded links.

The OOFS WWW daemon, then, provides an example of the utility of relaxing the file/directory distinction. Each text entity is simultaneously a file and a directory. A program such as cat, which opens an entity conventionally, reads text, while a program such as ls, which opens it as a directory, reads directory entries.

When exploring the Web using a shell and the OOFS WWW daemon, traversing a link to a new node is done with cd, and viewing the current node can be done with the formerly meaningless invocation

    cat .

Returning to a previously-visited node is of course accomplished with

    cd ..

Figure 2 shows a sample foray into the Web using a conventional shell and the OOFS WWW daemon. This example shows another issue of interest when presenting WWW text entities as files: the text contains embedded HTML formatting requests, which should usually be processed before display. It is an intriguing question whether an OOFS daemon should implicitly perform such formatting; for now, it does not, and in these examples the existing WWW line-mode browsing tool, www, is used as a filter only (indicated by its - option) to format the HTML for display.

__________
6. To be sure, the WWW interfaces do provide shell escapes, and mechanisms for piping node text to arbitrary commands.

    $ cd /www/info.cern.ch/default.html
    $ cat .
    Overview of the Web

    General Overview

    There is no "top" to the World-Wide Web.
    You can look at it from many points of view.
    Here are some places to start:

    The Virtual Library
    A classification by information by subject.
    ...

    $ cat . | www -
    Overview of the Web
    GENERAL OVERVIEW
    There is no "top" to the World-Wide Web. You can look at it from
    many points of view. Here are some places to start:
    The Virtual Library[1]
    A classification by information by subject.
    ...

    $ cd 1
    $ cat . | www -
    The World-Wide Web Virtual Library: Subject Catalogue
    THE WWW VIRTUAL LIBRARY
    This is a distributed subject catalogue. See also arrangement by
    service type[1] ., and other subject catalogues[2] .
    ...

                      Figure 2
                 Sample WWW Session

Accessing the Web using only a shell, cd, and cat may seem to mock the more sophisticated text formatting and linking features which the Web provides, and definitely skirts the edge of the mappings which are meaningfully appropriate under the "everything's a file" model. However, the ability to use a familiar, general-purpose shell and toolkit at least partially compensates for the lack of full hypertext awareness.

Furthermore, it is intriguing to contemplate the possibility of a general-purpose "hypertext shell" which could be used to explore both the World-Wide Web and other hypertext systems. It would be easy to write such a shell if the details of various hypertext schemes were hidden behind a common, filesystem-like interface; in fact, a few simple shell aliases can go a long way towards providing hypertext-like features within a conventional, but OOFS-aware, shell.

5. Extended I/O Operations

One of the more important services provided by network information retrieval protocols is searching or lookup. No matter how partial one is to one's own favorite grep variant or other tools, it is not practical to pull many megabytes of data over the network only to selectively discard most of it.
A search or lookup operation provided by a network protocol allows the searching to be done on the machine which has local access both to the data and to any precomputed indices or inverted files.

The existence of these specialized search and lookup operations is a more compelling force which tends to lock users into the idiosyncratic user interfaces which support and are supported by the protocol. One's own existing tools are inevitably based on simple reads of chunks of data, and no amount of intelligent operation mapping by a daemon sitting at the level of a conventional I/O interface is going to be able to make use of a predefined search or lookup operation. It is for this reason that the OOFS daemon protocol has been left very open and extensible.

Both gopher and WWW, for example, provide simple keyword searches (as does WAIS; keyword searching is its whole purpose). It is difficult to see how a filesystem daemon can make use of these features: no existing I/O call can meaningfully be mapped to a search or lookup operation; no existing general-purpose tools presuppose the existence of such a facility.

If, however, the filesystem is to be the rendezvous point between applications and information servers, we may contemplate the invention of a new, filesystem-level call which will map to these search and lookup operations. In gopher and WWW, a search is performed against an existing object and yields a menu (under gopher) or a text entity full of links (under WWW) pointing at the objects which were found. Therefore, for gopher and WWW, we may devise a lookup operation which functions rather like mkdir: this new operation takes as parameters the name of an existing object, a search pattern, and a new name. If the search succeeds, a new directory entry, with the specified new name, is created in the searched-upon object, and points at the result of the search (which is actually a directory of search results).
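The mkdir-like semantics of this lookup operation can be modelled with a small in-memory sketch. Everything here is an illustrative assumption: the Node class, the function name, and above all the local substring match, which merely stands in for the search that a real daemon would delegate to the remote gopher or WWW server.

```python
class Node:
    """A toy object that is both a file (text) and a directory (children)."""

    def __init__(self, text="", children=None):
        self.text = text
        self.children = dict(children or {})


def make_directory_lookup(obj, pattern, new_name):
    """Search obj for pattern; on success, attach the results as obj/new_name.

    Returns True if a result directory was created, False if nothing matched.
    """
    if new_name in obj.children:
        raise FileExistsError(new_name)
    # Stand-in for the server-side search: match entry names by substring.
    matches = {name: child for name, child in obj.children.items()
               if pattern in name}
    if not matches:
        return False
    # The new entry is itself a directory pointing at the objects found.
    obj.children[new_name] = Node(children=matches)
    return True


if __name__ == "__main__":
    jargon = Node(children={"kluge": Node("a Rube Goldberg device"),
                            "kluge around": Node(),
                            "hack": Node()})
    make_directory_lookup(jargon, "kluge", "search1")
    print(sorted(jargon.children["search1"].children))
```

The point of the sketch is only the shape of the call: an existing object, a pattern, and a new name go in, and a browsable directory of results appears in the namespace.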
This new operation, called "mdlookup" (for "make-directory lookup"), is implemented by both the gopher and WWW daemons. A simple program, lookup, provides a shell-invokable interface to the new operation; lookup can be used to perform searches either in gopherspace or the Web. (A simple shell alias could encapsulate the lookup invocation, new name selection, and cd into the created directory, if successful.) Figure 3 shows an example of the lookup command being used along with the WWW daemon.

    $ cd /www/info.cern.ch/default.html/1/9
    $ cat . | www -
    The World-Wide Web Virtual Library: Computing
    COMPUTING
    Information categorised by subject. See also other subjects[1] .
    ...
    Jargon[7] Computer hacker's jargon index
    ...

    $ cd 7
    $ cat . | www -
    Collection `Hacker's Jargon'
    HACKER'S JARGON
    A[1]
    B[2]
    C[3]
    ...

    $ lookup . kluge search1
    $ cd search1
    $ cat . | www -
    jargon?kluge
    THE FOLLOWING OBJECTS MATCH 'KLUGE' IN COLLECTION 'HACKER'S JARGON'
    kluge[1]
    kluge around[2]
    kluge up[3]

    $ cd 1
    $ cat . | www -
    kluge
    KLUGE
    kluge: /klooj/ [from the German `klug', clever] 1. n. A Rube
    Goldberg (or Heath Robinson) device, whether in hardware or
    software.
    ...

                      Figure 3
               Sample lookup Operation

The role of these extended operations -- such as "mdlookup" -- must be carefully understood. As they are neither part of the standard UNIX I/O interface nor utilized by standard tools, they may seem to be as idiosyncratic as the special-purpose user interfaces which they are attempting to replace. Their advantage is that they sit at the level of, and augment, an existing interface (namely the filesystem).
They can therefore be used to build upon and extend existing, more standard operations, permitting efficiency and synergy without having to discard existing interfaces and toolsets.

6. Implementation Details

The decision to go with a user-mode implementation was made for several reasons. The first was pragmatic: machines with kernel sources and tolerant users were not available. Second, user-mode code can be markedly easier to develop and debug than kernel code [Warnock 84]. Finally, there is an undeniable challenge in implementing things in user mode which classically "belong" in the kernel. The choice is not without its disadvantages, of course: it is somewhat difficult to preserve fork and exec semantics of open files, and unless a system supports dynamic linking or run-time interception of library routines and system calls (as many modern systems in fact do), it can be a nuisance to have to relink large numbers of programs.

Since this is a user-mode implementation, eschewing kernel modifications, it does not rely on any modifications to the on-disk filesystem structure. Instead, the presence of a daemon attached to any file is recorded in a hidden file in the same directory. A central "fallback" daemon attachment file may also be used; this file allows users to attach daemons to files or directories (e.g. $MAIL or /) for which the parent directories are not writable.

The implementation of the OOFS library and the various daemons which make it useful is relatively straightforward. Calls which take pathnames pass them to a central routine which examines a pathname component by component, checking for daemon attachments [7]. If a daemon is found, it is invoked (if it is not already running).
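The component-by-component pathname check can be sketched as follows. This is a simplified model under stated assumptions: attachments are held in a plain dictionary rather than in per-directory hidden files or the central fallback file (whose names and formats the paper does not give), symbolic links are ignored, and the daemon command string is a made-up placeholder.

```python
from typing import Dict, Optional, Tuple


def find_daemon(pathname: str,
                attachments: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Walk an absolute pathname looking for the first daemon attachment.

    Returns (daemon command, attachment point, remaining path), or None
    if no component has a daemon attached and the call should go to the
    ordinary kernel routine.
    """
    components = [c for c in pathname.split("/") if c]
    # A daemon may be attached to the root directory itself.
    if "/" in attachments:
        return attachments["/"], "/", "/".join(components)
    prefix = ""
    for i, component in enumerate(components):
        prefix += "/" + component
        if prefix in attachments:
            # Everything below the attachment point is the daemon's business.
            return attachments[prefix], prefix, "/".join(components[i + 1:])
    return None


if __name__ == "__main__":
    table = {"/ftp": "ftp-daemon-placeholder"}
    print(find_daemon("/ftp/ftp.nisc.sri.com/rfc/rfc1149.txt", table))
    print(find_daemon("/etc/passwd", table))
```

Stopping at the first attachment found matches the description in section 3, where the daemon attached at /ftp handles all pathname components beneath that point.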
Communication with the daemon is by default over a pair of conventional pipes, one for reading and one for writing, but it is also possible to connect to an already-running daemon at a named UNIX-domain socket. (Allowing daemons to persist across client invocations eliminates multiple time-consuming remote server connection interactions, and can also simplify caching.) Most calls (stat, rename, unlink, etc.) pass a single request to the daemon and return its response to the caller. The open call, however, allocates an OOFS open file structure containing a pointer to the daemon, and returns an integer file descriptor which, when passed in again by the caller in a read, write, or other I/O call, will be recognized as an OOFS-handled file descriptor and will instigate a daemon transaction. Calls involving pathnames and file descriptors not associated with OOFS daemons are of course passed on to the corresponding UNIX kernel system calls for conventional interpretation.

__________
7. And symbolic links, which must be specially handled.

In order to support a shell linked against the OOFS libraries, and to permit shell redirection to and from daemon-handled files, special handling is necessary during fork and exec calls, which the OOFS library also intercepts. Before an exec, current directory and open file state information is saved in the environment variable OOFSCONTEXT so that the copy of the OOFS library in the invoked program can recover it. Negotiations are performed before and after a fork so that the parent and child (which might otherwise share communication paths to a single daemon) will not interfere with each other, in particular to ensure that one will not be able to close a file which should remain open in the other.
(Tichy describes an alternate solution to this problem in [Tichy 84]; the problem is intriguingly similar to one uncovered when an early version of UNIX first implemented multitasking [Ritchie 84a].)

The OOFS library (the part which is linked in with client applications) is currently written in approximately 4800 lines [8] of C, which compiles to 27 KB of object code. The three daemons discussed in this paper (ftp, gopher, and WWW) are written in approximately 4000, 2000, and 3000 lines of C, respectively, and have executables of size 57, 40, and 48 KB. (These sizes are all for a Sun 4; object sizes for non-RISC processors are somewhat smaller, while executable sizes for systems without shared libraries are somewhat larger.)

__________
8. As reported by wc; these are not SLOC.

6.1. Daemon Interface Protocol

The communication protocol between an application -- specifically, the OOFS library linked in with an application -- and an OOFS daemon is based on simple text lines, for simplicity and ease of debugging. Each request is a line of the form

    op modifiers wrsize rdsize [args]

where op is a string representing the operation requested, modifiers is a string requesting optional kinds of behavior (none are yet defined), wrsize is the number (represented as a string) of bytes of data which accompany the request, rdsize is the number of bytes of returned data the caller is prepared to accept, and args is a list of zero or more operands specific to the particular operation. An escape mechanism permits operands which are pathnames to contain spaces or other special characters, if necessary. Each operation results in a return line of the form

    status retval rdcount [string]

where status is a number (again represented as a string) indicating the success or failure of the operation (including
conditions such as "operation not supported"), retval is the value to be returned to the caller (or, for unsuccessful calls, the value to be placed in errno), rdcount is the number of bytes of data which follow, and string is an (optional) string encoding miscellaneous, possibly operation-specific, information. Currently the string is used to encode the return value of "seek" operations, since retval is limited to int-sized values; it will eventually also be used to encode error information at a higher level of detail than can be expressed in errno values.

When a request must send some data (i.e. a write-like request), and when data must be returned (from a read-like request), the data immediately follows the request or response line; the counts which appear in the request and response lines inform the receiving end how many data bytes it should read from the pipe.

The basic list of operations, most of which are supported by most daemons, includes the following requests:

    chdir     open/dir         seek
    chmod     quit             start
    chown     read             stat
    close     read/dir         unlink
    fork      read/dir/stat    utime
    mkdir     rename           write
    open      rmdir

The "start" request is the first operation sent to a newly-invoked daemon; it verifies successful daemon startup and performs protocol version negotiation. "quit" is similarly used to shut down a daemon. The other operations have functions suggested by the UNIX system calls after which they are named; a few have unusual behavior:

The "stat" operation accepts either a pathname or an open file descriptor; it thus supports both the stat and fstat system calls. "open/dir" announces intent to read a file as a directory; "read/dir" reads an open file as a directory and returns filenames; "read/dir/stat" reads a directory and returns filename and selected stat information at the same time. (The data returned by the two "read/dir" variants is of course in a filesystem-independent format.)
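The request and response line formats given above can be exercised with a short sketch. Several details are assumptions rather than facts from the paper: the "-" placeholder used for the (currently undefined) modifiers field, the use of single spaces as separators, and the refusal to handle operands containing whitespace, since the escape mechanism is not specified. The function names are hypothetical.

```python
def format_request(op, args=(), wrsize=0, rdsize=0, modifiers="-"):
    """Build one request line: op modifiers wrsize rdsize [args]."""
    for arg in args:
        if not arg or any(ch.isspace() for ch in arg):
            # The paper mentions an escape mechanism but does not define it.
            raise ValueError("operand escaping is not modelled in this sketch")
    fields = [op, modifiers, str(wrsize), str(rdsize), *args]
    return " ".join(fields) + "\n"


def parse_response(line):
    """Split one return line: status retval rdcount [string]."""
    fields = line.rstrip("\n").split(" ", 3)
    if len(fields) < 3:
        raise ValueError("malformed response line: %r" % line)
    status, retval, rdcount = (int(f) for f in fields[:3])
    # The optional trailing string carries operation-specific information,
    # such as a seek offset too large for an int-sized retval.
    extra = fields[3] if len(fields) == 4 else None
    return status, retval, rdcount, extra


if __name__ == "__main__":
    print(format_request("stat", ["/rfc/rfc1149.txt"], rdsize=128), end="")
    print(parse_response("0 0 128\n"))
```

After parsing a response, a client would go on to read exactly rdcount bytes of data from the pipe, as the text describes; that step is omitted here.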
"fork" notifies the daemon that the client has forked and that it must be more careful about honoring "close" requests. It is not immediately fatal for a daemon not to support an operation; when a daemon cannot meaningfully perform some operation, the OOFS library simply returns -1 to the calling program, with errno set to EIO or EOPNOTSUPP. USENIX -- Winter '94 12 Summit Filesystem Daemons as a Unifying Mechanism... It will be noted that the basic protocol is synchronous; it is also fairly stateful. Extensions to the protocol are planned in order to support asynchronous operation; it is also intended that all descriptor-based operators ("read", "write", etc.) alternatively accept pathnames and offsets, to better support stateless operation. 6.2. System Call Interception A user-mode library such as OOFS which intercepts system calls faces a mildly-tricky problem at link time: it wishes to provide entry points with names such as _open, _read, _write, etc., while also calling actual UNIX system calls with the same names. (This is, of course, a simple inheritance problem.) Numerous solutions to this problem can be imagined; the OOFS library as currently implemented uses one of two. The library contains entry points named oofsopen, oofsread, etc. (i.e. the conventional system call names, prefixed with "oofs"). The actual linking strategy depends on the I/O calls being made by the application: 1. An application that uses the stdio package exclusively is linked against a reimplementation of the stdio library [Summit 89] which is based on the OOFS routines rather than the standard system calls. 2. An application that uses system calls directly is recompiled with invocation-line preprocessor defines of the form -Dopen=oofsopen (etc.) in effect. 
It would also be quite possible (and preferable) to provide a variant version of a dynamically-linked libc.a, or to use a UNIX kernel which provides well-defined support for system call interception, in either case eliminating tedious recompilation and/or relinking. (Jones discusses several relevant aspects of system call interception in [Jones 93]. Other investigators have demonstrated the feasibility of "intercepting" filesystem-related calls by implementing specialized NFS daemons, or using the automounter interface.)

6.3. Daemon Implementation

Writing a simple, read-only daemon (i.e. one which can support, say, the cat program) is surprisingly easy; arranging for a daemon to map or simulate all UNIX filesystem semantics expected by any program is of course arbitrarily hard. Without going into too much detail, this section lists some of the difficulties (and surprises) encountered while implementing the daemons mentioned in this paper.

A daemon must decide whether it will read or write the remote item on-the-fly as the client issues "read" and "write" requests, or whether it will perform I/O to and from a local temporary file, copying the entire file at once when the file is opened (for reading) or closed (after writing). On-the-fly I/O can both reduce overhead and provide lower startup latency: the "open" and the first "read"s may return almost immediately. I/O to and from a local temporary file, on the other hand, allows random access and simultaneous access of multiple files. (If files are to be cached, the use of a local temporary file is implied in any case.) A hybrid scheme, which builds a temporary file incrementally while performing on-the-fly I/O, is also possible. However, it is not easy for a daemon to decide which of these transfer models to use, particularly because it does not have all the information it could use in making the decision.
UNIX I/O semantics, for instance, give no indication at open time of whether I/O will be sequential or random access, which is precisely what a daemon would need to know in order to choose a transfer model.

Some protocols (notably ftp, and also of course many foreign filesystems) distinguish between text and binary files. Again, it is difficult for the daemon to decide which mode to use without information which UNIX programs are -- quite happily -- not accustomed to providing. The OOFS ftp daemon addresses this problem by interpreting special syntax in the pathname; Cate describes another solution in [Cate 92].

It is notoriously difficult to map error conditions from any new device or protocol onto the relatively small, fixed set of UNIX errno values.

It is in general difficult or impossible to fill in all of the fields in the stat structure when a daemon performs a "stat" request on a piece of data which is not really a file. (The st_mtime and st_ino fields are particularly troublesome.) When these fields are not or cannot be filled in appropriately, some applications may misbehave [9]. Even when troublesome fields can be filled in, deriving values for them may be expensive, which is unfortunate if an expensively-derived field is not actually needed by the caller.

Eventual extensions to OOFS may provide workarounds for some of these difficulties, such as: extra, optional tuning parameters to be specified at open time [10]; an extended perror mechanism; and an indication at the time of a stat call of which fields are needed by the caller and which are being reliably returned. Any use of these extensions by applications, however, would obviously be nonstandard, nontransparent, and not universal.

__________
9. find(1) has perhaps the most pressing requirements for accurate stat values, but even such lowly tools as mv, cp, and diff typically inspect st_dev and st_ino to determine whether two files are identical.
10. Such parameters would not represent abandonment of UNIX's typeless filesystem, but rather acknowledgement that foreign systems, with which interoperation is desired, are not always so enlightened.

7. Performance

It is difficult to provide precise measurements of the performance of this system in a meaningful way. There are at least three areas in which the use of the OOFS scheme, and its associated daemons, potentially degrades performance. First, all pathname lookups are complicated by the necessity of checking for the auxiliary files which record daemon attachment. Second, invoking a daemon necessarily involves fork and exec overhead. Finally, in general, all data read and written by the client process passes through the pipe to the daemon, increasing overhead.

There are, however, compensating advantages of the scheme. The separate daemon process, though it necessitates extra IPC, is nevertheless a second process: if the I/O is at all complicated, the client may benefit by having the I/O offloaded to a second process. In any case, when a remote network service is involved, any extra IPC overhead on the local machine is likely to be swamped by the unavoidable network I/O.

To assess at least a few performance aspects quantitatively, three tests were performed, both with and without using OOFS.

The first test measures the time to stat() a file; differences reflect the extra pathname processing, auxiliary file lookup, and daemon invocation. This test was performed in three ways: without using the OOFS library at all; with OOFS but on a file without an attached daemon; and with OOFS on a file with a "pass-through" daemon, which passes requests back to the local filesystem. The second case pays the pathname processing and auxiliary file lookup penalty; the third case additionally suffers daemon invocation overhead.
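A measurement of the kind described for the first test can be sketched as follows. This is not the harness used for the paper: the Python functions mean_seconds and lookup_then_stat are hypothetical, and the ".oofs" suffix is an invented stand-in for the auxiliary file name, which the paper does not specify. The sketch times a plain stat against a stat preceded by one extra existence check, which approximates only the "OOFS, no daemon" case.

```python
import os
import time

# Hypothetical auxiliary-file naming; the real OOFS convention is not
# given in the paper.
AUX_SUFFIX = ".oofs"


def lookup_then_stat(path):
    """Stand-in for an intercepted stat: look for an auxiliary file first."""
    os.path.exists(path + AUX_SUFFIX)  # extra lookup; the result is ignored here
    return os.stat(path)


def mean_seconds(fn, path, iterations=1000):
    """Return the mean wall-clock seconds per call of fn(path)."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(path)
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    target = os.curdir
    print("plain stat:      %.7f s" % mean_seconds(os.stat, target))
    print("with aux lookup: %.7f s" % mean_seconds(lookup_then_stat, target))
```

Averaging over many iterations matters here because a single stat on a cached file completes in well under a millisecond, below the resolution of casual timing.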
The second test measures the time to cat a 1 MB file, both with a conventional cat and with an OOFS-aware cat plus the pass-through daemon mentioned above. Differences reflect both name lookup and IPC overhead.

The third test compares the time required to ftp a 1.5 MB file as opposed to copying it with an OOFS-aware cp and the OOFS ftp daemon. (The OOFS-aware cp is at an additional disadvantage since it performs several extra stat operations, necessitating additional remote ftp server interactions.)

The tests were performed on a lightly loaded Sun 4/280 running SunOS 4.1.3. For the tests involving OOFS, approximate user and system times for the daemon process are also presented. The data appear in Table 1.

                                                        daemon   daemon
                            real     user     system    user     system

Test 1 (stat)
  no OOFS                   0.0003   0.00002  0.0003    -        -
  OOFS, no daemon           0.0037   0.0008   0.0016    -        -
  OOFS with daemon          0.25     0.041    0.12      0.0019   0.0009

Test 2 (cat 1 MB file)
  no OOFS                   13.7     5.1      0.7       -        -
  OOFS                      23.5     6.8      4.5       1.4      2.0

Test 3 (ftp 1.5 MB file)
  ftp (no OOFS)             90.1     0.2      2.5       -        -
  cp plus OOFS              110.3    4.5      11.3      2.7      8.0

Table 1. Performance Comparisons (all times in seconds)

There is an obvious performance degradation when the OOFS library is in use (real time grows by a factor of roughly 1.7 for the 1 MB cat and roughly 1.2 for the 1.5 MB transfer), although it should be noted that this is a prototype implementation against which no serious optimization attempts have yet been made.

No attempt has been made to measure performance of the more interactive operations involving the gopher and WWW daemons.

8. Open Issues and Future Work

The core OOFS mechanisms could use improvements in several areas.
The library needs to be merged with an established, kernel-resident system call or I/O interception scheme, or at least made to work as a dynamically-linkable run-time library. A well-defined inheritance mechanism would cement its reputation as an object-oriented filesystem and provide support for such situations as remotely-mapped self-uncompressing files. Ideally, daemon attachment would be recorded in the inode; to do so would obviously require both kernel modifications and extensions to the on-disk inode structure.

This project is one of the many that presume openness and trust; concerns of security and authentication have been secondary. Although the daemon interface protocol has a few authentication hooks, they are not really implemented in practice by the existing daemons. (Obviously, when a daemon is providing access to a public resource, authentication is a non-issue; the daemon doesn't really care who its client is.)

The filesystem daemon scheme provides a potentially fertile bed for the introduction of Trojan Horses and other mayhem. (Having a program fire up simply because a file is accessed is a cracker's dream come true.) On a timesharing system, if many users are using OOFS-aware applications, it may be appropriate to ensure that daemons run as their authors, and not as their invokers.

When daemons are attached in relatively few places (the applications discussed in this paper involve only top-level pseudodirectories such as /ftp and /gopher), it is practical to manually maintain the auxiliary files which record daemon attachment. If heavier use of the scheme were to be made, it would be important to implement locking or other automated handling of the auxiliary files, to prevent conflicts, and to support automated updating of daemon attachments when files are renamed or deleted.
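One way the locking suggested above could be approached is sketched below, in Python. Everything specific in it is an assumption: the paper does not describe the format of the auxiliary files, so the one-line-per-entry, tab-separated layout is invented for illustration, as is the function name update_attachment. The sketch uses the advisory lock provided by flock, which only protects against other writers that take the same lock.

```python
import fcntl
import os


def update_attachment(aux_path, name, daemon):
    """Record name -> daemon in an auxiliary file under an exclusive lock.

    Passing daemon=None removes the entry for name. The file format
    (one "name<TAB>daemon" pair per line) is hypothetical.
    """
    fd = os.open(aux_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Block until no other cooperating process holds the lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        entries = dict(
            line.rstrip("\n").split("\t", 1) for line in f if "\t" in line
        )
        if daemon is None:
            entries.pop(name, None)
        else:
            entries[name] = daemon
        # Rewrite the whole file while still holding the lock.
        f.seek(0)
        f.truncate()
        for key in sorted(entries):
            f.write("%s\t%s\n" % (key, entries[key]))
        f.flush()
        # The lock is released when the file is closed on leaving the block.
    return entries
```

A rename or delete operation could call the same routine to move or drop an entry, so that attachments follow the files they describe instead of being edited by hand.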
The attempt to map non-filesystem-oriented information services such as gopher and WWW onto a simulated filesystem has revealed a few things that the designers of such protocols could do to make the mapping task easier and more meaningful. Specifically, it would be beneficial (for any automated use, not just for OOFS daemons) if a protocol could provide:

1. Useful distinctions between error conditions (e.g. "malformed request" vs. "item not found" vs. "permission denied" vs. "unexpected I/O error");

2. A means of checking for the presence of an item, or returning status information about it, without retrieving its contents;

3. Short names for items (in addition to any implicit indices or links);

4. Dates and/or modification times for items; and

5. A well-defined way to retrieve partial items, for example to read the second 1 KB block of a 100 KB item.

The preceding wish list is in increasing order of difficulty, and decreasing order of likelihood. OOFS is able to proceed without any of these features, and many protocols may be utterly unable to provide them, but where possible their availability would make filesystem emulation considerably more seamless.

9. Comparison to Other Work

Remote/networked/distributed filesystems have been implemented and described many times; examples are the Newcastle Connection [Brownbridge 82], IBIS [Tichy 84], RFS [Rifkin 86], the Andrew File System [Howard 88], and NFS [Sun 89]. OOFS is more general, intended to allow arbitrary processing during file access; accessing remote filesystems over a network is but one obvious application.

Several systems have supported heterogeneous filesystems (in part in support of networked file systems, as above) by inserting a level of indirection at the filesystem interface: examples are Sun's Vnodes [Kleiman 86] and the Version 8 typed file system [Weinberger 84, Rago 90].
Inserting extra functionality at the system call interface is a process which appears in several guises; many of the systems described in this section implement their extensions in this way. Jones discusses several aspects of system call extension and provides an excellent bibliography in [Jones 93]; another implementation is described in [Krell 92].

The idea of attaching arbitrary processing modules to certain I/O streams is not new; it is central to Research UNIX's streams [Ritchie 84a] and to Apollo's DOMAIN system [Rees 86].

What this paper calls "filesystem daemons" have been described and implemented several times: one implementation is Bershad and Pinkerton's "Watchdogs" [Bershad 88]; a related idea is implemented by Eggert and Parker's IFS in [Eggert 93].

OOFS, then, is not terribly unique: it shares with Watchdogs and IFS the ability to do arbitrary processing, with per-file (as opposed to per-filesystem) granularity. It shares with the Newcastle Connection, IBIS, and IFS a user-mode implementation which requires no kernel modifications. One significant feature of the OOFS scheme is that it is designed to work with daemons which provide both more and less than the canonical level of processing: some daemons are barely able to provide a minimal simulation of filesystem semantics, but some daemons are able to support extended operations unheard of in conventional filesystems.

Current networking issues of broad interest include the problems of resource discovery, resource naming, resource organization, and resource access. Resource discovery is the focus of archie [Emtage 92], gopher, and WWW. One attempt at a systematic approach to uniform resource naming in the face of heterogeneous protocols is the Uniform Resource Locator (URL) scheme [Berners-Lee 93]. Much ongoing research attacks the resource organization problem; the Prospero system [Neuman 89] provides an excellent example.
The applications described in this paper are primarily directed at the resource access problem, although the abilities to browse heterogeneous resources using standard tools and target them with symbolic links provide some support for the discovery and organization problems. (Appropriate OOFS daemons, including those described in this paper, could also provide a tidy implementation of the URL scheme, in the form of a /url pseudodirectory containing subdirectories ftp, http, etc.)

10. Conclusions

This paper has described a mechanism whereby heterogeneous network information services can be integrated more-or-less transparently into a local filesystem, such that they can be accessed and manipulated using standard, familiar tools. The emulation is not perfect (not all information sources can be made to mimic all filesystem semantics), but the limitations of the emulation are definitely balanced by the advantages of being able to use standard tools.

When tools and protocols share standard interfaces, they can be combined in arbitrarily powerful ways to solve problems not originally envisioned. The integration depends, in this case, on making the filesystem interface (as embodied in the operating system calls open, read, etc.) a rendezvous point between network services and local utilities, such that services and utilities written at different times, under different sets of assumptions, with different immediate goals in mind, by different people, and without advance knowledge of each other, can nevertheless interoperate.

Acknowledgements

Thanks to Mark Brader, Stan Brown, Paul Eggert, Michael Jones, Jeff Mogul, and Melanie Summit for their comments on earlier drafts of this paper. Thanks to Robert Dinse at Eskimo North for providing the system on which most of this work was performed.

References

[Anklesaria 93] F. Anklesaria et al., "F.Y.I. on the Internet Gopher Protocol," March, 1993, URL=ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol/DRAFT_Gopher_FYI_RFC.txt (also RFC-1436).
[Berners-Lee 93] Tim Berners-Lee, "Uniform Resource Locators," Internet Draft, March, 1993, URL=ftp://info.cern.ch/pub/ietf/url4.txt .
[Bershad 88] Brian N. Bershad and C. Brian Pinkerton, "Watchdogs -- Extending the UNIX File System," Computing Systems, 1:2, 1988, pp. 169-188.
[Brownbridge 82] D.R. Brownbridge, L.F. Marshall, and B. Randell, "The Newcastle Connection or UNIXes of the World Unite!," Software -- Practice and Experience, 12, 1982, pp. 1147-1162.
[Cate 92] Vincent Cate, "Alex -- a Global Filesystem," in Proceedings of the USENIX File Systems Workshop, Ann Arbor, MI, 1992, pp. 1-11.
[Davis 90] Franklin Davis et al., "WAIS Interface Protocol Prototype Functional Specification," April 23, 1990, URL=ftp://quake.think.com/pub/wais/doc/protspec.txt .
[Eggert 93] Paul R. Eggert and D. Stott Parker, "File Systems in User Space," in Proceedings of the Winter 1993 USENIX Conference, San Diego.
[Emtage 92] Alan Emtage and Peter Deutsch, "archie -- An Electronic Directory Service for the Internet," in Proceedings of the Winter 1992 USENIX Conference, San Francisco, pp. 93-110.
[Howard 88] J.H. Howard, "An Overview of the Andrew File System," in Winter 1988 USENIX Conference Proceedings, Dallas, February, 1988.
[Jones 93] Michael B. Jones, "Interposition Agents: Transparently Interposing User Code at the System Interface," in Proceedings of the 14th ACM Symposium on Operating Systems Principles, Asheville, NC, December, 1993.
[Kahle 89] Brewster Kahle, "Wide Area Information Server Concepts," November 3, 1989, URL=ftp://quake.think.com/pub/wais/doc/wais-concepts.txt .
[Kleiman 86] S.R. Kleiman, "Vnodes: An Architecture for Multiple File System Types in Sun UNIX," in USENIX Association, Summer Conference Proceedings, Atlanta, 1986, pp. 238-247.
[Korn 90] David G. Korn and Eduardo Krell, "A New Dimension for the Unix File System," Software -- Practice and Experience, 20:S1, June, 1990, pp. 19-34.
[Krell 92] Eduardo Krell and Balachander Krishnamurthy, "COLA: Customized Overlaying," in Proceedings of the Winter 1992 USENIX Conference, San Francisco, pp. 3-7.
[Krol 92] Ed Krol, The Whole Internet User's Guide & Catalog, O'Reilly & Associates, 1992, ISBN 1-56592-025-2.
[Neuman 89] B. Clifford Neuman, "The Virtual System Model for Large Distributed Operating Systems," Technical Report 89-01-07, April, 1989, Department of Computer Science, University of Washington, Seattle.
[Rago 90] Stephen Rago, "A Look at the Ninth Edition Network File System," in UNIX Research System Papers, Volume II, Saunders College Publishing, 1990, ISBN 0-03-047529-5.
[Rees 86] Jim Rees, Paul H. Levine, Nathaniel Mishkin, and Paul J. Leach, "An Extensible I/O System," in USENIX Association, Summer Conference Proceedings, Atlanta, 1986, pp. 114-125.
[Rifkin 86] Andrew P. Rifkin et al., "RFS Architectural Overview," in USENIX Conference Proceedings, Atlanta, GA, June, 1986.
[Ritchie 84a] Dennis M. Ritchie, "A Stream Input-Output System," AT&T Bell Laboratories Technical Journal, 63:8, October, 1984, pp. 1897-1910.
[Ritchie 84b] Dennis M. Ritchie, "The Evolution of the UNIX Time-sharing System," AT&T Bell Laboratories Technical Journal, 63:8, October, 1984, pp. 1577-1593.
[Roome 92] W.D. Roome, "3DFS: A Time-Oriented File Server," in Proceedings of the Winter 1992 USENIX Conference, San Francisco, pp. 405-418.
[Summit 89] Steve Summit, "A Reimplementation of the Standard I/O Package," unpublished.
[Sun 89] Sun Microsystems, Inc., "NFS: Network File System Protocol Specification," Internet RFC-1094, March, 1989.
[Tichy 84] Walter F. Tichy and Zuwang Ruan, "Towards a Distributed File System," in USENIX Association/Software Tools Users Group, Summer Conference Proceedings, Salt Lake City, 1984, pp. 87-97.
[Warnock 84] Robert P. Warnock III, "User-Mode Development of Hardware and Kernel Software," in USENIX Association/Software Tools Users Group, Summer Conference Proceedings, Salt Lake City, 1984, pp. 224-226.
[Weinberger 84] P.J. Weinberger, "The Version 8 Network File System" (abstract), in USENIX Association/Software Tools Users Group, Summer Conference Proceedings, Salt Lake City, 1984, p. 86.
[WWW 93] "Protocol for the Retrieval and Manipulation of Textual and Hypermedia Information," June, 1993, URL=ftp://info.cern.ch/pub/www/doc/http-spec.ps .

Author Information

Steve Summit attended the Massachusetts Institute of Technology from 1979 to 1983, receiving a Bachelor of Science degree in Electrical Engineering and Computer Science. He has worked as a software engineer since then, presently as an independent consultant. His interests and specialties are too undifferentiated, and his mistrust of buzzwords too complete, to attempt to list them here; this paper is however not unrepresentative. He can be reached at scs@eskimo.com .