Related Work

Contemporary file systems use file type information to associate files with the applications that access them. Several systems have gone further and experimented with attribute-based file naming [7,8,13,16,18]. In these systems, the file system supports searching by attribute, and the results are presented as virtual directories that contain pointers to the actual locations of the files.
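The following minimal Python sketch makes the query-to-virtual-directory idea concrete. It rests on our own assumptions: an in-memory inverted index maps (attribute, value) pairs to file paths, and a conjunctive query yields a mapping from entry names to real paths. The class `AttributeIndex` and its methods are hypothetical and do not reproduce any of the cited systems.

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical inverted index: (attribute, value) -> set of real file paths."""

    def __init__(self):
        self._index = defaultdict(set)

    def tag(self, path, **attrs):
        # Record each attribute of the file in the inverted index.
        for attr, value in attrs.items():
            self._index[(attr, value)].add(path)

    def query(self, **criteria):
        # Conjunctive match: a file must satisfy every (attribute, value) pair.
        matches = [self._index.get((a, v), set()) for a, v in criteria.items()]
        return set.intersection(*matches) if matches else set()

    def virtual_directory(self, **criteria):
        # Entry name -> pointer to the file's actual location.
        return {p.rsplit("/", 1)[-1]: p for p in sorted(self.query(**criteria))}


idx = AttributeIndex()
idx.tag("/home/ann/report.txt", owner="ann", kind="text")
idx.tag("/home/ann/photo.jpg", owner="ann", kind="image")
idx.tag("/home/bob/notes.txt", owner="bob", kind="text")
print(idx.virtual_directory(owner="ann", kind="text"))
# prints {'report.txt': '/home/ann/report.txt'}
```

Because entries are only pointers, the real files stay where they are. This sketch lets a later entry with the same base name overwrite an earlier one; a real system would have to resolve such name clashes.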

SFS [7] uses a hierarchical directory structure to organize successive refinements of previous query results. HAC [8] attempts to combine the benefits of hierarchical and content-based access to files. In HAC, a virtual directory (the result of a query) is an actual directory that supports ordinary file system operations. To keep the links in a virtual directory consistent with the files they point to, HAC periodically re-executes queries and updates the links accordingly.
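A short sketch illustrates the consistency cost of periodic re-execution. It assumes a shared in-memory catalog of file attributes and a query expressed as a Python predicate; both are hypothetical and are not HAC's interface. The links reflect the catalog only as of the last re-execution. As a result, deleted files leave dangling links and new files stay missing until the next refresh.

```python
class VirtualDirectory:
    """Hypothetical query-backed directory whose links change only on refresh()."""

    def __init__(self, query, catalog):
        self.query = query      # predicate over a file's attribute dict
        self.catalog = catalog  # shared mapping: real path -> attribute dict
        self.links = set()
        self.refresh()

    def refresh(self):
        # Re-execute the query against the current catalog and rebuild all links.
        self.links = {p for p, attrs in self.catalog.items() if self.query(attrs)}


catalog = {"/src/a.c": {"lang": "c"}, "/src/b.py": {"lang": "python"}}
c_files = VirtualDirectory(lambda attrs: attrs.get("lang") == "c", catalog)

catalog["/src/c.c"] = {"lang": "c"}  # a matching file is created
del catalog["/src/a.c"]              # a linked file is deleted
stale_links = set(c_files.links)     # {'/src/a.c'}: dangling, and c.c is missing
c_files.refresh()                    # periodic re-execution repairs the links
print(sorted(c_files.links))         # prints ['/src/c.c']
```

In practice a timer would drive `refresh()`. The refresh period trades link freshness against the cost of re-running every query.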

Several systems allow more flexible ways to combine the hierarchical name space with attribute-based file naming. A file system by Transarc [3] allows each file to have an associated wrapper, called a synopsis, that contains tag/value attributes and defines methods to manipulate those attributes. Synopses are organized in inheritance hierarchies. Similarly, in the system described in [18], each query is given a label. Users can impose "ancestor-descendant" relationships on labels. They can then name files by specifying a path name that contains labels, a list of queries the files satisfy, or both. In the Prospero system [13], users can program "filters" that create personalized views of the file system.
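Attribute lookup through a synopsis inheritance hierarchy can be sketched as a walk up a parent chain. This illustration rests on our own assumptions. The `Synopsis` class, single inheritance, and the `get`/`put` method names are hypothetical and are not Transarc's design.

```python
class Synopsis:
    """Hypothetical tag/value wrapper with single-parent inheritance."""

    def __init__(self, parent=None, **tags):
        self.parent = parent
        self.tags = dict(tags)

    def get(self, tag):
        # Walk up the inheritance chain until some synopsis defines the tag.
        node = self
        while node is not None:
            if tag in node.tags:
                return node.tags[tag]
            node = node.parent
        raise KeyError(tag)

    def put(self, tag, value):
        # A local definition overrides any inherited value.
        self.tags[tag] = value


document = Synopsis(owner="unknown", mime="application/octet-stream")
paper = Synopsis(parent=document, mime="application/pdf")
draft = Synopsis(parent=paper, status="draft")

print(draft.get("mime"))   # prints application/pdf (inherited from paper)
print(draft.get("owner"))  # prints unknown (inherited from document)
draft.put("owner", "ann")
print(draft.get("owner"))  # prints ann (local override)
```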

In Presto [16], documents can be organized according to properties (attributes) associated with them, without the limitations of hierarchies. Properties can be specific to an individual document consumer. Unlike HAC, Presto does not attempt to remain backward compatible with the traditional file system abstraction.
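A small, hypothetical model shows how shared and consumer-specific properties can combine. Properties are stored per (document, owner) pair, and a consumer sees the shared properties plus its own. A collection is simply the set of documents that carry a given property. `PropertyStore` and its methods are illustrative names only and are not Presto's API.

```python
from collections import defaultdict


class PropertyStore:
    """Hypothetical store of document properties, shared or private to one consumer."""

    SHARED = None  # owner key for properties visible to every consumer

    def __init__(self):
        self._props = defaultdict(set)  # (document, owner) -> set of properties

    def add(self, doc, prop, owner=SHARED):
        self._props[(doc, owner)].add(prop)

    def properties(self, doc, consumer):
        # A consumer sees the shared properties plus its own private ones.
        shared = self._props.get((doc, self.SHARED), set())
        private = self._props.get((doc, consumer), set())
        return shared | private

    def collection(self, prop, consumer):
        # A flat grouping: every document that carries prop for this consumer.
        docs = {doc for doc, _ in self._props}
        return {d for d in docs if prop in self.properties(d, consumer)}


store = PropertyStore()
store.add("budget.xls", "finance")
store.add("talk.ppt", "finance")
store.add("talk.ppt", "to-review", owner="alice")
print(sorted(store.collection("finance", "bob")))  # prints ['budget.xls', 'talk.ppt']
print(store.collection("to-review", "alice"))      # prints {'talk.ppt'}
print(store.collection("to-review", "bob"))        # prints set()
```

A document can belong to any number of collections at once. No single hierarchy constrains where it appears.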

All these systems focus mainly on simple attributes, and their queries are limited to ad-hoc attribute matching. pStore provides a generic data model and implementation that capture a more extensive set of semantics. We anticipate that these attribute-based file systems can be implemented easily on top of pStore. We also expect that pStore's generality can be exploited to provide new functionality that these systems lack.

Several projects study metadata management in a file system setting. Roma [20] provides an available, centralized metadata repository that is used to "synchronize" a single user's files across a diverse set of digital storage devices. Roma metadata include fully extensible attributes that could be used to organize and locate files. However, its current prototype does not use attributes for searching.

The Inversion file system [14] runs on top of the POSTGRES database. It allows fine-grained time travel: a user may ask to see the state of the file system at any time in the past. Accesses to the file system are transactional, and users can issue ad-hoc queries on the file system metadata, or even on file data. IBM's DataLink [9] project uses a relational database to capture a wide set of semantic information in file systems; the database contains references to objects in the file system. However, not all applications require the heavyweight ACID properties and features of a full-fledged database system. Moreover, database systems cannot effectively handle incremental schema evolution, which is common when managing unstructured data.
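Time travel of the kind Inversion offers relies on old versions never being overwritten, as in the POSTGRES no-overwrite storage design. The sketch below imitates that idea in memory under our own assumptions: integer timestamps, one version list per path, and `None` marking a deletion. `VersionedMetadata` is hypothetical and is not Inversion's implementation.

```python
import bisect


class VersionedMetadata:
    """Hypothetical append-only metadata history supporting as-of lookups."""

    def __init__(self):
        self._times = {}   # path -> non-decreasing list of commit timestamps
        self._values = {}  # path -> values parallel to _times (None = deleted)

    def write(self, path, value, ts):
        # Old versions are never overwritten; each write appends a new version.
        times = self._times.setdefault(path, [])
        if times and ts < times[-1]:
            raise ValueError("timestamps must be non-decreasing per path")
        times.append(ts)
        self._values.setdefault(path, []).append(value)

    def delete(self, path, ts):
        self.write(path, None, ts)

    def as_of(self, path, ts):
        # Newest version committed at or before ts; None if absent at that time.
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, ts)
        return self._values[path][i - 1] if i else None


md = VersionedMetadata()
md.write("/doc.txt", {"size": 10}, ts=100)
md.write("/doc.txt", {"size": 42}, ts=200)
md.delete("/doc.txt", ts=300)
print(md.as_of("/doc.txt", 150))  # prints {'size': 10}
print(md.as_of("/doc.txt", 250))  # prints {'size': 42}
print(md.as_of("/doc.txt", 350))  # prints None (deleted)
print(md.as_of("/doc.txt", 50))   # prints None (did not exist yet)
```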

Interestingly, as early as 1986, Mogul [12] proposed a model of files that includes the concept of file properties. Mogul also argued, as we do, that database systems are too heavyweight and that relationships between files are important.

Our work complements the Semantic Web [21] by concentrating on system aspects and on metadata management in a storage setting. Further, pStore provides additional functionality, such as tunable consistency based on an event framework. pStore is a framework that provides predefined but customizable components. One example is its predefined metadata types (e.g., content- and context-based semantics), each of which may have a predetermined consistency model.
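The sketch below gives a rough illustration of tunable consistency driven by events. Each metadata type chooses one of two policies. Under "eager", it recomputes its value as soon as a file-modification event arrives. Under "lazy", it only marks the value stale until the next read. The class, the two policy names, and the extractor functions are hypothetical assumptions for illustration only; they are not pStore's interface.

```python
from collections import defaultdict


class MetadataManager:
    """Hypothetical event-driven metadata manager with per-type consistency."""

    POLICIES = ("eager", "lazy")

    def __init__(self):
        self._types = {}                # metadata type -> (extractor, policy)
        self._cache = {}                # (type, path) -> cached value
        self._stale = defaultdict(set)  # type -> paths whose cached value is outdated

    def register(self, mtype, extractor, policy):
        # eager: recompute on every event; lazy: mark stale, recompute on next read.
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self._types[mtype] = (extractor, policy)

    def on_file_modified(self, path):
        # Event handler: each metadata type reacts according to its policy.
        for mtype, (extract, policy) in self._types.items():
            if policy == "eager":
                self._cache[(mtype, path)] = extract(path)
            else:
                self._stale[mtype].add(path)

    def get(self, mtype, path):
        extract, _ = self._types[mtype]
        key = (mtype, path)
        if key not in self._cache or path in self._stale[mtype]:
            self._cache[key] = extract(path)
            self._stale[mtype].discard(path)
        return self._cache[key]


files = {"/a.txt": "hello world"}
extract_calls = []


def word_count(path):
    extract_calls.append(path)  # records each (possibly expensive) extraction
    return len(files[path].split())


mgr = MetadataManager()
mgr.register("size", lambda path: len(files[path]), "eager")
mgr.register("words", word_count, "lazy")

files["/a.txt"] = "hello brave new world"
mgr.on_file_modified("/a.txt")     # size recomputed now; words only marked stale
print(mgr.get("size", "/a.txt"))   # prints 21
print(len(extract_calls))          # prints 0
print(mgr.get("words", "/a.txt"))  # prints 4, computed on first read
print(mgr.get("words", "/a.txt"))  # prints 4, served from cache
print(len(extract_calls))          # prints 1
```

Eager policies keep reads cheap and fresh at the cost of work on every event. Lazy policies defer that cost to readers, which suits expensive, rarely read metadata.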

Magnus Karlsson
Tue 17 Jun 2003 14:32:10