Student Research Grantsby Gale Berkowitz<gale@usenix.org>
USENIX Deputy Executive Director
Intra-Address Space Protection Using Segmentation/Paging Protection
by Ganesh Venkitachalam and Prashant Pradhan Introduction Two major software applications trends call for operating systems support for establishing protection boundaries among program modules that execute in the same address space. First, the notion of dynamic extensibility has prevailed in almost every major software systems area, ranging from extensible database systems[4] to which third-party data blades can be added to perform type-specific data processing, extensible operating systems[1, 2, 3] that support application-specific resource management policies, to programmable active network devices[5, 6] that allow protocol code running on network devices to be tailored to individual applications. A key feature of extensible software systems is its support of live addition and removal of software modules into a running program. Therefore, an effective and efficient mechanism to protect the core of the running program from dynamically inserted modules is crucial to the long-term viability of extensible systems. Second, component-based software development (CBSD)[7] is emerging as the dominant software development methodology, because it significantly improves software productivity by encouraging modularity and reusability. As software components produced by multiple vendors are used to construct complete applications, a proper level of protection among software components is essential to address the most challenging problem of CBSD: interference among separately developed components and the resulting system instability. With appropriate intercomponent protection, it is easier to isolate the buggy component and pinpoint the cause of software malfunctioning. The Palladium project, sponsored by a USENIX Student Research grant, aims to develop an intra-address space protection mechanism by exploiting the virtual memory protection hardware at the paging and segmentation levels built into the Intel x86 processor architecture since 80836. Although a number of approaches have been proposed to provide intra-address space protection, such as software fault isolation[8], type-safe languages[1], interpretive languages[9], and proof-carrying code[10], there is no clear winner that completely solves all the design issues: safety from corrupting extension modules, run-time performance overhead, and programming simplicity. The commonality among all the above approaches is the use of software-only techniques to create protection domains within an address space, based on the assumption that hardware-based protection mechanisms are only applicable to inter-address space protection. In contrast, Palladium's intra-address space protection mechanism is efficient in terms of run-time overhead, guarantees the same level of safety as using separate address spaces, and does not increase the complexity of the deployment and development of software extensions. Although the proposed mechanism is geared towards the Intel x86 architecture, the fact that the architecture in question dominates more than 90% of the world's desktop computer market ensures that it will have wide applicability and thus significant impact. Approach The current Palladium prototype is implemented under Linux. A user-level Linux process can dynamically load an extension module into its address space using dlopen, dlsym, and dlclose functions. There is a potential protection issue between the main application, such as a database management system, and the dynamically loaded extension modules, such as type-specific access methods, because they reside in the same address. Note that we focus mainly on the protection between the main application and its extensions, rather than on the protection between extensions. In x86 architecture, a piece of code or data resides in a specific segment at a specific protection level. There are four protection levels, 0 being the most privileged and 3 being least privileged. x86 architecture also provides a two-level page-granularity protection, with the following mapping rule: segmentation protection levels 0, 1, and 2 are mapped to page protection level 0, and segmentation protection level 3 to page protection level 1. A code can access data only at the same or less privileged levels. Similarly, a code can transfer control only to another code that is at the same or less privileged levels. In order to transfer control from a less privileged to a more privileged level it must go through an explicit interrupt, which allows the kernel to step in and performs necessary checks. Existing operating systems on the x86 architecture, such as Linux and Windows NT, typically run the kernel at protection level 0 and user-level applications at protection level 3. Palladium, on the other hand, puts the main application in a segment at protection level 2 and the extension modules in another segment at protection level 3. These two segments cover exactly the same address space range, 0 to 3 Gb, but at different paging and segmentation protection levels. As a result, when an extension module attempts to access pages in the main application, the access could go through the segmentation check (because it uses its own segment descriptor), but it would be stopped by the page-level protection check, because the extension's page protection level is less privileged than that of the target page, which is owned by the main application. On the other hand, the kernel segment spans from 0 to 4 Gb. Therefore, although the pages in the kernel address space (34Gb) and the user address space (03Gb) are all at page protection level 0, the main user-level application cannot access the kernel address space because of segment-level length check. In the end, a user-level extension can only access its own code, data, and stack, as well as shared libraries and data regions exposed by the main application. To summarize: The segmentation check ensures that the kernel is protected from the applications, and the paging check protects the applications from their extensions exactly the protection properties we are looking for! To use Palladium's user-level extension mechanism, the main application has to use a safe version of the dynamic loading package, i.e., seg_dlopen, seg_dlsym, and seg_dlclose, to load, access, and close shared libraries. However, seg_dlsym should be used only for acquiring function pointers. To obtain pointers to data structures inside an extension segment, dlsym should be used instead. In addition, the main application should call the init_PL function in the beginning of the program to promote itself to segmentation protection level 2 and should mark all its pages as page protection level 0. To expose shared pages to extensions, the application can use the set_range system call to mark those pages as page protection level 1. To expose an application service that extensions could use, the application uses the set_call_gate system call to set up a call gate with a pointer to the corresponding application service function. Programming user-level extensions is exactly identical to developing a user-level library routine, except that xmalloc instead of malloc should be used to ensure that it's the extension segment's heap that is being allocated. Palladium's extensions are compiled with GCC, just like conventional shared libraries. Calling an extension function from an application and returning from a called extension back to the calling application follow exactly the standard C syntax, although applications and extensions reside at different privilege levels. Current Status As part of his Master's thesis, Ganesh Venkitachalam has built a working user-level extension system based on the Palladium architecture as described above. To demonstrate the utility of this system, he also built a LibCGI system that allows CGI scripts to be invoked as local procedure calls in a safe fashion, rather than through inter-process communications. Prashant Pradhan has successfully constructed a kernel extension mechanism based on the same segmentation hardware but slightly different software architecture. He has applied this kernel-extension mechanism to his Ph.D. dissertation project, a cluster-based high-performance programmable network router. Prashant is currently working on integrating the user-level and kernel-level extension mechanisms into a fully operational Palladium prototype, which will be part of the Stony Brook Linux Distribution (SLD). A paper that describes the initial design and implementation of Palladium appeared in the 1999 HotOS Workshop. Another paper that describes the LibCGI system in more detail, including how it exploits Palladium's user-level extension system, is scheduled to appear in the 1999 IEEE Workshop on Internet Applications. Both papers are available on the project's Web page: <http://www.ecsl.cs.sunysb.edu/extension.html>. Despite the fact that a rich set of protection hardware features have been built into Intel x86 architecture since the days of the 80386, to the best of our knowledge no operating systems or software applications that effectively exploited these hardware mechanisms have been reported in the literature. Palladium is, if not the first, one of the first successful attempts, and it completely solves the protection problem in the emerging extensible and componentized software systems. Reference [1] Bershad, B.N.; Savage, S.; Pardyak, P.; Sirer, E.G.; Fiuczynski, M.E.; Becker, D.; Chambers, C.; and Eggers, S. "Extensibility, safety and performance in the SPIN operating system." ACM Operating Systems Review, vol. 29, no. 5, pp. 26784, Dec. 1995. [2] Kaashoek, M.F.; Engler, D.R.; Ganger, G.R.; Briceno, H.M.; Hunt, R.; MaziËres, D.; Pinckney, T.; Grimm, R.; Jannotti, J.; and Mackenzie, K. "Application performance and flexibility on exokernel systems" ACM Operating Systems Review, vol. 31, no. 5, pp. 5265, Dec. 1997. [3] Seltzer, M.I.; Endo, Y.; Small, C.; and Smith, K.A. "Dealing with disaster: surviving misbehaved kernel extensions." ACM Operating Systems Review, vol. 30, pp. 21327, Oct. 1996. [4] Stonebraker, M.; and Kemnitz, G. "The POSTGRES next-generation database management system." Communications of the ACM, vol. 34, no. 10, pp. 7892, Oct. 1991. [5] Alexander, D.S.; Arbaugh, W.A.; Keromytis, A.D.; and Smith, J.M. "A secure active network environment architecture: realization in SwitchWare." IEEE Network, vol. 12, no. 3, pp. 3745, MayJune 1998. [6] Tennenhouse, D.L.; Smith, J.M.; Sincoskie, W.D.; Wetherall, D.J.; and Minden, G.J. "A survey of active network research." IEEE Communications Magazine, vol. 35, no. 1, pp. 8086, Jan. 1997. [7] Mendelsohn, N. "Operating systems for component software environments." The 6th Workshop on Hot Topics in Operating Systems (HotOS-VI), Cape Cod, Mass., May 1997. [8] Wahbe, R.; Lucco, S.; Anderson, T.E.; and Graham, S.L. "Efficient software-based fault isolation." ACM Operating Systems Review, vol. 27, no. 5, pp. 20316, Dec. 1993. [9] McCanne, S.; and Jacobson, V. "The BSD packet filter: a new architecture for user-level packet capture.." Proceedings of the Winter 1993 USENIX Conference, pp. 25969, Jan. 1993. [10] Necula, G.C.; and Lee, P. "Safe kernel extensions without run-time checking." ACM Operating Systems Review, vol. 30, spec. issue., pp. 22943, Oct. 1996. For more information about this project, visit <http://www.ecsl.cs.sunysb.edu/extension.html>, or contact the investigators directly: Ganesh Venkitachalam (<ganesh@cs.sunysb.edu>),or Tzi-cker Chiueh (<chiueh@cs.sunysb.edu>).
USEwebNET and PaperFinder
By Thanos Papathanasiou Overview USEwebNET and PaperFinder are two valuable tools that aim to help scientists and all interested users in finding useful information available on-line on the World Wide Web. They address the problem of "information overloading" by keeping track of the newest information that gets published on the Web about registered subjects and provide the opportunity for each user to create a personal profile with his own registered interests. USEwebNET may be described as a general purpose meta-search engine that operates on top of several popular search engines and filters the results in order to avoid presenting the users with already known information. PaperFinder is an extension of USEwebNET that retrieves scientific publications from well-known digital libraries. It also supports a Resource Discovery Mode of operation, which broadens a keyword-based query in order to reveal more useful information, then filters the results through an innovative similarity metric of documents. Information retrieval on the Web is like searching for a needle in a haystack: one needs the right tools (e.g., a metal detector) to separate the needle from the hay. According to the traditional model of searching for information on the Web, when a user wants to find information about a specific topic (s)he sends a query to a search engine, which replies with several URLs. Every time the user wants to find new information about the same topic, the search engine returns (roughly) the same URLs, flooding the user with unnecessary information. Finding the new and interesting URLs within a slow-moving flood of previously visited URLs is a boring and time-consuming task. Assume, for example, that a user is interested in learning new developments in the field of Web caching. A search for this information using a popular search engine like HOTBOT will return more than 83,000 URLs. Of those URLs, only a small percentage (recently, 63 of the 83,000) were created/modified within the last month. Under this search model, users who are interested in getting only new information must process 83,000 URLs in order to extract the few recent ones. One could argue that recent information can be retrieved effectively by requesting only documents that have been published/changed after a specific date (am option with several search engines). Unfortunately, this approach may still result in a document flood. Since the Web is growing at an alarming rate, the robots associated with search engines visit (and index) various sites rather infrequently. Popular Web robots such as AltaVista visit their archived sites every 2 to 3 months. As the amount of information available via the Web grows larger, this interval is bound to increase to perhaps once every 56 months. Thus, if a user wants to find previously unseen information on a specific topic (s)he would have to search for documents that are less than six months old. To return to our example: Such a search on HOTBOT for documents on Web caching returned more than 30,000 documents. The core of the search problem is that all these engines represent single-shot search mechanisms. A user who repeatedly searches using the same keywords is flooded with almost the same URLs, even if (s)he has visited the URLs as a result of some previous search. This single-shot search contrasts with modern methods of knowledge discovery based on re-searching. Re-searching is an iterative process that filters away useless or already acquired knowledge, focusing on new and unexplored territory. USEwebNET is a layer of software on top of existing search engines, filtering away previously seen information, providing a service that allows users to stay informed of new developments.
Figure 1: A screenshot from USEwebNET. The user is checking the results that have been downloaded for two registered queries. He also reads an article published in ERCIM News that has been retrieved from the query "Athanasios Papathanasiou." USEwebNET consists of a front end that interacts with users and a back end that interacts with search engines. Users register their interests with USEwebNET as a set of keyword-based queries. For example, people interested in caching mechanisms for the Web may register their interest as "Web caching." In addition, people indicate several search engines they would like to query about their interests. The user interface of USEwebNET is based on the popular and friendly interface of USENET News (Figure 1). Every registered query is used like a newsgroup and new URLs are equivalent to unread articles in a newsgroup. Periodically (usually every night) the back end of USEwebNET submits each query to the indicated search engines, which reply with a (usually long) sequence of URLs and a short description for each URL. USEwebNET gathers all replies, merges them, deletes duplicates, and constructs a list with URLs that satisfy the query. It then removes from the list all URLs that have been viewed previously by the user. When the user is interested in seeing recent developments on a given query, (s)he invokes USEwebNET and is presented only with new URLs. The user may decide to view a URL or may mark it as uninteresting and delete it. In either case, USEwebNET will consider the URL as viewed and will not present it to the user again. Viewed documents will be shown again to the user only if USEwebNET detects that an update has been made. To facilitate acquisition of knowledge, USEwebNET allows users to save URLs in folders. Thus, users can store all interesting URLs and visit them at some later time, or use them as references. Over time, folders will eventually become indispensable reference tools for research. In order to help users check the results of their registered queries, USEwebNET maintains a mail notification system, which, when activated by the user, sends an email notification whenever a significant amount of new information has been discovered. Finally, the AT&T Difference Engine[1] has been integrated into USEwebNET's system in order to help users keep track of modifications to especially interesting Web pages. PaperFinder Described USEwebNET has been designed as a layer of software that runs on top of existing information providers. Its purpose is to discover and filter information in a specific subject. It may be customized to work with specific databases in order to find information more effectively. This idea has been previously stated by M. Schwartz[2]: Information extraction is most effective when exploiting the semantics of particular types of files and particular execution environments. Thus, USEwebNET was extended in order to take advantage of the particular characteristics of scientific publications and the particular services provided by popular digital libraries. These extensions lead to the creation of PaperFinder, a research paper discovery tool, which operates on top of digital libraries and aims to help scientists staying informed about the latest evolutions on their field. Scientists always want to stay informed. In order to do so they subscribe to scientific journals, go to conferences, collaborate with colleagues, etc. To narrow down the information they receive, scientists subscribe only to a small subset of journals and follow only a small number of conferences. Unfortunately, the number of scientific publications increases year after year, making it increasingly harder for a single person to keep track of published papers in a field. Thus, a tool that could deliver to scientists only the interesting research papers in their field would be very useful. PaperFinder meets these requirements and comes very close to this ideal tool. Currently, most of the scientific publishers provide on-line databases with the titles, authors, abstracts, and sometimes even full text of their publications. PaperFinder can be used as a research paper discovery tool on top of several such databases. To use our earlier example, a scientist may submit to PaperFinder that (s)he is interested in "Web caching." PaperFinder will continually monitor the research paper databases to find papers that match the query. If such matches are found they are stored in a database. When the user invokes PaperFinder, (s)he will view the new papers found. After the user reads these papers they will not show up again unless the user saves them in a folder. Effectively, the user registers his or her interests with USEwebNET, and the tool continually delivers new research papers found on this field without delivering the same paper twice. Besides the classic keyword-based way of retrieving information from databases, PaperFinder provides an innovative Resource Discovery Mode of operation. The purpose of this mode is to collect as many papers as possible about a topic and help the user identify those of primary interest. In the Resource Discovery Mode, the user indicates one or more "seed papers" and expects PaperFinder to discover new articles that are related to them and/or sort the retrieved papers with respect to the seed papers. PaperFinder uses query generalization and filtering to discover papers related to the seed paper(s). Query Generalization The goal of this step is to find papers that are related to the seed paper. This may be accomplished by forming queries whose results should return the seed papers as well as several other papers. Methods for generalizing an initial query include the following: (i) Keyword extraction from the titles, abstracts, or/and text of the seed papers. This method complements the initial keyword-based query with more keywords or key phrases, which may lead to the retrieval of a larger amount of useful and interesting information. (ii) Use of bibliographic references. Every scientific publication contains a "Related Work" section. Conversely, a research paper forms the basis for future developments and is cited by future articles. The citations of a seed paper as well the articles that cite the seed paper provide a large number of interesting articles that should be presented to the interested researcher. A tool that follows this method is CiteSeer[3, 4]. (iii) Use of subject descriptors. A subject descriptor consist of a small number of keywords or/and key phrases that characterize the research area in which a publication belongs. An article may belong in several different research areas. Articles that have at least one common subject descriptor may be related and should be presented to the user. This method is used by the Digital Library maintained by ACM[5]. (iv) Use of "seed authors." Authors who tend to cooperate in publishing scientific papers have similar research interests. Thus, retrieving the articles of authors who are scientifically related to a specified seed author may reveal new interesting papers. The current implementation of PaperFinder supports a Query Generalization method that is based on a service provided by ACM's on-line Digital Library, through which papers belonging to the same subject descriptors (maintained and classified by ACM) as the seed paper are downloaded. Filtering The goal of this step is to filter (sort) the papers found in the previous stage and return the most relevant to the user. Finding which papers are relevant can be tricky and result in a flood of irrelevant publications. To find relevant papers PaperFinder applies a similarity metric to the papers. The similarity metric currently used by PaperFinder is a sorting algorithm based on author-distance, a notion that originated from the Erdos number[6]. Paul Erdos, who was a brilliant and widely traveled Hungarian mathematician, wrote hundreds of mathematical research papers in many different areas, many in collaboration with others. His Erdos number is 0. His co-authors have Erdos number 1. People other than Erdos who have written a joint paper with someone with Erdos number 1 but not with Erdos have Erdos number 2, and so on. Based on the Erdos number, we define a similar metric we call author-distance. Two authors have an author-distance number of one if they have co-authored at least one paper. Their author-distance is two if they are not co-authors but there exists at least one third author who has written a paper with both of them, and so on. The distance between two authors who are not co-authors is found by the transitive closure of the distances among co-authors. Using author-distances, the scores used for sorting the papers are calculated as follows. The sum of the author-distances among a retrieved paper's authors and each of the seed paper's authors is computed and the arithmetic mean is calculated. A small arithmetic mean shows that the respective publication is closely related to the seed paper. The rationale behind this approach is that closely related articles are expected to have authors who have similar interests and some of them probably have co-authored a paper. Project Status Two fully stable implementations of USEwebNET and PaperFinder have been developed. Both tools have been installed on a server at the Institute of Computer ScienceFORTH and have been operational for the past six months. Using USEwebNET for this time period has shown that people prefer to use USEwebNET instead of directly querying search engines, mainly because of USEwebNET'sability to retain the history of each query, automatically download new results about a registered topic, and provide several useful operations for each retrieved Web page. Currently, experiments are being conducted to evaluate PaperFinder's Resource Discovery Mode of operation. The first experimental results have proved that PaperFinder's sorting algorithm is able to provide at least as good results as those provided by the digital library of ACM. However, there is still room for improving the filter. For example, at least two other distance metrics may be used to improve our filtering algorithm: (1) Weighted-author distance: The distance between two co-authors is defined as the number of papers they have co-authored over the total number of papers they have authored. The distance between two authors who are not co-authors is found by the transitive closure of the distances among co-authors. (2) Keyword distance: Given two papers and a set of keywords, the distance between the papers is defined as the number of keywords that appears in (the title/abstract/text) of both papers. In addition, new ways of exploiting the ideas of seed authors (papers) and author-distances should be explored. The first experimental results showed that PaperFinder has the ability to identify the work of distinctive research groups just by using one of their publications as a seed paper in the filtering process, which in turn leads to the idea of identifying clusters of cooperating research groups and exploiting links (publications) among them in order to find articles that couldn't be discovered by other search methods. Implications We view USEwebNET and PaperFinder as value-added services on top of existing information providers. They offer several general advantages:
running at night they offload busy Web servers and proxies on the Internet. PaperFinder provides additional services:
You may use USEwebNET and PaperFinder by connecting to <http://cuckoo.ics.forth.gr:9002/>. Two papers (a refereed and a short paper) concerning USEwebNET have already been published [7, 8]. We hope to present USEwebNET and PaperFinder at a future USENIX conference. The most suitable conference for this purpose would be USITS '99, where both tools could be presented in a Works-in-Progress report. Contact Information
Athanasios E. Papathanasiou: ICS-FORTH; tel: +30 81 391437; Evangelos P. Markatos: ICS-FORTH; tel: +30 81 391655; email <markatos@ics.forth.gr>. References [1] Fred Douglis, Thomas Ball, Yih-Farn Chen, and Eleftherios Koutsofios. "The AT&T Internet Difference Engine: Tracking and Viewing changes on the Web." World Wide Web Journal, 1(1), 1998. [2] Darren R. Hardy and Michael F. Schwartz. "Customized Information Extraction as a Basis for Resource Discovery." ACM Transactions on Computer Systems, Vol. 14, No. 2, May 1996, pp.17199. [3] Steve Lawrence, C. Lee Giles, and Kurt Bollacker. "Digital Libraries and Autonomous Citation Indexing." IEEE Computer, 1998. [4] C. Lee Giles, Kurt D. Bollacker, and Steve Lawrence. "CiteSeer: An Automatic Citation Indexing System." In Proceeding of Digital Libraries '98 Third ACM Conference on Digital Libraries, 1998, pp. 8998. [5] ACM Digital Library: <http://www.acm.org/dl/>. [6] Jerry Grossman. The Erdos Number Project: <http://www.acs.oakland.edu/~grossman/erdoshp.html>. [7] Evangelos P. Markatos, Christina Tziviskou, and Athanasios E. Papathanasiou. "Effective Resource Discovery on the World Wide Web." In Proceedings of WebNet 98 World Conference of the WWW, Internet, and Intranet, pp. 61116, Vol. I, Orlando, Florida, USA, November 1998.
[8] Athanasios E. Papathanasiou, Evangelos P. Markatos, and Stavros A.
Papadakis. "PaperFinder: A Tool for Scalable Search of Digital
Libraries." Work in Progress Short Paper in WebNet 98 World
Conference of the WWW, Internet, and Intranet, Orlando, Florida,
USA, November 1998.
|
|
|
Last changed: 30 Nov. 1999 mc |
|