2nd USENIX Symposium on Internet Technologies & Systems (USITS '99)
Our thanks to the summarizers:
E-Commerce--An Optimistic View
Udi Manber, Yahoo! Inc.
Dr. Udi Manber posed the rhetorical question, "Will e-commerce change the world?" Using the meteoric rags-to-riches success that Yahoo! embodies, Manber described the transformation from bricks and mortar (BM) that e-commerce enables. The end result is still quite debatable, but Manber believes that the winners will share several traits. One of these traits will be the ability to provide a one-stop shop that will satisfy the bulk of visitors' needs. The winners will also realize that these needs are not strictly material, and that the direct translation of a BM operation to an online presence will prove to be tragically shortsighted. Manber believes that a successful online strategy must include an abundance of means by which the users can establish community. He described the development of email, chat rooms, clubs, and message boards as all being critical elements in the success of Yahoo! In addition, an intuitive user interface will be crucial; Yahoo!'s success here is reflected in just 20,000 help-page hits out of a total of 200 million accesses.
Manber finished with a few speculations on what the future might hold. One prediction is that online advertisements will evolve from their present annoyance status to a source of useful connections and resources. His contention is that the ad strategy of today is equivalent to fishing with dynamite. A second and even more novel idea is a "Universal ID" that everything carries and that a person could "click" on to buy. The product would then be mailed to the purchaser and the owner of the item that had been clicked would receive a commission.
Udi Manber, winner of the 1999 Annual Software Tools Users Group award, is chief scientist at Yahoo! Before joining Yahoo! in 1998, he was a professor of computer science at the University of Arizona. He has written more than 50 technical articles (three of which won USENIX Best Paper awards), codeveloped Agrep, Glimpse, Harvest, and the Search Broker, and wrote a popular textbook on design of algorithms.
Scalable Web Caching of Frequently Updated Objects Using Reliable Multicast
Dan Li and David R. Cheriton, Stanford University
Dan Li presented a method to address the problem posed by frequently updated objects in a Web cache. To avoid the repeated unicast this would require, she proposed MMO (multicast invalidation followed by multicast delivery using OTERS), which preserves the benefits of multicast. This is achieved by grouping objects into volumes, each of which maps to one IP multicast group. The benefit of reliable multicast, with volumes of appropriate size, was shown to outweigh the cost of delivering extraneous data. Li demonstrated the scalability of this approach using trace-driven simulations. The bandwidth saving vis-à-vis conventional approaches increased significantly as the audience size grew. Li presented a strong argument that MMO provides efficient, bandwidth-scalable service. This should help to make strong Web-cache consistency for dynamic objects practical.
Hierarchical Cache Consistency in a WAN
Jian Yin, Lorenzo Alvisi, Mike Dahlin, and Calvin Lin, University of Texas at Austin
Organization-Based Analysis of Web-Object Sharing and Caching
Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Molly Brown, Tashana Landray, Denise Pinnel, Anna Karlin, and Henry Levy, University of Washington
The results demonstrated a surprising lack of sharing within the organizations they delineated (~2% over random). In addition, there was an overarching commonality among the organizations studied: 850 top servers handled over 50% of the requests analyzed. A large fraction of content was also uncacheable. In the question-and-answer period following the presentation, it was speculated that the university is perhaps more homogeneous than was initially appreciated.
The Ninja Jukebox
Ian Goldberg, Steven D. Gribble, David Wagner, and Eric A. Brewer, University of California at Berkeley
When presented with the appalling waste of resources represented by the numerous unused CD-ROM drives at the UC Berkeley computer lab, the Ninja group sprang into action to create a realtime streaming directory of audio from these sites. After they wrote the porting software for streaming delivery, the MP3 revolution arrived and the Ninja Jukebox idea was spawned. The goal was to develop a service that allowed a community of users to build a distributed, collaborative music repository to deliver digital music to Internet clients. The project's success led to an abundance of music, 17 days' worth, and the associated copyright and filtering obstacles this presents. Interface-development efforts were then focused on the development of simple collaborative filtering based on users' song preferences and ownership. The Jukebox, implemented in Java, was designed to allow rapid service evolution and reconfiguration, simplicity of participation, and extensibility. DJ software was developed to profile the users' preferences in a portable and selectively accessible manner. Presenter Steven Gribble concluded with the assertion that the careful use of a distributed component architecture enabled rapid prototyping of the service. He also felt that use of carefully designed, strongly typed interfaces enabled the smooth evolution of the service from a simple prototype to a more complex, mature system. Future modifications include the possibility of using digital cash to provide general access. He is optimistic that the newer version of Java will operate more efficiently and meet the bandwidth demands more gracefully.
Cha-Cha: A System for Organizing Intranet Search Results
Michael Chen, Marti Hearst, Jason Hong, and James Lin, University of California at Berkeley
Michael Chen presented a novel search engine that is based on the premise that intranets contain information associated with the internal workings of an organization. This engine, named "Cha-Cha," organizes Web search results in a manner that reflects the underlying structure of the intranet. This "outline" is created by first recording the shortest paths in hyperlinks from root pages to every page within the Web intranet. After the user issues a query, these shortest paths are dynamically combined to form a hierarchical outline of the context in which the search results occur. Pilot studies and user surveys suggest that some users find this structure more helpful than the standard display for intranet search. Currently a quarter of a million pages are indexed. More information is available at <http://cha-cha.berkeley.edu/>.
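The two steps described above — recording shortest hyperlink paths from a root page, then merging the paths to result pages into an outline — can be sketched as follows. This is a toy illustration with invented page names and a toy link graph, not Cha-Cha's actual code:

```python
from collections import deque

def shortest_paths(root, links):
    """BFS from the root page, recording one shortest hyperlink path to every page.
    `links` maps a page to the pages it links to (a toy stand-in for the crawl)."""
    paths = {root: [root]}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for child in links.get(page, []):
            if child not in paths:
                paths[child] = paths[page] + [child]
                queue.append(child)
    return paths

def outline(results, paths):
    """Merge root-to-result paths into a nested outline (a trie of path prefixes)."""
    tree = {}
    for page in results:
        node = tree
        for step in paths.get(page, [page]):
            node = node.setdefault(step, {})
    return tree
```

Pages sharing a path prefix end up grouped under the same outline heading, which is what gives the search results their hierarchical context.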
A Document-based Framework for Internet Application Control
Todd D. Hodes and Randy H. Katz, University of California at Berkeley
Rather than binding client applications directly to services, Hodes and Katz interposed a middleware layer between the two so that invocations can be transparently remapped; they have found this useful for a subset of application domains, including one example domain of "remote control" of local resources (e.g., lights, stereo components, etc.). Hodes went on to illustrate how the framework allows for (1) remapping of a portion of an existing user interface to a new service, (2) viewing of arbitrary subsets and combinations of the available functionality, and (3) mixing dynamically generated user interfaces with existing user interfaces. The use of a document-based framework in addition to a conventional object-oriented programming language provides a number of key features. One of the most useful is that it exposes the mappings between programs/UI and the objects to which they refer, thereby providing a standard location for manipulation of this indirection.
Sting: A TCP-based Network Measurement Tool
Stefan Savage, University of Washington
The tongue-in-cheek theme of Stefan Savage's presentation was that TCP represents an "opportunity" rather than a transport protocol. The novelty of this perspective led to some creative developments and garnered Savage the Best Student Paper award. Savage developed Sting, a tool to quantify one-way packet loss. This feature is not available in ping. Sting is able to accurately measure the packet-loss rate on both the forward and reverse paths between a pair of hosts. This achievement is accomplished by leveraging the behavior of TCP.
In TCP one knows the number of packets sent and the number received. This is enough for ping to work, but determining one-way packet loss requires more information. First, you need to know how many data packets were received at the other end. TCP has to know this, given that it is a reliable protocol. The second required piece of information is the number of ACKs that were sent to you. ACK parity requires that for every data packet received there is an ACK sent. Savage proposed a two-phase algorithm. Phase one is the data-seed phase and involves sending n in-sequence TCP data packets and counting the number of ACKs received. These are the probes of the network loss rate. The second phase is the hole-filling phase, which discovers which of the packets sent in phase one were lost. A new packet is sent that has a sequence number one greater than the last packet sent in the data-seed phase. If the target responds with an ACK for this packet, then no packets have been lost. If any were lost there will be a "hole" in the sequence space, and the target will respond with an acknowledgment indicating exactly where the hole is. This is filled with each subsequent retransmission, and a lost packet is recorded.
This can be optimized using fast retransmit, which obliges the receiver to send an ACK for every out-of-sequence packet: skipping the first packet forces an ACK for every subsequent packet sent. A second tweak transmits packets whose sequence ranges differ by only one byte, optimizing use of the receiver buffer. Firewalls and load balancers can become problematic when they send unwanted resets that would disrupt the measurement, so they were kept at bay by advertising a zero-sized receive buffer that prevented them from sending. The findings Savage reported indicate that the forward packet-loss rate is much less than the reverse. He felt that this asymmetry is due to the large differential in data transmission in the reverse versus the forward direction.
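The arithmetic behind the two phases can be reduced to a few lines. This is a toy calculation, not Sting's source: it assumes one ACK per delivered packet (which the fast-retransmit trick arranges), and the function name and signature are illustrative:

```python
def loss_rates(n_sent, acks_received, holes):
    """Estimate forward and reverse loss from one Sting-style run.
    n_sent:        data packets sent in the data-seed phase
    holes:         packets found missing during hole filling (forward losses)
    acks_received: ACKs that made it back to the prober"""
    delivered = n_sent - holes
    forward = holes / n_sent
    # Of the packets that arrived, each should have triggered one ACK;
    # any shortfall in ACKs is loss on the reverse path.
    reverse = 1 - acks_received / delivered if delivered else 0.0
    return forward, reverse
```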
JPEG Compression Metric as a Quality-Aware Image Transcoding
Surendar Chandra and Carla Schlatter Ellis, Duke University
Surendar Chandra presented techniques to quantify the quality-versus-size tradeoff characteristics for transcoding JPEG images. He analyzed the characteristics of images available in typical Web sites and explored how to perform informed transcoding using JPEG compression. The effects of this transcoding on image storage size and image information quality were then demonstrated. He also presented ways of predicting the computational cost as well as potential space benefits achieved by the transcoding. He felt these results will be useful in any system that uses transcoding to reduce access latencies, increase effective storage space, and reduce access costs.
Secondary Storage Management for Web Proxies
Evangelos P. Markatos, Manolis G.H. Katevenis, Dionisis Pnevmatikatos, and Michail Flouris, ICS-FORTH
Disk I/O is a known factor in limiting Web-server performance, contributing as much as 30% to total hit response time. A primary reason is that each URL in Web caches is stored in a separate file. An obvious method of improving system performance would be to reduce the overhead associated with file maintenance. The authors proposed a storage-management method, called BUDDY, for storing several URLs per file. By identifying URLs of similar size ("buddies") and storing them in the same file, disk I/O is reduced. Although BUDDY reduces file-management overhead, it makes no effort to reduce disk-head movement induced by write operations to various "buddy" files. To improve write throughput, the authors proposed STREAM, which, in addition to storing all URLs in a single file, writes URLs contiguously in this single file, reducing the number of disk-head movements (much as log-structured filesystems do). A third suggestion was to improve read throughput by clustering read operations together (LAZY-READS). Finally, to restore the locality present in a client request stream, the authors proposed to use locality buffers (LAZY-READ-LOC), which attempt to store URLs requested contiguously by a given client in contiguous file locations.
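A minimal sketch of the BUDDY idea — grouping objects of similar size into one shared file per size class — might look like the following. The class and rounding rule are invented for illustration (a Python list stands in for each on-disk file), not the authors' implementation:

```python
class BuddyStore:
    """Toy BUDDY sketch: URLs of similar size ("buddies") share one file."""
    def __init__(self):
        self.files = {}  # size class -> list of (url, data); one "file" per class

    def size_class(self, nbytes):
        # Round the object size up to the next power of two (illustrative policy).
        return 1 << max(nbytes - 1, 0).bit_length()

    def store(self, url, data):
        bucket = self.size_class(len(data))
        f = self.files.setdefault(bucket, [])
        f.append((url, data))
        return bucket, len(f) - 1        # (file, slot) acts as the disk address

    def load(self, bucket, slot):
        return self.files[bucket][slot][1]
```

Because same-class objects occupy fixed-size slots in one file, creating and deleting cached URLs no longer touches filesystem metadata for every object.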
The results were tested with a combination of trace-driven simulations and experimental evaluations. Traces from DEC were used to compare Squid's file management method, BUDDY, STREAM, LAZY-READS, and LAZY-READS-LOC. The conclusion was that disk-management overhead can be reduced by as much as a factor of 25 overall by using these algorithms. Because disk bandwidth will improve faster than disk latency, the authors believe such algorithms will be an increasingly valuable means of improving Web-server performance.
More information is available at <http://archvlsi.ics.forth.gr>.
Compression Proxy Server: Design and Implementation
Chi-Hung Chi, Jing Deng, and Yan-Hong Lim, National University of Singapore
Implementing the compression methodology encountered three design issues: encoding of compression messages, memory allocation, and choice of a data structure. The Squid proxy server was modified to test the methodologies, using a trace collected at a Singapore college over a year. Experimental results revealed the distribution of Web objects (file types) by total bytes to be: GIF image 33%, JPEG image 12%, text 31%, and octet-stream binaries of MPEG, MIDI, or other applications 24%. File sizes within these categories were recorded. Results of compression effectiveness were "highly encouraging." Bandwidth saving was measured by file type and size; whole file compression was the highest, at 37%. Overall, about 30% of bandwidth was saved in this experiment, and compression/decompression was less than 1% of Web access latency (even on an "outdated" proxy server). The authors conclude that such Web-server compression is worthwhile and should be considered as a bandwidth-saving mechanism, particularly since it could cooperate with other techniques.
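The bandwidth-saving measurement can be approximated with a few lines using zlib (a stand-in compressor; the content types and data below are made up, and the function is not the authors' code):

```python
import zlib

def savings(objects):
    """Fraction of bytes saved per content type under whole-file compression.
    `objects` is an iterable of (content_type, data) pairs."""
    totals = {}                              # type -> [raw bytes, compressed bytes]
    for ctype, data in objects:
        t = totals.setdefault(ctype, [0, 0])
        t[0] += len(data)
        t[1] += len(zlib.compress(data))
    return {ctype: 1 - comp / raw for ctype, (raw, comp) in totals.items()}
```

Already-compressed types (GIF, JPEG, MPEG) would show near-zero saving under such a measurement, which is why per-type accounting like the authors' matters.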
On the Performance of TCP Splicing for URL-Aware Redirection
Ariel Cohen, Sampath Rangarajan, and Hamilton Slye, Bell Laboratories, Lucent Technologies
The switching functionality was implemented using a loadable module in the Linux kernel. A user-level proxy accepts connections from clients and decides which server will receive incoming requests. The proxy then removes itself from the data path by requesting the kernel to splice the TCP connection between the client and the proxy with the connection between the proxy and the server. The loadable module is actually two components: sp-mod, which monitors the connection, and NEPPI (Network Element for Programmable Packet Injection), which performs low-level header modifications. This worked with the Linux ipchains firewall to filter packets.
Performance results were tested using the WebWatch HTTP generator on five clients. The experiments were run for three minutes each with a concurrency setting of 75 in a thread pool of size 30. The servers and clients were fast PCs (400-550MHz) connected over Fast Ethernet on Lucent's intranet. Performance impacts were observed with and without TCP splicing. In all cases TCP splicing resulted in a significant performance improvement. At the average Web object size of 10KB, there was a 58% increase in connections and a 38% decrease in CPU utilization. Performance gains were, of course, more striking for larger objects.
Prefetching Hyperlinks
Dan Duchamp, AT&T Labs Research
This paper, which developed a new method of prefetching Web pages into the client cache, won the Best Paper award for the conference. Dan Duchamp did an excellent job of presenting the highlights of his research without bogging the audience down in the details, and he honestly revealed where he came up short.
Duchamp began with two basic premises: (1) the next URL to be requested by a client is likely to be one embedded as a hyperlink in one of the last few pages requested by that client, and (2) past access patterns of a large population of clients are likely to be relevant to a particular client.
The basic method is:
1. The client sends to a page's server a record (called a "usage report") detailing which of that page's hyperlinks it referenced.
2. The server aggregates such information from many clients.
3. When responding to a GET, the server attaches a summary (called a "usage profile") of past usage reports for that page.
4. On the basis of a page's usage profile, the client decides whether to prefetch any of its hyperlinks. Usage reports and profiles are passed via a new HTTP extension header.
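The report/profile exchange in the steps above can be sketched as follows. The class and method names are illustrative, not from the paper, and the threshold rule is an invented stand-in for the client's prefetch decision:

```python
from collections import Counter, defaultdict

class UsageServer:
    def __init__(self):
        self.reports = defaultdict(Counter)   # page -> hyperlink click counts

    def report(self, page, clicked_links):
        """Step 1-2: absorb a client's usage report for `page`."""
        self.reports[page].update(clicked_links)

    def profile(self, page, k=3):
        """Step 3: the usage profile attached to a GET response —
        the page's most-followed hyperlinks with their observed frequencies."""
        total = sum(self.reports[page].values()) or 1
        return [(link, n / total) for link, n in self.reports[page].most_common(k)]

def choose_prefetch(profile, threshold=0.25):
    """Step 4: the client prefetches hyperlinks followed often enough in the past."""
    return [link for link, p in profile if p >= threshold]
```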
The paper presents a brief but comprehensive summary (not included in the presentation) of related research in the extensive area of prefetching, which falls into three categories: software systems; algorithms, simulations and/or prototypes; and methods establishing bounds. The features distinguishing Duchamp's method from previous work were that it has been implemented; that access-pattern information is aggregated by the server and shared across clients; that it operates in near-realtime; that it is client-configurable; that many previously uncachable pages can be prefetched; that both client and server can cap operations to limit the impact on overhead and bandwidth; and that it operates as an HTTP extension.
The overall results were very positive: client latency improved greatly (slightly over 50%), and less of the cache was wasted (about 60% of prefetched pages were eventually used).
Both client and server modifications can be implemented as proxies, eliminating the need to alter browsers or Web servers; however, there are disadvantages to a client proxy.
Other "gotchas" were: time-dependent accesses; objects set with zero expiration time; inaccessible HTML; sabotage (using prefetching to overload the network); privacy concerns; and the fact that usage patterns are beginning to have commercial value, raising payment issues.
The server-side implementation is a proxy based on W3C's HTTPd. Two client-side implementations exist: a modification of the Mozilla browser from Netscape and a proxy based on HTTPd. Performance was evaluated for prefetch accuracy, client latency, network overhead, program space overhead, and program time overhead.
Mining Longest Repeating Subsequences to Predict World Wide Web Surfing
Jim Pitkow and Peter Pirolli, Xerox PARC
Various Markov models were compared to assess their ability in pattern extraction and pattern matching. Two techniques were introduced: longest repeating subsequences (LRS) and weighted specificity. LRS is a means of identifying the most information-rich subsequences in navigation log files. This data was then decomposed into several different Markov models to compute conditional probabilities for sequential transitions; that is, if a user is on a page, what is the probability of the user clicking any of the available links? The models had to be compact enough to be of practical use. These models were about 130KB in size, small enough to reside in each thread of a Web server. Using a single data set from Xerox, the models were able to predict the correct sequence 27-31% of the time. The speaker cautioned that these results are tentative and need to be corroborated by future work. Finally, he presented a picture of "information scent," a visualization of user paths within a given Web site, with examples of "good info scent" and "bad info scent." While the algorithm producing the visualization was not discussed, it was presented as an alternative model to determine what information to prefetch to users.
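The first-order-Markov piece of this (leaving aside the LRS extraction and weighted specificity) can be sketched in a few lines. The function names and toy sessions are illustrative, not the authors' code:

```python
from collections import Counter, defaultdict

def transition_model(sessions):
    """First-order Markov model: P(next page | current page) from navigation logs."""
    counts = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return {page: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for page, c in counts.items()}

def predict(model, page):
    """Most likely next page from the current one, or None if the page is unseen."""
    probs = model.get(page, {})
    return max(probs, key=probs.get) if probs else None
```

Higher-order models condition on longer histories in the same way; the LRS step prunes the histories kept, which is what keeps the models small.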
Active Names: Flexible Location and Transport of Wide-Area Resources
Amin Vahdat, Duke University; Michael Dahlin, University of Texas at Austin; Thomas Anderson and Amit Aggarwal, University of Washington
This paper described a new framework for organizing and delivering distributed services over the Internet, called Active Names. The research is motivated by the fact that Internet services are increasingly distributed across various machines, but the limitations imposed by DNS (Domain Name Service) have resulted in many confusing suggestions for improving it. Active Names is meant to be a general design solution that encompasses much previous research on extending DNS. Its main points are (1) it provides a flexible end-to-end naming abstraction for WAN services and (2) it provides a framework for composing customizations provided by both clients and servers. The benefits would be increased network performance (reduced client latency) and a standard, unified approach to operating networked services.
The key concepts in the design are active names, namespaces, delegation, and after-methods. Each active name identifies a name and a namespace in which that name should be interpreted, and each such namespace is embodied by a Namespace Program. Unlike URLs, which map to a specific IP address and thereby specify where a service will run, namespace programs are location-independent. Namespace programs accept incoming data, determine the next namespace where output will be sent, and construct an after-methods list to send with the output. The interface to namespace programs facilitates composability (the ability of one namespace to call other namespaces) in two ways: (1) through delegation, where one namespace passes a name to a sub-namespace, and (2) through after-methods that specify a chain of Active Name services remaining to be run to finish name resolution for a request. These namespace programs execute within a resolver virtual machine that provides security and limits resource use.
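A toy rendering of delegation and after-methods might look like the following. The namespace-program signature here is invented for illustration; the real system runs such programs inside a resolver virtual machine, not as bare Python functions:

```python
def resolve(name, namespaces, after_methods=None):
    """Walk a chain of namespace programs for an active name like "ns/rest".
    Each program returns (result, delegated_name, extra_after_methods)."""
    after = list(after_methods or [])
    namespace, _, rest = name.partition("/")
    program = namespaces[namespace]
    result, next_name, more_after = program(rest)
    after = more_after + after            # newly added after-methods run first
    if next_name is not None:             # delegation to another namespace
        return resolve(next_name, namespaces, after)
    for method in after:                  # finish with the remaining after-methods
        result = method(result)
    return result
```

For example, a "cache" namespace might delegate to an "origin" namespace while appending a post-processing after-method; the result of origin resolution then flows back through that method.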
The authors have demonstrated a fully functional core of the system and have built several useful applications. In response to questions, the authors explained that the system provides basic facilities on which applications enforce security and provide end-to-end fault tolerance, and that providing higher-level support for security and fault tolerance would be useful future work. The test system was built at Texas, Duke, and Washington using Java, and results indicate that Active Names can significantly reduce client latency in distributed services, in one case providing a fivefold reduction.
The kinds of questions raised by distributed processing have created an area of active research, including Active Services, Active Networks, Intentional Name System, and Transaction Processing monitors. However, the authors believe each of these has limitations that Active Names overcomes.
More information is available at <http://www.cs.utexas.edu/users/less/bb/>.
Person-level Routing in the Mobile People Architecture
Mema Roussopoulos, Petros Maniatis, Edward Swierk, Kevin Lai, Guido Appenzeller, and Mary Baker, Stanford University
Vinod Anupam, David M. Kristol, and Alain Mayer, Bell Laboratories, Lucent Technologies
The programmer's view of the new model was described, with code snippets illustrating how security policy is implemented in trust management. The utility of this feature was shown in an e-commerce example which requires automated cooperation between business sites. The methodical process by which the authors tested their new security layer was described. The addition of the document.ACL attribute, a key innovation in the new model, is currently before the W3C as a proposed standard. The new security model has been offered to the Mozilla open-source-development community for scrutiny before its implementation by Netscape.
More information is available at <http://www.mozilla.org/projects/security>.
PerDiS: Persistent Distributed Store
Marc Shapiro, INRiA Rocquencourt and Microsoft Research Cambridge
Marc Shapiro described PerDiS, a persistent distributed store for cooperative engineering over the Internet. It exports the abstraction of a shared memory across the Internet. Shapiro believes that PerDiS is particularly simple to use, claiming that large centralized programs (including a 400,000-line CAD tool) have been ported with relative ease. Application programs allocate arbitrary objects inside clusters (i.e., files), and objects refer to one another with native pointers. Between an application program and the shared store, writes are buffered in a transactional log. This allows engineers to work on shared designs without interference. PerDiS has two major modes of operation. In a LAN, the store is kept coherent, whereas sharing over a WAN follows a check-in/check-out mode. PerDiS is open source. More information is available at <http://www.perdis.esprit.ec.org/>.
Improving Web Searching Performance Using Community-based Filtering
Distributed Object Consistency Protocol
The Flash Web Server
The Flash Web server is a high-performance Web server developed using a novel concurrency architecture. Flash combines the high performance of single-process event-driven servers on cached workloads with the performance of multiprocess and multithreaded servers on disk-bound workloads. Pai has found that the Flash Web server is easily portable, since it achieves these results using facilities available in all modern operating systems.
Webcard: A Java Card Web Server
Defeating TCP Congestion Control in Three Easy Steps
How to coerce a remote Web server to send at any rate you choose:
1. ACK Division. In TCP the sender increases its congestion window by one segment with each successive ACK it receives. Action: Send multiple ACKs regardless of the packets received, up to 1,500 per data packet, and watch your window grow quite rapidly.
2. Duplicate ACK Spoofing. TCP recovery after three duplicate ACKs at a sender involves retransmitting the packet and increasing the congestion window by one packet per duplicate ACK received. Action: Send a stream of duplicate ACKs, and for every additional duplicate ACK sent the congestion window is grown by one, which allows you to control the size of your window and thus the rate of transmission.
3. Optimistic ACKing. This involves sending an ACK for a packet 35 msec early, with the consequence being that the sender will send subsequent packets early.
The above strategies are all implemented in the TCP Daytona that Savage put together using fewer than 75 lines of code on Linux. Those who are interested can see the full article, "TCP Congestion Control with a Misbehaving Receiver," in the October '99 ACM Computer Communication Review.
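The reason ACK division works — the congestion window grows per ACK received, not per byte acknowledged — can be illustrated with a back-of-the-envelope simulation (this is not TCP Daytona, just an illustration of slow-start arithmetic; the function name is invented):

```python
def cwnd_after(mss, acks_per_segment, segments):
    """Slow-start congestion window (in bytes) after `segments` data segments,
    when the receiver splits its acknowledgment of each segment into
    `acks_per_segment` small ACKs."""
    cwnd = mss                       # start with one segment
    for _ in range(segments * acks_per_segment):
        cwnd += mss                  # sender bumps cwnd one full segment per ACK
    return cwnd
```

With honest one-ACK-per-segment behavior the window grows linearly in ACKs; dividing each acknowledgment multiplies that growth by the division factor.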
Automating Usability Assessment for Information-centric Web Sites
Appliance Data Servers
Using Full Reference History for Efficient Document Replacement in Web Caches
Hyokyung Bahn, Seoul National University; Sam H. Noh, Hong-Ik University; Sang Lyul Min and Kern Koh, Seoul National University
This project focused on a better algorithm for directing the updating of documents in Web caches. It seeks to improve on previous algorithms by having the ability to optimize on any performance measure. The algorithm, Least Unified Value (LUV), uses the full reference history of documents to estimate the probability of being rereferenced.
Web caches have been much studied by the research community, but the authors believe the LUV algorithm is best to use in a Web cache replacement policy. According to their research, it offers the best overall effectiveness, performance, and robustness. The algorithm is basically a cost value computed for each cache document. The value is a weighted average of the document's reference potential multiplied by its weight, where weight is a function of cost and size. Reference potential is the likelihood of rereference, a function of past references. Cost can be considered in several ways, depending on the performance measure in which you are interested.
Since the LUV algorithm makes use of full reference history for each cache document, it might seem a burden to implement. But the speaker offered a proof that collapses the computation of the cost component. Implemented in a heap, this reduced computation of LUV to time O(log2 n), where n is the number of cached documents. Experiments were done using traces from NLANR and DEC, though filtering was done on UDP and CGI requests and on requests larger than the size of the cache. Performance was compared to nine other cache algorithms for hit rate, byte hit rate, and delay-savings ratio. In most cases, the authors conclude, LUV outperformed all other algorithms irrespective of cache size. This algorithm considered only in-cache documents; future research will consider a perfect-history LUV, which includes replaced objects, for possible performance improvements.
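The heap mechanics of value-based replacement can be sketched as follows. The value function here is a deliberately simplified stand-in (the paper's actual LUV value weights the full reference history), and the lazy-deletion trick for stale heap entries is an implementation convenience, not necessarily the authors':

```python
import heapq
import itertools

class ValueCache:
    """Toy value-based replacement: evict the document with the lowest value."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []                 # (value, seq, url); lowest value pops first
        self.live = {}                 # url -> (value, seq) of its current entry
        self.seq = itertools.count()

    def value(self, cost, size, refs):
        # Stand-in for LUV: weight (cost/size) times a reference-potential proxy.
        return (cost / size) * refs

    def access(self, url, cost, size, refs):
        v, s = self.value(cost, size, refs), next(self.seq)
        self.live[url] = (v, s)
        heapq.heappush(self.heap, (v, s, url))
        while len(self.live) > self.capacity:
            v, s, u = heapq.heappop(self.heap)
            if self.live.get(u) == (v, s):   # skip stale entries left by re-accesses
                del self.live[u]

    def cached(self):
        return set(self.live)
```

Each access is O(log n) in the heap, matching the complexity the speaker claimed for the collapsed computation.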
Providing Dynamic and Customizable Caching Policies
J. Fritz Barnes and Raju Pandey, University of California at Davis
The research presented an object-oriented analysis of cache objects, specifying how customization policies can be applied; namely, in prefetch, routing, placement, coherency, removal, and "miscellaneous," the last category intended to address any protocol extensions. Implementation of the policies is accomplished by CacheL, a domain-specific language based on cache events developed by the authors. Two caching systems were built to evaluate the effectiveness of these ideas: DavisSim, an event-based cache simulator based on the Wisconsin Cache Simulator, and PoliSquid, an extension of the popular Squid Web cache. Analysis of the performance focused on whether caches benefit from customization and what overhead it demands. The analysis was broken down by client-customized, cache-customized, or server-customized policies. Results indicated that implementation was feasible and advantageous. Overhead was moderate, estimated at an 8.5% increase in latency without optimization. The possibility of cache-policy customization was demonstrated, but firmer evidence of performance improvements awaits future work.
More information is available at <http://pdclab.cs.ucdavis.edu/qosweb/CacheL>.
Exploiting Result Equivalence in Caching Dynamic Web Content
Ben Smith, Anurag Acharya, Tao Yang, and Huican Zhu, University of California at Santa Barbara
This work presented a proposal for a new protocol for enhancing Web caching and a prototype for implementing such a protocol. The basic idea is to identify query requests that have essentially equivalent or similar results and to service these subsequent requests from the cache. The usefulness of this is most apparent in image maps and queries conditionally qualified over some range.
The protocol, called Dynamic Content Caching (DCCP), classifies Web client requests according to three types of locality: identical, equivalent, or partially equivalent requests. Currently, Web cache hits can only identify requests with the same URL (identical request). DCCP goes further by allowing identification of requests with identical content (equivalent), or content that can serve as a temporary placeholder for a request (partially equivalent). This is accomplished by an extension mechanism in HTTP 1.1 for cache control directives.
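A toy matcher for the three locality classes might look like this. The dictionary-based directive format is invented for illustration; DCCP actually carries this information in HTTP 1.1 cache-control extension directives:

```python
def classify(request, cached):
    """Classify a request against cached entries, each of which declares
    the other requests it can serve (a toy stand-in for DCCP directives)."""
    for entry in cached:
        if entry["url"] == request:
            return "identical", entry
    for entry in cached:
        if request in entry.get("equivalent", ()):
            return "equivalent", entry          # same content, different URL
    for entry in cached:
        if any(request.startswith(p) for p in entry.get("partial", ())):
            return "partially equivalent", entry  # usable as a placeholder
    return "miss", None
```

A conventional cache implements only the first loop; the second and third are what DCCP's directives add.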
Examples were shown using image maps, a weather service that uses ZIP codes to qualify queries, and a news service applicable to geographic regions. These are equivalent or partially equivalent requests that can be exploited by DCCP, implemented using the Swala cooperative Web cache. Evaluations were based on cache hit ratio and generated traffic using two real traces and one synthetic trace. Results were promising. For a map retrieval trace with three levels of error tolerance in matching, hit ratios can reach over 60% at a 10% error tolerance. The authors acknowledge that DCCP has a memory overhead cost, but this can be controlled by imposing a bound. They encountered difficulty in implementing efficient search when using complicated string-matching, and also did not address POST-based queries, which they plan to study in future work.
More information is available at <http://www.cs.ucsb.edu/research/swala>.
Efficient Support for Content-based Routing in Web Server Clusters
Chu-Sing Yang and Mon-Yen Luo, National Sun Yat-Sen University
This paper explored the advantages of a clustered Web server that uses a new content-aware request-routing scheme. It surveys the clustered-server approach to servicing high-traffic Web sites and suggests improvements in request routing that raise overall performance through content-aware processing.
The literature agrees that a clustered-server approach, in which an initial node directs requests to specialized servers, is advantageous at high-traffic Web sites. The basic methods for accomplishing this were summarized: client-side routing, DNS-based routing, TCP connection routing, and HTTP redirection. The issues these approaches ignore, which the authors identified and their research addresses, are session integrity, load balancing, differential services, and content deployment.
The research focused on the difficulties that the TCP protocol imposes on a server-directed solution, primarily the difficulty of migrating an established connection. In their design, the dispatcher decides how to route a request on the basis of three data structures: a cluster table, a mapping table, and a URL table. The dispatcher maintains an awareness of the TCP connections, releasing them when necessary. Since the overhead of opening a new connection is prohibitive, the dispatcher instead conveys packets to the back-end servers, modifying each packet's IP and TCP headers before forwarding it.
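The routing step can be sketched as follows. This is a toy model, not the authors' Linux kernel module: the URL table, addresses, and packet representation are all invented, and real forwarding must also recompute the IP and TCP checksums after rewriting headers.

```python
# Toy sketch of content-aware dispatch (hypothetical tables and addresses):
# choose a back-end from a URL table, then rewrite the packet's destination
# before forwarding it on the existing connection, rather than paying the
# cost of opening a second TCP connection to the back-end.

url_table = {                 # URL prefix -> back-end server (illustrative)
    "/images/": "10.0.0.2",
    "/cgi-bin/": "10.0.0.3",
}
default_backend = "10.0.0.4"

def route(url):
    """Pick a back-end server for this request's URL."""
    for prefix, backend in url_table.items():
        if url.startswith(prefix):
            return backend
    return default_backend

def forward(packet, url):
    """Rewrite the header in place; real code must also fix checksums."""
    backend = route(url)
    packet["dst_ip"] = backend
    return backend

pkt = {"dst_ip": "10.0.0.1", "payload": b"GET /images/logo.gif"}
print(forward(pkt, "/images/logo.gif"))  # 10.0.0.2
```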
The design was implemented as a loadable module for the Linux kernel. WebBench was used to evaluate performance on a heterogeneous collection of back-end servers. Compared to a "content-blind" server cluster, the content-aware cluster had greater throughput, averaging about 20MB/sec more after 16 client connections. The authors acknowledged that the extra overhead, limited scalability, and the dispatcher as a single point of failure were drawbacks of their approach. The advantages were higher performance, better routing decisions, and general content-aware intelligence that might be useful in future configurations.
Rapid Reverse DNS Lookups for Web Servers
William LeFebvre, Group Sys Consulting; Ken Craig, CNN Internet Technologies
Surprisingly little prior work has been done in this area. The design depends heavily on the multithreaded capabilities of the Netscape Enterprise Server API. A Rapid DNS server is placed between the (modified) Web server and the conventional DNS server. It provides quick answers to its client Web servers on the front end (about 2 milliseconds or better, on average) using a bucket hash keyed on IP address, and makes periodic queries to the true DNS server on the back end. The connection from front to back is through a fixed-size stack called a "leaky bucket" because of its LIFO design, which drops a growing backlog of requests off the end. Negative caching, whereby a cache entry is maintained even for unknown domain names, significantly improved cache hit rates.
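Since the CNN code is proprietary, the mechanism can only be sketched from the description above; every name and parameter here is invented. The sketch shows the three ingredients: an IP-keyed hash for fast answers, negative caching of failed lookups, and a bounded LIFO "leaky bucket" that sheds the oldest backlog under load.

```python
# Hypothetical sketch of the Rapid DNS fast path (the real code is
# proprietary; names and sizes are invented for illustration).

from collections import deque

cache = {}                    # IP address -> hostname, or None for known failures
pending = deque(maxlen=1000)  # the "leaky bucket": bounded queue of misses
MISS = object()               # sentinel: answer not yet available

def lookup(ip):
    """Fast path: cache hit, negative hit, or queue a background resolve."""
    if ip in cache:
        return cache[ip]      # may be None: a negative-cache hit
    pending.append(ip)        # when full, maxlen silently drops the oldest
    return MISS               # the web server logs the raw IP for now

def drain(resolve):
    """Back-end worker: resolve newest-first, caching failures too."""
    while pending:
        ip = pending.pop()    # LIFO: newest request first
        try:
            cache[ip] = resolve(ip)
        except OSError:
            cache[ip] = None  # negative caching of unresolvable addresses
```

The LIFO choice means that under overload the stalest requests, whose client connections may already be gone, are the ones discarded.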
A trio of Rapid DNS servers was used to support the CNN Web farm of about 60 Web servers. Performance results were impressive: even with over 250 client connections, servers sustained queries in excess of 400 operations per second. Future research will investigate different queuing and caching policies. The code for this project was developed for CNN and remains proprietary.
Connection Scheduling in Web Servers
Mark E. Crovella and Robert Frangioso, Boston University; Mor Harchol-Balter, Carnegie Mellon University
Implementing the idea of shortest-remaining-processing-time (SRPT) connection scheduling was problematic because task scheduling is traditionally under the control of the operating system, not the application. Implementing the idea without kernel modifications was a design goal of the experiments. The technique used here was to control concurrency (i.e., task scheduling) by varying the number of threads allowed in each connection pool. This heavily influences, but does not control, task scheduling within the kernel, and it allows the technique to be more easily tested and adopted by others.
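The concurrency trick can be sketched briefly. This is not the authors' server code; the two size classes, the 10 KB cutoff, and the worker counts are illustrative. Giving short-request pools many more threads biases the kernel toward finishing short responses first, approximating SRPT without kernel changes.

```python
# Illustrative sketch: approximate SRPT by partitioning connections into
# pools by expected response size and giving the short-request pool far
# more threads (all parameters are invented for illustration).

from concurrent.futures import ThreadPoolExecutor

pools = {
    "small": ThreadPoolExecutor(max_workers=32),  # e.g. responses < 10 KB
    "large": ThreadPoolExecutor(max_workers=4),
}

def size_class(expected_bytes):
    """Classify a request by the expected size of its response."""
    return "small" if expected_bytes < 10_000 else "large"

def schedule(handler, expected_bytes, *args):
    """Dispatch a connection to the thread pool for its size class."""
    return pools[size_class(expected_bytes)].submit(handler, *args)
```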
The SURGE software was used to simulate workloads of 400 to 2,000 User Equivalents (light to heavy). Response performance (mean transfer time) was then recorded, primarily using Apache server version 1.2.5 under Linux 2.0.36, comparing size-independent scheduling with SRPT scheduling. All indications were that SRPT scheduling improves performance significantly. Overall throughput was improved in all tests, and not at the expense of longer tasks. This somewhat surprising result is explained by the file-size distribution of Web requests being heavily skewed toward small files.
These promising results encourage further research, which will be directed toward servers allowing more precise scheduling control and dynamically adjustable job priorities.