2nd USENIX Symposium on Internet Technologies & Systems (USITS '99)
Our thanks to the summarizers:
E-Commerce--An Optimistic View
Udi Manber, Yahoo! Inc.
Dr. Udi Manber posed the rhetorical question, "Will e-commerce change the world?" Using the meteoric rags-to-riches success that Yahoo! embodies, Manber described the transformation from bricks and mortar (BM) that e-commerce enables. The end result is still quite debatable, but Manber believes that the winners will share several traits. One of these traits will be the ability to provide a one-stop shop that will satisfy the bulk of visitors' needs. The winners will also realize that these needs are not strictly material, and that the direct translation of a BM operation to an online presence will prove to be tragically shortsighted. Manber believes that a successful online strategy must include an abundance of means by which the users can establish community. He described the development of email, chat rooms, clubs, and message boards as all being critical elements in the success of Yahoo! In addition, an intuitive user interface will be crucial; Yahoo!'s success here is reflected in just 20,000 help-page hits out of a total of 200 million accesses.
Manber finished with a few speculations on what the future might hold. One prediction is that online advertisements will evolve from their present annoyance status to a source of useful connections and resources. His contention is that the ad strategy of today is equivalent to fishing with dynamite. A second and even more novel idea is a "Universal ID" that everything carries and that a person could "click" on to buy. The product would then be mailed to the purchaser and the owner of the item that had been clicked would receive a commission.
Udi Manber, winner of the 1999 Annual Software Tools Users Group award, is chief scientist at Yahoo! Before joining Yahoo! in 1998, he was a professor of computer science at the University of Arizona. He has written more than 50 technical articles (three of which won USENIX Best Paper awards), codeveloped Agrep, Glimpse, Harvest, and the Search Broker, and wrote a popular textbook on design of algorithms.
Scalable Web Caching of Frequently Updated Objects Using Reliable Multicast
Dan Li and David R. Cheriton, Stanford University
Dan Li presented a method to address the problem posed by frequently updated objects in a Web cache. To avoid the repeated unicast this would require, she proposed MMO (multicast invalidation followed by multicast delivery using OTERS), which preserves the benefits of multicast. This is achieved by grouping objects into volumes, each of which maps to one IP multicast group. The benefit of reliable multicast, with volumes of appropriate size, was shown to outweigh the cost of delivering extraneous data. Li demonstrated the scalability of this approach using trace-driven simulations. The bandwidth saving vis-à-vis conventional approaches increased significantly as the audience size grew. Li presented a strong argument that MMO provides efficient, bandwidth-scalable service. This should help to make strong Web-cache consistency for dynamic objects practical.
Hierarchical Cache Consistency in a WAN
Jian Yin, Lorenzo Alvisi, Mike Dahlin, and Calvin Lin, University of Texas at Austin
Organization-Based Analysis of Web-Object Sharing and Caching
Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Molly Brown, Tashana Landray, Denise Pinnel, Anna Karlin, and Henry Levy, University of Washington
The results demonstrated a surprising lack of sharing within the organizations they delineated (~2% over random). In addition, there was an overarching commonality among the organizations studied: 850 top servers handled over 50% of the requests analyzed. A large fraction of content was also uncacheable. In the question-and-answer period following the presentation, it was speculated that the university is perhaps more homogeneous than was initially appreciated.
The Ninja Jukebox
Ian Goldberg, Steven D. Gribble, David Wagner, and Eric A. Brewer, University of California at Berkeley
When presented with the appalling waste of resources represented by the numerous unused CD-ROM drives at the UC Berkeley computer lab, the Ninja group sprang into action to create a realtime streaming directory of audio from these sites. After they wrote the porting software for streaming delivery, the MP3 revolution arrived and the Ninja Jukebox idea was spawned. The goal was to develop a service that allowed a community of users to build a distributed, collaborative music repository to deliver digital music to Internet clients. The project's success led to an abundance of music, 17 days' worth, and the associated copyright and filtering obstacles this presents. Interface-development efforts were then focused on the development of simple collaborative filtering based on users' song preferences and ownership. The Jukebox, implemented in Java, was designed to allow rapid service evolution and reconfiguration, simplicity of participation, and extensibility. DJ software was developed to profile the users' preferences in a portable and selectively accessible manner. Presenter Steven Gribble concluded with the assertion that the careful use of a distributed component architecture enabled rapid prototyping of the service. He also felt that use of carefully designed, strongly typed interfaces enabled the smooth evolution of the service from a simple prototype to a more complex, mature system. Future modifications include the possibility of using digital cash to provide general access. He is optimistic that the newer version of Java will operate more efficiently and meet the bandwidth demands more gracefully.
Cha-Cha: A System for Organizing Intranet Search Results
Michael Chen, Marti Hearst, Jason Hong, and James Lin, University of California at Berkeley
Michael Chen presented a novel search engine that is based on the premise that intranets contain information associated with the internal workings of an organization. This engine, named "Cha-Cha," organizes Web search results in a manner that reflects the underlying structure of the intranet. This "outline" is created by first recording the shortest paths in hyperlinks from root pages to every page within the Web intranet. After the user issues a query, these shortest paths are dynamically combined to form a hierarchical outline of the context in which the search results occur. Pilot studies and user surveys suggest that some users find this structure more helpful than the standard display for intranet search. Currently a quarter of a million pages are indexed. More information is available at <http://cha-cha.berkeley.edu/>.
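The two steps described above — recording shortest hyperlink paths from a root page, then merging the paths to result pages into an outline — can be sketched as follows. This is a toy illustration with invented page names and a toy link graph, not Cha-Cha's actual code:

```python
from collections import deque

def shortest_paths(root, links):
    """BFS from the root page, recording one shortest hyperlink path to every page.
    `links` maps a page to the pages it links to (a toy stand-in for the crawl)."""
    paths = {root: [root]}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for child in links.get(page, []):
            if child not in paths:
                paths[child] = paths[page] + [child]
                queue.append(child)
    return paths

def outline(results, paths):
    """Merge root-to-result paths into a nested outline (a trie of path prefixes)."""
    tree = {}
    for page in results:
        node = tree
        for step in paths.get(page, [page]):
            node = node.setdefault(step, {})
    return tree
```

Pages sharing a path prefix end up grouped under the same outline heading, which is what gives the search results their hierarchical context.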
A Document-based Framework for Internet Application Control
Todd D. Hodes and Randy H. Katz, University of California at Berkeley
Rather than binding client applications directly to services, Hodes and Katz interposed a middleware layer between the two so that invocations can be transparently remapped; they have found this useful for a subset of application domains, including one example domain of "remote control" of local resources (e.g., lights, stereo components, etc.). Hodes went on to illustrate how the framework allows for (1) remapping of a portion of an existing user interface to a new service, (2) viewing of arbitrary subsets and combinations of the available functionality, and (3) mixing dynamically generated user interfaces with existing user interfaces. The use of a document-based framework in addition to a conventional object-oriented programming language provides a number of key features. One of the most useful is that it exposes the mappings between programs/UI and the objects to which they refer, thereby providing a standard location for manipulation of this indirection.
Sting: A TCP-based Network Measurement Tool
Stefan Savage, University of Washington
The tongue-in-cheek theme of Stefan Savage's presentation was that TCP represents an "opportunity" rather than a transport protocol. The novelty of this perspective led to some creative developments and garnered Savage the Best Student Paper award. Savage developed Sting, a tool to quantify one-way packet loss. This feature is not available in ping. Sting is able to accurately measure the packet-loss rate on both the forward and reverse paths between a pair of hosts. This achievement is accomplished by leveraging the behavior of TCP.
In TCP one knows the number of packets sent and the number received. This is enough for ping to work, but determining one-way packet loss requires more information. First, you need to know how many data packets were received at the other end. TCP has to know this, given that it is a reliable protocol. The second required piece of information is the number of ACKs that were sent to you. ACK parity requires that for every data packet received there is an ACK sent. Savage proposed a two-phase algorithm. Phase one is the data-seed phase and involves sending n in-sequence TCP data packets and counting the number of ACKs received. These are the probes of the network loss rate. The second phase is the hole-filling phase, which discovers which of the packets sent in phase one were lost. A new packet is sent that has a sequence number one greater than the last packet sent in the data-seed phase. If the target responds with an ACK for this packet, then no packets have been lost. If any were lost there will be a "hole" in the sequence space, and the target will respond with an acknowledgment indicating exactly where the hole is. This is filled with each subsequent retransmission, and a lost packet is recorded.
This can be optimized using fast retransmit, which obliges the receiver to send an ACK for every out-of-sequence packet: skipping the first packet forces an ACK for every subsequent packet sent. A second tweak transmits packets whose sequence ranges differ by only one byte, optimizing use of the receiver buffer. Firewalls and load balancers can become problematic when they send unwanted resets that would disrupt the measurement, so they were kept at bay by advertising a zero-sized receive buffer that prevented them from sending. The findings Savage reported indicate that the forward packet-loss rate is much less than the reverse. He felt that this asymmetry is due to the large differential in data transmission in the reverse versus the forward direction.
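The arithmetic behind the two phases can be reduced to a few lines. This is a toy calculation, not Sting's source: it assumes one ACK per delivered packet (which the fast-retransmit trick arranges), and the function name and signature are illustrative:

```python
def loss_rates(n_sent, acks_received, holes):
    """Estimate forward and reverse loss from one Sting-style run.
    n_sent:        data packets sent in the data-seed phase
    holes:         packets found missing during hole filling (forward losses)
    acks_received: ACKs that made it back to the prober"""
    delivered = n_sent - holes
    forward = holes / n_sent
    # Of the packets that arrived, each should have triggered one ACK;
    # any shortfall in ACKs is loss on the reverse path.
    reverse = 1 - acks_received / delivered if delivered else 0.0
    return forward, reverse
```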
JPEG Compression Metric as a Quality-Aware Image Transcoding
Surendar Chandra and Carla Schlatter Ellis, Duke University
Surendar Chandra presented techniques to quantify the quality-versus-size tradeoff characteristics for transcoding JPEG images. He analyzed the characteristics of images available in typical Web sites and explored how to perform informed transcoding using JPEG compression. The effects of this transcoding on image storage size and image information quality were then demonstrated. He also presented ways of predicting the computational cost as well as potential space benefits achieved by the transcoding. He felt these results will be useful in any system that uses transcoding to reduce access latencies, increase effective storage space, and reduce access costs.
Secondary Storage Management for Web Proxies
Evangelos P. Markatos, Manolis G.H. Katevenis, Dionisis Pnevmatikatos, and Michail Flouris, ICS-FORTH
Disk I/O is a known factor in limiting Web-server performance, contributing as much as 30% to total hit response time. A primary reason is that each URL in Web caches is stored in a separate file. An obvious method of improving system performance would be to reduce the overhead associated with file maintenance. The authors proposed a storage-management method, called BUDDY, for storing several URLs per file. By identifying URLs of similar size ("buddies") and storing them in the same file, disk I/O is reduced. Although BUDDY reduces file-management overhead, it makes no effort to reduce disk-head movement induced by write operations to various "buddy" files. To improve write throughput, the authors proposed STREAM, which, in addition to storing all URLs in a single file, writes URLs contiguously in this single file, reducing the number of disk-head movements (much as log-structured filesystems do). A third suggestion was to improve read throughput by clustering read operations together (LAZY-READS). Finally, to restore the locality present in a client request stream, the authors proposed to use locality buffers (LAZY-READ-LOC), which attempt to store URLs requested contiguously by a given client in contiguous file locations.
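A minimal sketch of the BUDDY idea — grouping objects of similar size into one shared file per size class — might look like the following. The class and rounding rule are invented for illustration (a Python list stands in for each on-disk file), not the authors' implementation:

```python
class BuddyStore:
    """Toy BUDDY sketch: URLs of similar size ("buddies") share one file."""
    def __init__(self):
        self.files = {}  # size class -> list of (url, data); one "file" per class

    def size_class(self, nbytes):
        # Round the object size up to the next power of two (illustrative policy).
        return 1 << max(nbytes - 1, 0).bit_length()

    def store(self, url, data):
        bucket = self.size_class(len(data))
        f = self.files.setdefault(bucket, [])
        f.append((url, data))
        return bucket, len(f) - 1        # (file, slot) acts as the disk address

    def load(self, bucket, slot):
        return self.files[bucket][slot][1]
```

Because same-class objects occupy fixed-size slots in one file, creating and deleting cached URLs no longer touches filesystem metadata for every object.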
The results were tested with a combination of trace-driven simulations and experimental evaluations. Traces from DEC were used to compare Squid's file management method, BUDDY, STREAM, LAZY-READS, and LAZY-READS-LOC. The conclusion was that disk-management overhead can be reduced by as much as a factor of 25 overall by using these algorithms. Because disk bandwidth will improve faster than disk latency, the authors believe such algorithms will be an increasingly valuable means of improving Web-server performance.
More information is available at <http://archvlsi.ics.forth.gr>.
Compression Proxy Server: Design and Implementation
Chi-Hung Chi, Jing Deng, and Yan-Hong Lim, National University of Singapore
Implementing the compression methodology encountered three design issues: encoding of compression messages, memory allocation, and choice of a data structure. The Squid proxy server was modified to test the methodologies, using a trace collected at a Singapore college over a year. Experimental results revealed the distribution of Web objects (file types) by total bytes to be: GIF image 33%, JPEG image 12%, text 31%, and octet-stream binaries of MPEG, MIDI, or other applications 24%. File sizes within these categories were recorded. Results of compression effectiveness were "highly encouraging." Bandwidth saving was measured by file type and size; whole file compression was the highest, at 37%. Overall, about 30% of bandwidth was saved in this experiment, and compression/decompression was less than 1% of Web access latency (even on an "outdated" proxy server). The authors conclude that such Web-server compression is worthwhile and should be considered as a bandwidth-saving mechanism, particularly since it could cooperate with other techniques.
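The bandwidth-saving measurement can be approximated with a few lines using zlib (a stand-in compressor; the content types and data below are made up, and the function is not the authors' code):

```python
import zlib

def savings(objects):
    """Fraction of bytes saved per content type under whole-file compression.
    `objects` is an iterable of (content_type, data) pairs."""
    totals = {}                              # type -> [raw bytes, compressed bytes]
    for ctype, data in objects:
        t = totals.setdefault(ctype, [0, 0])
        t[0] += len(data)
        t[1] += len(zlib.compress(data))
    return {ctype: 1 - comp / raw for ctype, (raw, comp) in totals.items()}
```

Already-compressed types (GIF, JPEG, MPEG) would show near-zero saving under such a measurement, which is why per-type accounting like the authors' matters.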
On the Performance of TCP Splicing for URL-Aware Redirection
Ariel Cohen, Sampath Rangarajan, and Hamilton Slye, Bell Laboratories, Lucent Technologies
The switching functionality was implemented using a loadable module in the Linux kernel. A user-level proxy accepts connections from clients and decides which server will receive incoming requests. The proxy then removes itself from the data path by requesting the kernel to splice the TCP connection between the client and the proxy with the connection between the proxy and the server. The loadable module is actually two components: sp-mod, which monitors the connection, and NEPPI (Network Element for Programmable Packet Injection), which performs low-level header modifications. This worked with the Linux ipchains firewall to filter packets.
Performance results were tested using the WebWatch HTTP generator on five clients. The experiments were run for three minutes each with a concurrency setting of 75 in a thread pool of size 30. The servers and clients were fast PCs (400-550MHz) connected over Fast Ethernet on Lucent's intranet. Performance impacts were observed with and without TCP splicing. In all cases TCP splicing resulted in a significant performance improvement. At the average Web object size of 10KB, there was a 58% increase in connections and a 38% decrease in CPU utilization. Performance gains were, of course, more striking for larger objects.
Prefetching Hyperlinks
Dan Duchamp, AT&T Labs Research
This paper, which developed a new method of prefetching Web pages into the client cache, won the Best Paper award for the conference. Dan Duchamp did an excellent job of presenting the highlights of his research without bogging the audience down in the details, and he honestly revealed where he came up short.
Duchamp began with two basic premises: (1) the next URL to be requested by a client is likely to be one embedded as a hyperlink in one of the last few pages requested by that client, and (2) past access patterns of a large population of clients are likely to be relevant to a particular client.
The basic method is:
1. The client sends to a page's server a record (called a "usage report") detailing which of that page's hyperlinks it referenced.
2. The server aggregates such information from many clients.
3. When responding to a GET, the server attaches a summary (called a "usage profile") of past usage reports for that page.
4. On the basis of a page's usage profile, the client decides whether to prefetch any of its hyperlinks. Usage reports and profiles are passed via a new HTTP extension header.
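The report/profile exchange in the steps above can be sketched as follows. The class and method names are illustrative, not from the paper, and the threshold rule is an invented stand-in for the client's prefetch decision:

```python
from collections import Counter, defaultdict

class UsageServer:
    def __init__(self):
        self.reports = defaultdict(Counter)   # page -> hyperlink click counts

    def report(self, page, clicked_links):
        """Step 1-2: absorb a client's usage report for `page`."""
        self.reports[page].update(clicked_links)

    def profile(self, page, k=3):
        """Step 3: the usage profile attached to a GET response —
        the page's most-followed hyperlinks with their observed frequencies."""
        total = sum(self.reports[page].values()) or 1
        return [(link, n / total) for link, n in self.reports[page].most_common(k)]

def choose_prefetch(profile, threshold=0.25):
    """Step 4: the client prefetches hyperlinks followed often enough in the past."""
    return [link for link, p in profile if p >= threshold]
```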
The paper presents a brief but comprehensive summary (not included in the presentation) of related research in the extensive area of prefetching, which falls into three categories: software systems; algorithms, simulations and/or prototypes; and methods establishing bounds. The features distinguishing Duchamp's method from previous work were that it has been implemented; that access-pattern information is aggregated by the server and shared across clients; that it operates in near-realtime; that it is client-configurable; that many previously uncachable pages can be prefetched; that both client and server can cap operations to limit the impact on overhead and bandwidth; and that it operates as an HTTP extension.
The overall results were very positive: client latency improved greatly (slightly over 50%), and less of the cache was wasted (about 60% of prefetched pages were eventually used).
Both client and server modifications can be implemented as proxies, eliminating the need to alter browsers or Web servers; however, there are disadvantages to a client proxy.
Other "gotchas" were: time-dependent accesses; objects set with zero expiration time; inaccessible HTML; sabotage (using prefetching to overload the network); privacy concerns; and the fact that usage patterns are beginning to have commercial value, raising payment issues.
The server-side implementation is a proxy based on W3C's HTTPd. Two client-side implementations exist: a modification of the Mozilla browser from Netscape and a proxy based on HTTPd. Performance was evaluated for prefetch accuracy, client latency, network overhead, program space overhead, and program time overhead.
Mining Longest Repeating Subsequences to Predict World Wide Web Surfing
Jim Pitkow and Peter Pirolli, Xerox PARC
Various Markov models were compared to assess their ability in pattern extraction and pattern matching. Two techniques were introduced: longest repeating subsequences (LRS) and weighted specificity. LRS is a means of identifying the most information-rich subsequences in navigation log files. This data was then decomposed into several different Markov models to compute conditional probabilities for sequential transitions; that is, if a user is on a page, what is the probability of the user clicking any of the available links? The models had to be compact enough to be of practical use. These models were about 130KB in size, small enough to reside in each thread of a Web server. Using a single data set from Xerox, the models were able to predict the correct sequence 27-31% of the time. The speaker cautioned that these results are tentative and need to be corroborated by future work. Finally, he presented a picture of "information scent," a visualization of user paths within a given Web site, with examples of "good info scent" and "bad info scent." While the algorithm producing the visualization was not discussed, it was presented as an alternative model to determine what information to prefetch to users.
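The first-order-Markov piece of this (leaving aside the LRS extraction and weighted specificity) can be sketched in a few lines. The function names and toy sessions are illustrative, not the authors' code:

```python
from collections import Counter, defaultdict

def transition_model(sessions):
    """First-order Markov model: P(next page | current page) from navigation logs."""
    counts = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return {page: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for page, c in counts.items()}

def predict(model, page):
    """Most likely next page from the current one, or None if the page is unseen."""
    probs = model.get(page, {})
    return max(probs, key=probs.get) if probs else None
```

Higher-order models condition on longer histories in the same way; the LRS step prunes the histories kept, which is what keeps the models small.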
Active Names: Flexible Location and Transport of Wide-Area Resources
Amin Vahdat, Duke University; Michael Dahlin, University of Texas at Austin; Thomas Anderson and Amit Aggarwal, University of Washington
This paper described a new framework for organizing and delivering distributed services over the Internet, called Active Names. The research is motivated by the fact that Internet services are increasingly distributed across various machines, but the limitations imposed by DNS (Domain Name Service) have resulted in many confusing suggestions for improving it. Active Names is meant to be a general design solution that encompasses much previous research on extending DNS. Its main points are (1) it provides a flexible end-to-end naming abstraction for WAN services and (2) it provides a framework for composing customizations provided by both clients and servers. The benefits would be increased network performance (reduced client latency) and a standard, unified approach to operating networked services.
The key concepts in the design are active names, namespaces, delegation, and after-methods. Each active name identifies a name and a namespace in which that name should be interpreted, and each such namespace is embodied by a Namespace Program. Unlike URLs, which map to a specific IP address and thereby specify where a service will run, namespace programs are location-independent. Namespace programs accept incoming data, determine the next namespace where output will be sent, and construct an after-methods list to send with the output. The interface to namespace programs facilitates composability (the ability of one namespace to call other namespaces) in two ways: (1) through delegation, where one namespace passes a name to a sub-namespace, and (2) through after-methods that specify a chain of Active Name services remaining to be run to finish name resolution for a request. These namespace programs execute within a resolver virtual machine that provides security and limits resource use.
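A toy rendering of delegation and after-methods might look like the following. The namespace-program signature here is invented for illustration; the real system runs such programs inside a resolver virtual machine, not as bare Python functions:

```python
def resolve(name, namespaces, after_methods=None):
    """Walk a chain of namespace programs for an active name like "ns/rest".
    Each program returns (result, delegated_name, extra_after_methods)."""
    after = list(after_methods or [])
    namespace, _, rest = name.partition("/")
    program = namespaces[namespace]
    result, next_name, more_after = program(rest)
    after = more_after + after            # newly added after-methods run first
    if next_name is not None:             # delegation to another namespace
        return resolve(next_name, namespaces, after)
    for method in after:                  # finish with the remaining after-methods
        result = method(result)
    return result
```

For example, a "cache" namespace might delegate to an "origin" namespace while appending a post-processing after-method; the result of origin resolution then flows back through that method.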
The authors have demonstrated a fully functional core of the system and have built several useful applications. In response to questions, the authors explained that the system provides basic facilities on which applications enforce security and provide end-to-end fault tolerance, and that providing higher-level support for security and fault tolerance would be useful future work. The test system was built at Texas, Duke, and Washington using Java, and results indicate that Active Names can significantly reduce client latency in distributed services, in one case providing a fivefold reduction.
The kinds of questions raised by distributed processing have created an area of active research, including Active Services, Active Networks, Intentional Name System, and Transaction Processing monitors. However, the authors believe each of these has limitations that Active Names overcomes.
More information is available at <http://www.cs.utexas.edu/users/less/bb/>.
Person-level Routing in the Mobile People Architecture
Mema Roussopoulos, Petros Maniatis, Edward Swierk, Kevin Lai, Guido Appenzeller, and Mary Baker, Stanford University
Vinod Anupam, David M. Kristol, and Alain Mayer, Bell Laboratories, Lucent Technologies
The programmer's view of the new model was described, with code snippets illustrating how security policy is implemented in trust management. The utility of this feature was shown in an e-commerce example which requires automated cooperation between business sites. The methodical process by which the authors tested their new security layer was described. The addition of the document.ACL attribute, a key innovation in the new model, is currently before the W3C as a proposed standard. The new security model has been offered to the Mozilla open-source-development community for scrutiny before its implementation by Netscape.
More information is available at <http://www.mozilla.org/projects/security>.
PerDiS: Persistent Distributed Store
Marc Shapiro, INRiA Rocquencourt and Microsoft Research Cambridge
Marc Shapiro described PerDiS, a persistent distributed store for cooperative engineering over the Internet. It exports the abstraction of a shared memory across the Internet. Shapiro believes that PerDiS is particularly simple to use, claiming that large centralized programs (including a 400,000-line CAD tool) have been ported with relative ease. Application programs allocate arbitrary objects inside clusters (i.e., files), and objects refer to one another with native pointers. Between an application program and the shared store, writes are buffered in a transactional log. This allows engineers to work on shared designs without interference. PerDiS has two major modes of operation. In a LAN, the store is kept coherent, whereas sharing over a WAN follows a check-in/check-out mode. PerDiS is open source. More information is available at <http://www.perdis.esprit.ec.org/>.
Improving Web Searching Performance Using Community-based Filtering
Distributed Object Consistency Protocol
The Flash Web Server
The Flash Web server is a high-performance Web server developed using a novel concurrency architecture. Flash combines the high performance of single-process event-driven servers on cached workloads with the performance of multiprocess and multithreaded servers on disk-bound workloads. Pai has found that the Flash Web server is easily portable, since it achieves these results using facilities available in all modern operating systems.
Webcard: A Java Card Web Server
Defeating TCP Congestion Control in Three Easy Steps
How to coerce a remote Web server to send at any rate you choose:
1. ACK Division. In TCP the sender increases its congestion window by one segment with each successive ACK it receives. Action: Send multiple ACKs regardless of the packets received, up to 1,500 per data packet, and watch your window grow quite rapidly.
2. Duplicate ACK Spoofing. TCP recovery after three duplicate ACKs at a sender involves retransmitting the packet and increasing the congestion window by one packet per duplicate ACK received. Action: Send a stream of duplicate ACKs, and for every additional duplicate ACK sent the congestion window is grown by one, which allows you to control the size of your window and thus the rate of transmission.
3. Optimistic ACKing. This involves sending an ACK for a packet 35 msec early, with the consequence being that the sender will send subsequent packets early.
The above strategies are all implemented in the TCP Daytona that Savage put together using fewer than 75 lines of code on Linux. Those who are interested can see the full article, "TCP Congestion Control with a Misbehaving Receiver," in the October '99 ACM Computer Communication Review.
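The reason ACK division works — the congestion window grows per ACK received, not per byte acknowledged — can be illustrated with a back-of-the-envelope simulation (this is not TCP Daytona, just an illustration of slow-start arithmetic; the function name is invented):

```python
def cwnd_after(mss, acks_per_segment, segments):
    """Slow-start congestion window (in bytes) after `segments` data segments,
    when the receiver splits its acknowledgment of each segment into
    `acks_per_segment` small ACKs."""
    cwnd = mss                       # start with one segment
    for _ in range(segments * acks_per_segment):
        cwnd += mss                  # sender bumps cwnd one full segment per ACK
    return cwnd
```

With honest one-ACK-per-segment behavior the window grows linearly in ACKs; dividing each acknowledgment multiplies that growth by the division factor.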
Automating Usability Assessment for Information-centric Web Sites
Appliance Data Servers
Using Full Reference History for Efficient Document Replacement in Web Caches
Hyokyung Bahn, Seoul National University; Sam H. Noh, Hong-Ik University; Sang Lyul Min and Kern Koh, Seoul National University
This project focused on a better algorithm for directing the updating of documents in Web caches. It seeks to improve on previous algorithms by having the ability to optimize on any performance measure. The algorithm, Least Unified Value (LUV), uses the full reference history of documents to estimate the probability of being rereferenced.
Web caches have been much studied by the research community, but the authors believe the LUV algorithm is best to use in a Web cache replacement policy. According to their research, it offers the best overall effectiveness, performance, and robustness. The algorithm is basically a cost value computed for each cache document. The value is a weighted average of the document's reference potential multiplied by its weight, where weight is a function of cost and size. Reference potential is the likelihood of rereference, a function of past references. Cost can be considered in several ways, depending on the performance measure in which you are interested.
Since the LUV algorithm makes use of full reference history for each cache document, it might seem a burden to implement. But the speaker offered a proof that collapses the computation of the cost component. Implemented in a heap, this reduced computation of LUV to time O(log2 n), where n is the number of cached documents. Experiments were done using traces from NLANR and DEC, though filtering was done on UDP and CGI requests and on requests larger than the size of the cache. Performance was compared to nine other cache algorithms for hit rate, byte hit rate, and delay-savings ratio. In most cases, the authors conclude, LUV outperformed all other algorithms irrespective of cache size. This algorithm considered only in-cache documents; future research will consider a perfect-history LUV, which includes replaced objects, for possible performance improvements.
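The heap mechanics of value-based replacement can be sketched as follows. The value function here is a deliberately simplified stand-in (the paper's actual LUV value weights the full reference history), and the lazy-deletion trick for stale heap entries is an implementation convenience, not necessarily the authors':

```python
import heapq
import itertools

class ValueCache:
    """Toy value-based replacement: evict the document with the lowest value."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []                 # (value, seq, url); lowest value pops first
        self.live = {}                 # url -> (value, seq) of its current entry
        self.seq = itertools.count()

    def value(self, cost, size, refs):
        # Stand-in for LUV: weight (cost/size) times a reference-potential proxy.
        return (cost / size) * refs

    def access(self, url, cost, size, refs):
        v, s = self.value(cost, size, refs), next(self.seq)
        self.live[url] = (v, s)
        heapq.heappush(self.heap, (v, s, url))
        while len(self.live) > self.capacity:
            v, s, u = heapq.heappop(self.heap)
            if self.live.get(u) == (v, s):   # skip stale entries left by re-accesses
                del self.live[u]

    def cached(self):
        return set(self.live)
```

Each access is O(log n) in the heap, matching the complexity the speaker claimed for the collapsed computation.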
Providing Dynamic and Customizable Caching Policies
J. Fritz Barnes and Raju Pandey, University of California at Davis
The research presented an object-oriented analysis of cache objects, specifying how customization policies can be applied; namely, in prefetch, routing, placement, coherency, removal, and "miscellaneous," the last category intended to address any protocol extensions. Implementation of the policies is accomplished by CacheL, a domain-specific language based on cache events developed by the authors. Two caching systems were built to evaluate the effectiveness of these ideas: DavisSim, an event-based cache simulator based on the Wisconsin Cache Simulator, and PoliSquid, an extension of the popular Squid Web cache. Analysis of the performance focused on whether caches benefit from customization and what overhead it demands. The analysis was broken down by client-customized, cache-customized, or server-customized policies. Results indicated that implementation was feasible and advantageous. Overhead was moderate, estimated at an 8.5% increase in latency without optimization. The possibility of cache-policy customization was demonstrated, but firmer evidence of performance improvements awaits future work.
More information is available at <http://pdclab.cs.ucdavis.edu/qosweb/CacheL>.
Exploiting Result Equivalence in Caching Dynamic Web Content
Ben Smith, Anurag Acharya, Tao Yang, and Huican Zhu, University of California at Santa Barbara
This work presented a proposal for a new protocol for enhancing Web caching and a prototype for implementing such a protocol. The basic idea is to identify query requests that have essentially equivalent or similar results and to service these subsequent requests from the cache. The usefulness of this is most apparent in image maps and queries conditionally qualified over some range.
The protocol, called Dynamic Content Caching (DCCP), classifies Web client requests according to three types of locality: identical, equivalent, or partially equivalent requests. Currently, Web cache hits can only identify requests with the same URL (identical request). DCCP goes further by allowing identification of requests with identical content (equivalent), or content that can serve as a temporary placeholder for a request (partially equivalent). This is accomplished by an extension mechanism in HTTP 1.1 for cache control directives.
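A toy matcher for the three locality classes might look like this. The dictionary-based directive format is invented for illustration; DCCP actually carries this information in HTTP 1.1 cache-control extension directives:

```python
def classify(request, cached):
    """Classify a request against cached entries, each of which declares
    the other requests it can serve (a toy stand-in for DCCP directives)."""
    for entry in cached:
        if entry["url"] == request:
            return "identical", entry
    for entry in cached:
        if request in entry.get("equivalent", ()):
            return "equivalent", entry          # same content, different URL
    for entry in cached:
        if any(request.startswith(p) for p in entry.get("partial", ())):
            return "partially equivalent", entry  # usable as a placeholder
    return "miss", None
```

A conventional cache implements only the first loop; the second and third are what DCCP's directives add.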
Examples were shown using image maps, a weather service that uses ZIP codes to qualify queries, and a news service applicable to geographic regions. These are equivalent or partially equivalent requests that can be exploited by DCCP, implemented using the Swala cooperative Web cache. Evaluations were based on cache hit ratio and generated traffic using two real traces and one synthetic trace. Results were promising. For a map retrieval trace with three levels of error tolerance in matching, hit ratios can reach over 60% at a 10% error tolerance. The authors acknowledge that DCCP has a memory overhead cost, but this can be controlled by imposing a bound. They encountered difficulty in implementing efficient search when using complicated string-matching, and also did not address POST-based queries, which they plan to study in future work.
More information is available at <http://www.cs.ucsb.edu/research/swala>.
Efficient Support for Content-based Routing in Web Server Clusters
Chu-Sing Yang and Mon-Yen Luo, National Sun Yat-Sen University
This paper explored the advantages of a clustered Web server that uses a new content-aware request-routing scheme. It surveys the clustered-server approach to servicing high-traffic Web sites and suggests improvements in request routing that raise overall performance through content-aware processing.
The literature agrees that a clustered-server approach, in which an initial node directs requests to specialized servers, is advantageous at high-traffic Web sites. The basic methods for accomplishing this were summarized: client-side routing, DNS-based routing, TCP connection routing, and HTTP redirection. The issues these approaches ignore, which the authors identified and their research addresses, are session integrity, load balancing, differential services, and content deployment.
The research focused on the difficulties that the TCP protocol imposes on a server-directed solution, primarily the difficulty of migrating an established connection. In their design, the dispatcher decides how to route a request on the basis of three data structures: a cluster table, a mapping table, and a URL table. The dispatcher maintains an awareness of the TCP connections, releasing them when necessary. Since the overhead of opening a new connection is prohibitive, the dispatcher instead conveys packets to the back-end servers, modifying each packet's IP and TCP headers before forwarding it.
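The routing step can be sketched as follows. This is a toy model, not the authors' Linux kernel module: the URL table, addresses, and packet representation are all invented, and real forwarding must also recompute the IP and TCP checksums after rewriting headers.

```python
# Toy sketch of content-aware dispatch (hypothetical tables and addresses):
# choose a back-end from a URL table, then rewrite the packet's destination
# before forwarding it on the existing connection, rather than paying the
# cost of opening a second TCP connection to the back-end.

url_table = {                 # URL prefix -> back-end server (illustrative)
    "/images/": "10.0.0.2",
    "/cgi-bin/": "10.0.0.3",
}
default_backend = "10.0.0.4"

def route(url):
    """Pick a back-end server for this request's URL."""
    for prefix, backend in url_table.items():
        if url.startswith(prefix):
            return backend
    return default_backend

def forward(packet, url):
    """Rewrite the header in place; real code must also fix checksums."""
    backend = route(url)
    packet["dst_ip"] = backend
    return backend

pkt = {"dst_ip": "10.0.0.1", "payload": b"GET /images/logo.gif"}
print(forward(pkt, "/images/logo.gif"))  # 10.0.0.2
```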
The design was implemented as a loadable module for the Linux kernel. WebBench was used to evaluate performance on a heterogeneous collection of back-end servers. Compared to a "content-blind" server cluster, the content-aware cluster had greater throughput, averaging about 20MB/sec more after 16 client connections. The authors acknowledged that the extra overhead, limited scalability, and the dispatcher as a single point of failure were drawbacks of their approach. The advantages were higher performance, better routing decisions, and general content-aware intelligence that might be useful in future configurations.
Rapid Reverse DNS Lookups for Web Servers
William LeFebvre, Group Sys Consulting; Ken Craig, CNN Internet Technologies
Surprisingly little prior work has been done in this area. The design depends heavily on the multithreaded capabilities of the Netscape Enterprise Server API. A Rapid DNS server is placed between the (modified) Web server and the conventional DNS server. It provides quick answers to its client Web servers on the front end (about 2 milliseconds or better, on average) using a bucket hash keyed on IP address, and makes periodic queries to the true DNS server on the back end. The connection from front to back is through a fixed-size stack called a "leaky bucket" because of its LIFO design, which drops a growing backlog of requests off the end. Negative caching, whereby a cache entry is maintained even for unknown domain names, significantly improved cache hit rates.
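Since the CNN code is proprietary, the mechanism can only be sketched from the description above; every name and parameter here is invented. The sketch shows the three ingredients: an IP-keyed hash for fast answers, negative caching of failed lookups, and a bounded LIFO "leaky bucket" that sheds the oldest backlog under load.

```python
# Hypothetical sketch of the Rapid DNS fast path (the real code is
# proprietary; names and sizes are invented for illustration).

from collections import deque

cache = {}                    # IP address -> hostname, or None for known failures
pending = deque(maxlen=1000)  # the "leaky bucket": bounded queue of misses
MISS = object()               # sentinel: answer not yet available

def lookup(ip):
    """Fast path: cache hit, negative hit, or queue a background resolve."""
    if ip in cache:
        return cache[ip]      # may be None: a negative-cache hit
    pending.append(ip)        # when full, maxlen silently drops the oldest
    return MISS               # the web server logs the raw IP for now

def drain(resolve):
    """Back-end worker: resolve newest-first, caching failures too."""
    while pending:
        ip = pending.pop()    # LIFO: newest request first
        try:
            cache[ip] = resolve(ip)
        except OSError:
            cache[ip] = None  # negative caching of unresolvable addresses
```

The LIFO choice means that under overload the stalest requests, whose client connections may already be gone, are the ones discarded.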
A trio of Rapid DNS servers was used to support the CNN Web farm of about 60 Web servers. Performance results were impressive: even with over 250 client connections, servers sustained queries in excess of 400 operations per second. Future research will investigate different queuing and caching policies. The code for this project was developed for CNN and remains proprietary.
Connection Scheduling in Web Servers
Mark E. Crovella and Robert Frangioso, Boston University; Mor Harchol-Balter, Carnegie Mellon University
Implementing the idea of shortest-remaining-processing-time (SRPT) connection scheduling was problematic because task scheduling is traditionally under the control of the operating system, not the application. Implementing the idea without kernel modifications was a design goal of the experiments. The technique used here was to control concurrency (i.e., task scheduling) by varying the number of threads allowed in each connection pool. This heavily influences, but does not control, task scheduling within the kernel, and it allows the technique to be more easily tested and adopted by others.
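The concurrency trick can be sketched briefly. This is not the authors' server code; the two size classes, the 10 KB cutoff, and the worker counts are illustrative. Giving short-request pools many more threads biases the kernel toward finishing short responses first, approximating SRPT without kernel changes.

```python
# Illustrative sketch: approximate SRPT by partitioning connections into
# pools by expected response size and giving the short-request pool far
# more threads (all parameters are invented for illustration).

from concurrent.futures import ThreadPoolExecutor

pools = {
    "small": ThreadPoolExecutor(max_workers=32),  # e.g. responses < 10 KB
    "large": ThreadPoolExecutor(max_workers=4),
}

def size_class(expected_bytes):
    """Classify a request by the expected size of its response."""
    return "small" if expected_bytes < 10_000 else "large"

def schedule(handler, expected_bytes, *args):
    """Dispatch a connection to the thread pool for its size class."""
    return pools[size_class(expected_bytes)].submit(handler, *args)
```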
The SURGE software was used to simulate workloads of 400 to 2,000 User Equivalents (light to heavy). Response performance (mean transfer time) was then recorded, primarily using Apache server version 1.2.5 under Linux 2.0.36, comparing size-independent scheduling with SRPT scheduling. All indications were that SRPT scheduling improves performance significantly. Overall throughput was improved in all tests, and not at the expense of longer tasks. This somewhat surprising result is explained by the file-size distribution of Web requests being heavily skewed toward small files.
These promising results encourage further research, which will be directed toward servers allowing more precise scheduling control and dynamically adjustable job priorities.