Related Work

The use of replication to improve system performance and reliability is not new. For example, process groups have been successfully incorporated into the design of some transaction monitors [21]. The performance benefits of Web server replication were first observed in [7,23]. The authors also pointed out that resource replication may eliminate the consistency problems introduced by proxy server caching.

The architecture of the Web++ client is closely related to Smart Clients [44]. In fact, the Web++ client is a specific instance of a Smart Client. While Yoshikawa et al. describe smart clients implementing FTP, TELNET and chat services, we concentrate on the HTTP service. We provide a detailed description of how the client applet can be integrated with the browser environment. In addition, we describe a specific algorithm for selecting among replicated HTTP servers and provide a detailed analysis of its performance. Finally, we describe the design and implementation of the server end.

Cisco DistributedDirector is a product that provides DNS-level replication [14]. DistributedDirector requires full server replication, because the redirection to a specific server is done at the network level. As argued in the Introduction, this may be impractical for several reasons. DNS-level replication also leads to several problems with recursive DNS queries and with DNS entry caching on the client. DistributedDirector relies on a modified DNS server that queries server-side agents to resolve a given hostname to the IP address of the server that is closest to the querying DNS client⁹. DistributedDirector supports several metrics, including various modifications of routing distance (number of hops), random selection, and round-trip time (RTT). However, it is not clear from [14] how the RTT delay is measured and how it is used to select a server.

Caching goes Replication (CgR) is a prototype of a replicated Web service [4]. A fundamental difference between the designs of Web++ and CgR is that CgR relies on a client-side proxy to intercept all client requests and redirect them to one of the replicated servers. The client-side proxy keeps track of all the servers; however, no algorithms are given for maintaining this client state when servers are dynamically added to or removed from the system. Our work does not assume full server replication and provides a detailed analysis of a resource replica selection algorithm.

Proxy server caching is similar to server replication in that it also aims at minimizing the response time by placing resources "near" the client [20,30,3,10,28,36,15,37,13]. The fundamental difference between proxy caches and replicated Web servers is that the replicated servers know about each other. Consequently, the servers can enforce any type of resource consistency, unlike proxy caches, which must rely on the expiration-based consistency supported by the HTTP protocol. In addition, since the replicated servers are known to content providers, they provide an opportunity for replication of active content. Finally, the efficiency of replicated servers does not depend on access locality, which is typically low for Web clients (most client trace studies show hit rates below 50% [3,10,15,28,36]).
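
To illustrate why expiration-based consistency is weaker than consistency enforced among cooperating replicas, the following minimal Python sketch models a proxy cache that serves a copy until its expiration time passes. The names CachedEntry, is_fresh and serve are hypothetical and chosen only for illustration; the sketch does not reproduce the full HTTP caching rules or any particular proxy implementation.

```python
from dataclasses import dataclass


@dataclass
class CachedEntry:
    body: str
    expires_at: float  # absolute expiration time in seconds (hypothetical field)


def is_fresh(entry: CachedEntry, now: float) -> bool:
    """An entry may be served from the cache until its expiration time passes."""
    return now < entry.expires_at


def serve(entry: CachedEntry, origin_body: str, now: float) -> str:
    """Serve the cached copy while it is fresh; otherwise use the origin's copy."""
    if is_fresh(entry, now):
        return entry.body
    return origin_body


# Assumed scenario: the origin resource has already changed from "v1" to "v2",
# but the cached copy does not expire until t = 60.
entry = CachedEntry(body="v1", expires_at=60.0)
stale_window = serve(entry, origin_body="v2", now=30.0)  # cache still serves "v1"
after_expiry = serve(entry, origin_body="v2", now=61.0)  # expired, so "v2" is served
```

In this sketch the cache returns the outdated copy for the whole interval before expiration, because it has no way to learn of the update. Replicated servers that know about each other can instead propagate or invalidate updates directly.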

Several algorithms for replicated resource selection have been studied in [11,22,23,38,17,35,27,19]. A detailed discussion of the subject can be found in Section 4.