Design

Design

4 Design

This section adds to Section 1 by explaining two topics in greater depth: the protocol used to exchange usage reports and usage profiles and the prefetching algorithm used by the client.

4.1 Information Exchange Protocol

One new HTTP header is defined, Prefetch. There are five Prefetch directives: Negotiate, Report, Profile, Prefetch, and Halt. Some directives take arguments. In particular, usage reports are provided as arguments to the Report directive and usage profiles are provided as arguments to the Profile directive.

Both client and server can limit the amount and define the format of prefetching information they exchange, and specify when such exchanges may take place. When initiated by a client, Prefetch: Negotiate should be sent along with a message to which a response is expected, such as the first GET. Prefetch: Report may be sent by the client along with any message after negotiation has finished. Prefetch: Profile is sent by the server only on GET responses, since the profile pertains to only those URLs whose body is included in the message.

The Negotiate directive can be specialized through a number of arguments that specify whether usage reports will be sent occasionally, periodically, after a certain number of page accesses, or once the usage report reaches a certain size. The most common use of Negotiate, Report, Profile, and Prefetch directives is as follows:

client             server
------             ------
   ------------------>
       negotiate
   <------------------
       negotiate
         ...
   ------------------>
        report
         ...
   <------------------
        profile
         ...
   ------------------>
        prefetch
   <------------------
        prefetch

Three other arguments to Negotiate specify the desired format of future usage profile directives. One argument indicates how to age the data. For example, last=10,20,50 means that the server's usage profile should, if possible, summarize which embedded references have been used in the last 10, 20, and 50 references to the page. The server is bound only to make a best effort to retain enough data to deliver usage profiles in the format it has negotiated. The other two arguments specify limits on the size of the usage profile: absolute byte count and relative to the attached GET response.

4.1.1 Record Format

A usage report includes the referring and referred-to URLs, when the request took place, how long it took to satisfy, and the size of the result. Only successful references to HREFs are reported. Usage profiles comprise two sorts of records. One is Last N where N is an integer. The other sort of record is similar to a usage report prefaced by an integer, M. The two types of records are intermixed, with one Last N record followed by some number of ``M...'' records. This pattern repeats. The meaning of such a sequence of records is that the last N times this page was referenced, it happened M_i times that URL_i was referenced by that page. The sum of M_i need not equal N.

Auxiliary information, such as the elapsed time between references to P and some embedded HREF Q, appear in the summary as medians plus standard deviations.

4.2 Prefetching Algorithm

The client-side prefetch algorithm that has been used for the measurements in this paper is the following. During negotiation with a server, the client asks for the usage profile to summarize the last 10 and 50 references. The client continually measures the speed of its HTTP GET transfers, and maintains a running average; the speed varies depending on many factors so the data is inaccurate, but inaccurate data is regarded as better than no data. Whenever a page is demand-fetched, its embedded HREFs are noted during the parsing necessary to display it. These HREFs are put on a list and the list is passed to the prefetcher along with the usage profile that came in the HTTP header. After page display is complete, the prefetch algorithm runs, comparing the embedded hyperlinks with the usage profile. The comparison ensures that a recently deleted hyperlink mentioned in the usage profile will not be prefetched. The usage profile indicates the size of embedded HREFs (including the size of embedded images), and the prefetcher ensures that it never has GET requests outstanding for more than a certain fraction of the measured average bandwidth available to it: 50%, to be safe considering the inaccuracy of the bandwidth measure. Until the bandwidth limit is reached, the client prefetches URLs that have been accessed among the last 10 references, in descending order of popularity, down to a limit of 25%. That is, if the chances of a page being accessed are less than one quarter, that page is not prefetched. A prefetch request is not complete until the HTML page and all its embedded images have been loaded. A demand fetch aborts all prefetching efforts. As prefetch requests complete, more are issued from the ``last-10'' list and then from the ``last-50'' list, again in descending order of popularity down to the limit.