Next: Locality in Web Accesses Up: Cost-Aware WWW Proxy Caching Previous: Implementation Concerns

Web Proxy Traces

Because the conclusions of a trace-driven study inevitably depend on the traces used, we tried to gather as many traces as possible. We obtained the following traces of HTTP requests going through Web proxies:

We are in the process of obtaining more traces from other sources.

We present results for fourteen traces. They include all of the Virginia Tech and Boston University traces and eight subsets of the DEC traces. The subsets are the Web accesses made by users 0-512 and by users 1024-2048 in each week of the three-and-a-half-week period from Aug. 29 to Sep. 22, 1996. We use subsets partly because of a limitation of our current simulator (it cannot simulate more than two million requests at a time), and partly because we observed that a caching proxy server built from a high-end workstation can service only about 512 users at a time.
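As a rough illustration of how the eight DEC subsets arise (two user ranges times four calendar weeks, the last one partial), the following sketch assigns a request to a subset. The function name, the group labels, the inclusive treatment of the user-ID bounds, and the choice to count weeks from Aug. 29 are all assumptions made for illustration; the text does not specify them.

```python
from datetime import date
from typing import Optional, Tuple

# Trace period stated in the text.
TRACE_START = date(1996, 8, 29)
TRACE_END = date(1996, 9, 22)

# User-ID ranges from the text; inclusive bounds are an assumption.
USER_GROUPS = {"users_0_512": (0, 512), "users_1024_2048": (1024, 2048)}

# Weeks counted from TRACE_START (an assumption): 25 days -> 4 weeks,
# the last of which is only partly covered.
NUM_WEEKS = (TRACE_END - TRACE_START).days // 7 + 1


def subset_key(user_id: int, day: date) -> Optional[Tuple[str, int]]:
    """Return (group label, week index) for a request, or None if the
    request falls outside the trace period or the selected user ranges."""
    if not (TRACE_START <= day <= TRACE_END):
        return None
    week = (day - TRACE_START).days // 7
    for label, (low, high) in USER_GROUPS.items():
        if low <= user_id <= high:
            return (label, week)
    return None
```

Under these assumptions, `len(USER_GROUPS) * NUM_WEEKS` gives the eight subsets mentioned above; together with the Virginia Tech and Boston University traces they make up the fourteen traces reported.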

We perform some necessary pre-processing on the traces. For the DEC traces, we simulated only those requests whose replies are cacheable as specified in HTTP 1.1 [HT97] (i.e., GET or HEAD requests with status 200, 203, 206, 300, or 301, and not a "cgi_bin" request). In addition, we exclude requests that are queries (i.e., "?" appears in the URL), though such requests are a small fraction of the total cacheable requests (around 3% to 5%). For the Virginia Tech traces, we simulated only the "GET" requests with reply status 200 and a known reply size. Thus, our numbers differ from those reported in [WASAF96]. The Virginia Tech traces unfortunately do not come with latency information. For the Boston University traces, we simulated only those requests that are not serviced out of browser caches.
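The DEC and Virginia Tech filtering rules above can be sketched as two predicates. This is a minimal illustration, not the simulator's actual code: the `Request` record and its field names are hypothetical, real trace formats differ, and matching "cgi_bin" as a plain substring of the URL is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Reply status codes the text lists as cacheable for the DEC traces.
CACHEABLE_STATUS = {200, 203, 206, 300, 301}


@dataclass
class Request:
    # Hypothetical record layout, for illustration only.
    method: str
    status: int
    url: str
    size: Optional[int] = None  # reply size in bytes, None if unknown


def keep_dec_request(req: Request) -> bool:
    """DEC rule: GET or HEAD, cacheable status, not cgi_bin, not a query."""
    if req.method not in ("GET", "HEAD"):
        return False
    if req.status not in CACHEABLE_STATUS:
        return False
    if "cgi_bin" in req.url:  # substring match is an assumption
        return False
    if "?" in req.url:  # queries are excluded
        return False
    return True


def keep_vt_request(req: Request) -> bool:
    """Virginia Tech rule: GET, reply status 200, known reply size."""
    return req.method == "GET" and req.status == 200 and req.size is not None
```

A trace would then be reduced with something like `[r for r in records if keep_dec_request(r)]` before being fed to the simulator.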





Pei Cao
Thu Oct 23 18:04:42 CDT 1997