
Workloads


To use a widely understood workload while keeping the analysis tractable, we focus on a static-content workload modeled on the SPECWeb96 and SPECWeb99 (19) benchmarks. These benchmarks are modeled after the access patterns of multiple Web sites, use file sizes ranging from 100 bytes to 900 KB, and are the de facto industry standards, with more than 200 published results. File popularity is modeled explicitly: half of all accesses are for files in the 1-9 KB range, 35% for files in the 100-900 byte range, 14% for files in the 10-90 KB range, and 1% for files in the 100-900 KB range, yielding an average dynamic response size of roughly 14 KB. Each directory in the system contains 36 files (roughly 5 MB total), and directories are chosen using a Zipf distribution with an alpha value of 1. Because of the strong bias toward small files, the most popular files consume very little aggregate space. Table 2 illustrates this heavy-tailed behavior: the files that account for the top 99% of requests occupy at most 14% of the data set.
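The "roughly 5 MB" directory size and the "roughly 14 KB" average response size follow from the class mix by simple arithmetic. The sketch below is a minimal check under assumptions that are not stated in the text: each class holds 9 files of size k × base (k = 1..9), 1 KB is taken as 1000 bytes, and files within a class are equally popular. The names `CLASSES`, `directory_bytes`, and `mean_response_bytes` are hypothetical.

```python
# Sketch: per-directory size and mean response size for the SPECWeb99-style
# file mix. ASSUMPTIONS (not stated in the text): each class has 9 files of
# size k * base (k = 1..9), 1 KB = 1000 bytes, and files within a class are
# equally popular.
CLASSES = [
    # (base size in bytes, percent of requests)
    (100, 35),      # class 0: 100-900 bytes
    (1_000, 50),    # class 1: 1-9 KB
    (10_000, 14),   # class 2: 10-90 KB
    (100_000, 1),   # class 3: 100-900 KB
]
FILES_PER_CLASS = 9


def class_sizes(base: int) -> list[int]:
    """Sizes of the files in one class: base, 2*base, ..., 9*base."""
    return [base * k for k in range(1, FILES_PER_CLASS + 1)]


def directory_bytes() -> int:
    """Total bytes in one directory (4 classes x 9 files = 36 files)."""
    return sum(sum(class_sizes(base)) for base, _ in CLASSES)


def mean_response_bytes() -> float:
    """Expected response size: each class mean weighted by its request share."""
    total = 0.0
    for base, pct in CLASSES:
        sizes = class_sizes(base)
        total += pct * sum(sizes) / len(sizes)
    return total / 100


if __name__ == "__main__":
    print("files per directory:", len(CLASSES) * FILES_PER_CLASS)
    print(f"directory size: {directory_bytes() / 1e6:.2f} MB")
    print(f"mean response:  {mean_response_bytes() / 1e3:.1f} KB")
```

Under these assumptions the sketch reports 36 files, about 5.00 MB per directory, and a mean response of 14,675 bytes (about 14.7 KB), in line with the figures quoted in the text. The large files dominate the directory's bytes, while the small classes dominate its requests.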


Table 2: SPECWeb's popularity distributions. All sizes are in MB. Sizes do not scale linearly with the total data set size because directories are weighted using a Zipf popularity distribution.

Data Set Size   Top 50%   Top 90%   Top 95%   Top 99%
         1024       2.1      39.5      64.6     138.3
         2048       3.0      72.9     123.6     262.8
         3072       4.4     101.8     181.2     385.7
         4096       4.9     131.8     235.0     505.0
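The sublinear growth noted in the caption of Table 2 can be explored with a rough model: give every file a request probability equal to its directory's Zipf weight times its class share, then greedily accumulate the most popular files until they cover a target fraction of requests. This is a sketch under stated assumptions (a hypothetical layout of 9 files of size k × base per class, uniform popularity within a class, 1 MB = 10^6 bytes, and a directory count taken as data set size divided by directory size). It is not SPECWeb's own file-set generator and is not expected to reproduce Table 2 exactly; `top_footprint_bytes` is a hypothetical name.

```python
# Rough model of the "Top X%" footprint. ASSUMPTIONS: hypothetical per-class
# layout (9 files of size k * base per class), uniform popularity within a
# class, Zipf(alpha) over directories, 1 MB = 10**6 bytes, and a directory
# count derived from data set size / directory size.
CLASSES = [(100, 35), (1_000, 50), (10_000, 14), (100_000, 1)]  # (base bytes, % of requests)
FILES_PER_CLASS = 9
DIR_BYTES = sum(base * k for base, _ in CLASSES for k in range(1, FILES_PER_CLASS + 1))


def top_footprint_bytes(data_set_bytes: float, request_fraction: float,
                        alpha: float = 1.0) -> int:
    """Bytes held by the most popular files that together receive at least
    `request_fraction` of all requests."""
    num_dirs = max(1, round(data_set_bytes / DIR_BYTES))
    weights = [1.0 / i**alpha for i in range(1, num_dirs + 1)]  # Zipf over directories
    norm = sum(weights)
    files = []  # (request probability, size in bytes) for every file in the data set
    for w in weights:
        for base, pct in CLASSES:
            for k in range(1, FILES_PER_CLASS + 1):
                files.append((w / norm * pct / 100 / FILES_PER_CLASS, base * k))
    # Greedily take the most popular files first; ties go to the smaller file.
    files.sort(key=lambda f: (-f[0], f[1]))
    covered, used = 0.0, 0
    for prob, size in files:
        if covered >= request_fraction:
            break
        covered += prob
        used += size
    return used


if __name__ == "__main__":
    mb = 10**6
    print("Data set  Top 50%  Top 90%  Top 95%  Top 99%")
    for size_mb in (1024, 2048, 3072, 4096):
        row = [top_footprint_bytes(size_mb * mb, f) / mb for f in (0.50, 0.90, 0.95, 0.99)]
        print(f"{size_mb:8d}" + "".join(f"{x:9.1f}" for x in row))
```

Because the Zipf weights concentrate requests on low-numbered directories, quadrupling the number of directories adds only a modest amount of probability mass near the head of the ranking, so the popular footprint grows much more slowly than the data set itself.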



SPECWeb normally self-scales, increasing both the data set size and the number of simultaneous connections with the target throughput. Because this approach complicates comparisons between different servers, we instead use fixed values for both parameters. To facilitate comparisons with previous work such as Haboob (21) and Knot (20), we adopt their parameters: a 3 GB data set and 1024 simultaneous connections. With this data set size, most requests can be served from memory, while a small fraction require disk access. We also adopt the persistent-connection model from these tests, in which each client issues 5 requests per connection before closing it. With these parameters, per-client throughput remains comparable to SPECWeb99's quality-of-service requirements.
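The fixed-parameter client model can be sketched as a request generator: each request picks a directory from a Zipf (alpha = 1) distribution, a size class from the 35/50/14/1 mix, and then a file within that class, and each persistent connection carries 5 such requests. This is a hypothetical illustration of the workload described in this section, not SPECWeb's client code. The path format and the names `next_request` and `connection_session` are invented, within-class popularity is assumed uniform, and the directory count assumes the 3 GB data set is split into roughly 5 MB directories.

```python
import random
from itertools import accumulate

# Sketch of one client's request stream. ASSUMPTIONS (not from the text): the
# directory count is derived from a 3 GB data set and a 4,999,500-byte directory,
# files within a class are equally likely, and the URL format is hypothetical.
DATA_SET_BYTES = 3 * 10**9
DIR_BYTES = 4_999_500
NUM_DIRS = round(DATA_SET_BYTES / DIR_BYTES)  # 600 directories
REQUESTS_PER_CONN = 5            # persistent connection closed after 5 requests
CLASS_SHARES = [35, 50, 14, 1]   # percent of requests for size classes 0..3
FILES_PER_CLASS = 9

# Zipf (alpha = 1): directory i (1-based) is chosen with weight 1 / i.
# Cumulative weights are precomputed so each draw does not re-sum the list.
DIR_CUM_WEIGHTS = list(accumulate(1.0 / i for i in range(1, NUM_DIRS + 1)))


def next_request(rng: random.Random) -> str:
    """Pick a directory (Zipf), a size class (fixed mix), then a file."""
    d = rng.choices(range(NUM_DIRS), cum_weights=DIR_CUM_WEIGHTS)[0]
    c = rng.choices(range(len(CLASS_SHARES)), weights=CLASS_SHARES)[0]
    f = rng.randrange(FILES_PER_CLASS)
    return f"/dir{d:05d}/class{c}_{f}"


def connection_session(rng: random.Random) -> list[str]:
    """Requests one client sends over a single persistent connection."""
    return [next_request(rng) for _ in range(REQUESTS_PER_CONN)]


if __name__ == "__main__":
    rng = random.Random(42)
    # In the setup described above, 1024 such clients run concurrently, each
    # opening a new connection after finishing its 5 requests.
    print(connection_session(rng))
```

Passing a seeded `random.Random` instance keeps runs reproducible, and using `cum_weights` avoids recomputing the 600-entry cumulative sum on every directory draw.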



Yaoping Ruan
2006-04-18