Next: Conclusion Up: Sting: a TCP-based Network Previous: The Sting prototype

Experiences

  Anecdotally, our experience in using sting has been very positive. We've had considerable luck using it to debug network performance problems on asymmetric access technologies (e.g. cable modems). We've also used it as a day-to-day diagnostic tool to understand the source of Web latency. In the remainder of this section we present some preliminary results from a broad experiment to quantify the character of the loss seen from our site to the rest of the Internet.

For a twenty-four-hour period, we used sting to record loss rates from the University of Washington to a collection of 50 remote Web servers. Choosing a reasonably sized, yet representative, set of server sites is a difficult task due to the diversity of connectivity and load experienced at different points in the Internet. However, it is well established that the distribution of Web accesses is heavy-tailed; a small number of popular sites constitute a large fraction of overall requests, while the remainder of requests are distributed among a very large number of sites [BCF+99]. Consequently, we constructed our target set to mirror this structural property: half popular servers and half random servers. Twenty-five of the 50 servers in our set were chosen from a list of the top 100 Web sites as advertised by www.top100.com in May of 1999. This list is generated from a collection of proxy logs and trace files. The remaining 25 servers were selected randomly using an interface provided by Yahoo! Inc. to select pages at random from its on-line database [Yah].

For our experiment we used a single centralized data-collection machine, a 200 MHz Pentium Pro running FreeBSD 3.1. We probed each server roughly once every 10 minutes.
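The probe schedule described above can be expressed as a simple round-robin loop. The sketch below is an illustrative reconstruction, not our actual measurement harness; the `schedule_probes` name and the stub `probe` callback are hypothetical, standing in for an invocation of the sting tool itself.

```python
import time

def schedule_probes(servers, probe, rounds, interval=600):
    """Round-robin probe scheduler: measure every server once per
    round, pacing rounds `interval` seconds apart (600 s = 10 min).
    `probe` is a callback returning a (forward_loss, reverse_loss)
    pair; in our setting it would wrap a run of sting."""
    results = {s: [] for s in servers}
    for r in range(rounds):
        start = time.time()
        for s in servers:
            results[s].append(probe(s))
        # Sleep out the remainder of the interval, skipping the wait
        # after the final round or when probing overran the interval.
        elapsed = time.time() - start
        if r < rounds - 1 and elapsed < interval:
            time.sleep(interval - elapsed)
    return results

# Example with a stub probe that reports fixed loss rates.
measured = schedule_probes(["www.example.com", "www.example.org"],
                           lambda host: (0.007, 0.015),
                           rounds=2, interval=0)
```

A real deployment would also need to handle probe timeouts and servers that refuse connections, which this sketch omits.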


  
Figure 6: Forward loss measured across a twenty-four-hour period. Each point on this scatter-plot represents a measurement to one of 50 Web servers.
[Figure: lossfor.ps]


  
Figure 7: Reverse loss measured across a twenty-four-hour period. Each point on this scatter-plot represents a measurement to one of 50 Web servers.
[Figure: lossrev.ps]

Figures 6 and 7 are scatter-plots showing the overall distribution of loss rates, forward and reverse respectively, during our measurement period. Not surprisingly, overall loss rates increase during business hours and wane during off-peak hours. However, it is also quite clear that forward and reverse loss rates vary independently. Overall, the average reverse loss rate (1.5%) is more than twice the forward loss rate (0.7%), and at many times of the day this ratio is significantly larger.
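The directional averages quoted above are straightforward to compute from per-probe measurements. The helper and the sample values below are hypothetical, chosen only so that the averages land on the 0.7% forward and 1.5% reverse figures:

```python
def average_loss(measurements):
    """Mean forward and reverse loss rates over a list of
    (forward_loss, reverse_loss) pairs, each a fraction in [0, 1]."""
    n = len(measurements)
    fwd = sum(f for f, _ in measurements) / n
    rev = sum(r for _, r in measurements) / n
    return fwd, rev

# Made-up sample pairs, not our data: they average to 0.7% forward
# and 1.5% reverse, reproducing the ratio discussed above.
fwd, rev = average_loss([(0.000, 0.020), (0.014, 0.010)])
```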


  
Figure 8: CDF of the loss rates measured to and from a set of 25 popular Web servers across a twenty-four-hour period.
[Figure: popularcdf.ps]


  
Figure 9: CDF of the loss rates measured to and from a set of 25 random Web servers across a twenty-four-hour period.
[Figure: randomcdf.ps]

This reverse-dominant loss asymmetry is particularly prevalent among the popular Web servers. Figure 8 graphs a discrete cumulative distribution function (CDF) of the loss rates measured to and from the 25 popular servers. Here we can see that fewer than 2 percent of the measurements to these servers ever record a lost packet in the forward direction. In contrast, 5 percent of the measurements see a reverse loss rate of 5 percent or more, and almost 3 percent of measurements lose more than a tenth of their packets. On average, the reverse loss rate is more than 10 times greater than the forward loss rate in this population. One explanation for this phenomenon is that Web servers generally send much more traffic than they receive, yet bandwidth is provisioned in a full-duplex fashion. Consequently, bottlenecks are much more likely to form on paths leaving popular Web servers, and packets are much more likely to be dropped in this direction.

We see similar, although not identical, results when we examine the random server population. Figure 9 graphs the corresponding CDF for these servers. Overall, the loss rate is increased in both directions, but the forward loss rate has increased disproportionately. We suspect that this effect is strongly related to the lack of dedicated network infrastructure at these sites. Many of the random servers obtain network access from third-tier ISPs that serve large user populations. Consequently, unrelated Web traffic being delivered to other ISP customers directly competes with the packets we send to these servers.
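A discrete CDF of the kind plotted in Figures 8 and 9 can be computed directly from the per-probe loss rates. This is a minimal sketch under assumed inputs; the loss values in the example are placeholders, not our measurement data:

```python
def discrete_cdf(values):
    """Discrete CDF: sorted (value, cumulative_fraction) pairs giving,
    for each distinct value v, the fraction of measurements <= v."""
    xs = sorted(values)
    n = len(xs)
    cdf = []
    for i, v in enumerate(xs, 1):
        if i == n or xs[i] != v:   # last occurrence of this value
            cdf.append((v, i / n))
    return cdf

# Placeholder loss-rate samples (fractions), not our data:
# half the probes saw no loss, one saw 5% loss, one saw 10%.
curve = discrete_cdf([0.0, 0.0, 0.05, 0.1])
```

Reading a point (v, p) off such a curve answers questions like those above, e.g. what fraction of measurements saw a loss rate of v or less.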


Stefan Savage
8/31/1999