

Experiments

In this section we describe the experiments we used to quantify the utility of nettimer.


Methodology

In this section we describe and explain our methodology in running the experiments. Our approach is to take tcpdump traces on pairs of machines during a transfer between those machines while varying the bottleneck link bandwidth, path length, and workload. We then run these traces through nettimer and analyze the results. Our methodology consists of 1) the network topology, 2) the hardware and software platform, 3) accuracy measurement, 4) the network application workload, and 5) the network environment.


Table: This table shows the different path characteristics used in the experiments. The Short and Long columns list the number of hops from host to host for the short and long path respectively. The RTT columns list the round-trip times of the short and long paths in ms.
Type   Short (hops)   RTT (ms)   Long (hops)   RTT (ms)
Ethernet 100 Mb/s 4 1 17 74
Ethernet 10 Mb/s 4 1 17 80
WaveLAN 2 Mb/s 3 4 18 151
WaveLAN 11 Mb/s 3 4 18 151
ADSL 14 19 19 129
V.34 Modem 14 151 18 234
CDMA 14 696 18 727

Our network topology consists of a variety of paths (listed in Table 1) where we vary the bottleneck link technology and the length of the path. WaveLAN [wav00] is a wireless local area network technology made by Lucent. ADSL (Asymmetric Digital Subscriber Line) is a high bandwidth technology that uses phone lines to bring connectivity into homes and small businesses. We tested the Pacific Bell/SBC [dsl00] ADSL service. V.34 is an International Telecommunication Union (ITU) [itu00] standard for data communication over analog phone lines. We used the V.34 service of Stanford University. CDMA (Code Division Multiple Access) is a digital cellular technology. We tested CDMA service from Sprint PCS [spr00] with AT&T Global Internet Services as the Internet service provider. Together these cover most of the link technologies currently available to end users.

In all cases the bottleneck link is the link closest to one of the hosts. This allows us to measure the best and worst cases for nettimer as described below. The short paths are representative of local area and metropolitan area networks while the long paths are representative of a cross-country, wide area network. We were not able to get access to an international tracing machine.


Table: This table shows the different software versions used in the experiments. The release column gives the RPM package release number.
Name Version Release
GNU/Linux Kernel 2.2.16 22
RedHat 7.0 -
tcpdump 3.4 10
tcptrace 5.2.1 1
openssh 2.3.0p1 4
nettimer 2.1.0 1

All the tracing hosts are Intel Pentiums ranging from 266MHz to 500MHz. The versions of software used are listed in Table 2.

We measure accuracy by comparing nettimer's results against a lower bound (TCP throughput on a path with little cross traffic) and an upper bound (the nominal bandwidth specified by the manufacturer). TCP throughput by itself is insufficient because it does not include the bandwidth consumed by link level headers, IP headers, TCP headers, and retransmissions. The nominal bandwidth is insufficient because the manufacturer usually measures under conditions that may be difficult to achieve in practice. Another possibility would be to measure each of the bottleneck link technologies on an isolated test bed; however, given the number and types of link technologies, this would have been difficult.
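
As a concrete illustration of this criterion, the sketch below (our own, purely illustrative; the function name and interface are not part of nettimer) checks an estimate against the two bounds:

    # Illustrative only: an estimate is treated as plausible if it lies between
    # the measured TCP throughput (lower bound) and the nominal rate (upper bound).
    def within_bounds(estimate_bps, tcp_throughput_bps, nominal_bps):
        return tcp_throughput_bps <= estimate_bps <= nominal_bps

    # Example using the Ethernet 10 Mb/s short down path from Table 3:
    print(within_bounds(9.65e6, 6.56e6, 10.0e6))   # True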

The network application workload consists of using scp (a secure file transfer program from openssh) to copy a 7476723 byte MP3 file once in each direction along a path. The transfer is terminated after five minutes even if the file has not been fully transferred.

We copy the file in both directions because 1) the ADSL technology is asymmetric and we want to measure both bandwidths and 2) we want to take measurements where the bottleneck link is the first link and where it is the last link. A bottleneck at the first link is the worst case for nettimer because it provides the most opportunity for cross traffic to interfere with the packet pair property. A bottleneck at the last link is the best case for the opposite reason.

We copy a 7476723 byte file as a compromise between having enough samples to work with and not having so many that the traces become cumbersome. We terminate the tracing after five minutes so that we do not have to wait hours for the file to be transferred across the lower bandwidth links.
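
To make the workload concrete, one direction of one trial could be driven roughly as follows. This is a minimal sketch under stated assumptions: the host name, interface, and file path are hypothetical, and it is not the actual script used for these experiments.

    # Hypothetical driver for one direction of one trial: capture packet headers
    # with tcpdump while scp copies the file, and give up after five minutes.
    import signal
    import subprocess

    IFACE = "eth0"                               # assumed capture interface
    REMOTE = "user@remote.example.edu"           # hypothetical peer host
    FILE = "song.mp3"                            # the 7476723-byte MP3 file

    dump = subprocess.Popen(
        ["tcpdump", "-i", IFACE, "-w", "trial.pcap", "host", "remote.example.edu"])
    try:
        subprocess.run(["scp", FILE, REMOTE + ":" + FILE], timeout=300)
    except subprocess.TimeoutExpired:
        pass                                     # transfer cut off at five minutes
    finally:
        dump.send_signal(signal.SIGINT)          # let tcpdump flush its buffer
        dump.wait()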

The network environment centers around the Stanford University campus but also includes the networks of Pacific Bell, Sprint PCS, Harvard University and the ISPs that connect Stanford and Harvard.

We ran five trials so that we could measure the effect of different levels of cross traffic during different times of day and different days of the week. The traces were started at 18:07 PST 12/01/2000 (Friday), 16:36 PST 12/02/2000 (Saturday), 11:07 PST 12/04/2000 (Monday), 18:39 PST 12/04/2000 (Monday), and 12:00 PST 12/05/2000 (Tuesday). We believe that these traces cover the peak traffic times of the networks that we tested on: commute time (Sprint PCS cellular), weekends and nights (Pacific Bell ADSL, Stanford V.34, Stanford residential network), work hours (Stanford and Harvard Computer Science Department networks).

Within the limits of our resources, we have selected as many different values for our experimental parameters as possible to capture some of the heterogeneity of the Internet.


Results

In this section, we analyze the results of the experiments.


Varied Bottleneck Link


Table: This table summarizes nettimer results over all the times and days. ``Type'' lists the different bottleneck technologies. ``D'' lists the direction of the transfer. ``u'' and ``d'' indicate that data is flowing away from or towards the bottleneck end, respectively. ``P'' indicates whether the (l)ong or (s)hort path is used. ``N'' lists the nominal bandwidth of the technology. ``TCP'' lists the TCP throughput. ``RB'' lists the nettimer results for Receiver Based packet pair. (σ) lists the standard deviation over the different traces.
High bandwidth technologies (Mb/s):
Type D P N TCP (σ) RB (σ)
Ethernet d s 100 21.22 (.13) 88.39 (.01)
Ethernet d l 100 2.09 (.41) 59.15 (.04)
Ethernet u s 100 19.92 (.05) 90.16 (.06)
Ethernet u l 100 1.51 (.58) 92.03 (.02)
Ethernet d s 10.0 6.56 (.06) 9.65 (.00)
Ethernet d l 10.0 1.85 (.14) 9.62 (.00)
Ethernet u s 10.0 7.80 (.03) 9.46 (.00)
Ethernet u l 10.0 1.66 (.21) 9.30 (.02)
WaveLAN d s 11.0 4.33 (.16) 6.52 (.20)
WaveLAN d l 11.0 1.63 (.13) 7.25 (.22)
WaveLAN u s 11.0 4.64 (.17) 5.30 (.12)
WaveLAN u l 11.0 1.51 (.32) 5.07 (.14)
WaveLAN d s 2.0 1.38 (.01) 1.48 (.02)
WaveLAN d l 2.0 1.05 (.09) 1.47 (.02)
WaveLAN u s 2.0 1.07 (.05) 1.21 (.01)
WaveLAN u l 2.0 0.87 (.26) 1.17 (.00)
ADSL d s 1.5 1.21 (.01) 1.24 (.00)
ADSL d l 1.5 1.16 (.01) 1.23 (.00)
Low bandwidth technologies (Kb/s):
Type D P N TCP (σ) RB (σ)
ADSL u s 128 96.87 (.19) 109.28 (.00)
ADSL u l 128 107.0 (.01) 109.51 (.00)
V.34 d s 33.6 26.43 (.04) 27.04 (.03)
V.34 d l 33.6 26.77 (.04) 27.52 (.04)
V.34 u s 33.6 27.98 (.01) 28.62 (.01)
V.34 u l 33.6 28.05 (.00) 28.82 (.00)
CDMA d s 19.2 5.30 (.05) 10.88 (.05)
CDMA d l 19.2 5.15 (.09) 10.83 (.09)
CDMA u s 19.2 6.76 (.24) 18.48 (.05)
CDMA u l 19.2 6.50 (.53) 17.21 (.11)

One goal of this work is to determine whether nettimer can measure across a wide variety of network technologies. Dealing with different network technologies is not just a matter of dealing with different bandwidths because different technologies have very different link and physical layer protocols that could affect bandwidth measurement.

Using Table 3, we examine the short path Receiver Based Packet Pair results for the different technologies. This table gives the mean result over all the times and days of the TCP throughput and Receiver Based result reported by nettimer.

The Ethernet 100Mb/s case and to a lesser extent the Ethernet 10Mb/s case show that using TCP to measure the bandwidth of a high bandwidth link can be inaccurate and/or expensive. For both Ethernets, the TCP throughput is significantly less than the nominal bandwidth. This could be caused by cross traffic, not being able to open the TCP window enough, bottlenecks in the disk, inefficiencies in the operating system, and/or the encryption used by the scp application. In general, using TCP to measure bandwidth requires actually filling that bandwidth. This may be expensive in resources and/or inaccurate. We have no explanation for the RBPP result of 59Mb/s for the down long path.

In the WaveLAN cases, both the nettimer estimate and the TCP throughput estimate deviate significantly from the nominal. However, another study [BPSK96] reports a peak TCP throughput over WaveLAN 2Mb/s of 1.39Mb/s. We took the traces with a distance of less than 3m between the wireless node and the base station and there were no other obvious sources of electromagnetic radiation nearby. We speculate that the 2Mb/s and 11Mb/s nominal rates were achieved in an optimal environment shielded from external radio interference and conclude that the nettimer reported rate is close to the actual rate achievable in practice.

Another anomaly is that the nettimer measured WaveLAN bandwidths are consistently higher in the down direction than in the up direction. This is unlikely to be nettimer calculation error because the TCP throughputs are similarly asymmetric. Since the hardware in the PCMCIA NICs used in the host and the base station is identical, this is most likely due to an asymmetry in the MAC-layer protocol.

The nettimer measured ADSL bandwidth consistently deviates from the nominal by 15%-17%. Since the TCP throughput is very close to the nettimer measured bandwidth, this deviation is most likely due to the overhead from PPP headers and byte-stuffing (Pacific Bell/SBC ADSL uses PPP over Ethernet) and the overhead of encapsulating PPP packets in ATM (Pacific Bell/SBC ADSL modems use ATM to communicate with their switch). Link layer overhead is also the likely cause of the deviation in V.34 results.
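
As a rough back-of-the-envelope check (our own estimate; the encapsulation byte counts below are assumptions about typical PPPoE-over-ATM framing, not measurements from these traces), the ATM cell tax plus the PPP/Ethernet framing accounts for most of this gap:

    import math

    ip_bytes  = 1492     # assumed full-sized IP packet under a PPPoE MTU of 1492
    pppoe_ppp = 6 + 2    # PPPoE header plus PPP protocol field
    ether     = 14       # Ethernet header (PPP over Ethernet)
    llc_snap  = 10       # RFC 2684 bridged encapsulation over AAL5
    aal5      = 8        # AAL5 trailer

    payload = ip_bytes + pppoe_ppp + ether + llc_snap + aal5
    cells   = math.ceil(payload / 48)    # AAL5 pads to whole 48-byte cell payloads
    wire    = cells * 53                 # each ATM cell adds a 5-byte header
    print(f"IP-level efficiency: {ip_bytes / wire:.1%}")   # roughly 88%

Under these assumptions the framing alone costs roughly 12%, which is most, though not all, of the observed 15%-17% deviation; the remainder could come from other per-packet overhead or from a nominal rate that is quoted at the ATM layer.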

The CDMA results exhibit an asymmetry similar to the WaveLAN results. However, we are fairly certain that the base station hardware is different from our client transceiver, which may explain the difference. The asymmetry may also be due to an interference source close to the client and hidden from the base station. In addition, since the TCP throughputs are far from both the nominal and the nettimer measured bandwidth, the deviation may be due to nettimer measurement error.

We conclude that nettimer was able to measure the bottleneck link bandwidth of the different link technologies with a maximum error of 41%, but in most cases with an error less than 10%.


Resistance to Cross Traffic

We would expect the long paths to carry more cross traffic than the short paths and therefore to interfere more with nettimer. In addition, we would expect that bandwidth in the up direction would be more difficult to measure than bandwidth in the down direction because packets have to travel the entire path before their arrival times can be measured.

However, Table 3 shows that the RBPP technique and nettimer's filtering algorithm are able to filter out the effect of cross traffic such that nettimer is accurate for long paths even in the up direction.

In contrast, ROPP is much less accurate on the up paths than on the down paths (Section 4.2.3).

It was pointed out by an anonymous reviewer that there may be environments (e.g. a busy web server) where packet sizes and arrival times are highly correlated, which would violate some of the assumptions described in Section 2.2.1. There are definitely parts of the Internet containing technologies and/or traffic patterns so different from those described here that they cause nettimer's filtering algorithm to fail. One example is multi-channel ISDN, which is no longer in common use in the United States. We simply claim that nettimer is accurate in a variety of common cases which justifies further investigation into its effectiveness in other cases.


Different Packet Pair Techniques


Table: This table shows the nettimer results for the 11:07 PST 12/04/2000 traces. ``Type'' lists the different bottleneck technologies. ``D'' lists the direction of the transfer. ``u'' and ``d'' indicate that data is flowing away from or towards the bottleneck end, respectively. ``P'' indicates whether the (l)ong or (s)hort path is used. ``Nom'' lists the nominal bandwidth of the technology. ``RO'' and ``SB'' list the Receiver Only and Sender Based packet pair bandwidths, respectively. (σ) lists the standard deviation over the duration of the connection.
High bandwidth technologies (Mb/s):
Type D P Nom RO (σ) SB (σ)
Ethernet d s 100.0 87.69 (.12) 29.22 (.46)
Ethernet d l 100.0 63.65 (.27) 22.56 (1.8)
Ethernet u s 100.0 697.39 (.12) 52.28 (.22)
Ethernet u l 100.0 706.34 (.04) 13.96 (1.1)
Ethernet d s 10.0 9.65 (.03) 92.80 (.49)
Ethernet d l 10.0 9.65 (.04) 12.44 (2.5)
Ethernet u s 10.0 84.03 (.47) 4.63 (.04)
Ethernet u l 10.0 97.85 (.04) 6.42 (2.6)
WaveLAN d s 11.0 8.00 (.22) 3.40 (3.1)
WaveLAN d l 11.0 8.36 (.25) 2.11 (.33)
WaveLAN u s 11.0 11.30 (.03) 2.43 (.29)
WaveLAN u l 11.0 11.56 (.03) 1.77 (.31)
WaveLAN d s 2.0 1.46 (.03) 0.76 (.03)
WaveLAN d l 2.0 1.46 (.04) 0.74 (.05)
WaveLAN u s 2.0 1.20 (.03) 0.60 (.00)
WaveLAN u l 2.0 1.20 (.03) 0.59 (.06)
ADSL d s 1.5 1.24 (.03) 0.59 (.04)
ADSL d l 1.5 1.24 (.04) 0.59 (.05)
Low bandwidth technologies (Kb/s):
Type D P Nom RO (σ) SB (σ)
ADSL u s 128.0 465.34 (.04) 54.53 (.01)
ADSL u l 128.0 390.58 (.07) 53.89 (.04)
V.34 d s 33.6 26.43 (.04) 6.94 (.95)
V.34 d l 33.6 28.54 (.07) 5.35 (1.3)
V.34 u s 33.6 831.67 (3.6) 14.45 (.05)
V.34 u l 33.6 674.15 (2.7) 14.50 (.03)
CDMA d s 19.2 11.40 (.17) 9.85 (.36)
CDMA d l 19.2 12.07 (.09) 12.45 (.36)
CDMA u s 19.2 508.12 (1.5) 11.08 (.26)
CDMA u l 19.2 484.07 (1.2) 7.56 (2.0)

In this section, we examine the relative accuracy of the different packet pair techniques. Table 4 shows the Receiver-Only and Sender-Based results of one day's traces.

Sender Based Packet Pair is not particularly accurate, reporting 20%-50% of the estimated bandwidth, even on the short paths. As mentioned before, this is most likely the result of passively using TCP's non-per-packet acknowledgements and delayed acknowledgements. We discuss possible solutions to this in Section 5.

In the down direction for both long and short paths, Receiver Only Packet Pair is almost as accurate as RBPP. In contrast, Receiver Only Packet Pair is amazingly inaccurate in the up direction. For ROPP to make an accurate measurement, packets have to preserve their spacing resulting from the first link during their journey along all of the later links. ROPP cannot filter using the sent bandwidth (Section 2.2.2) because it does not have the cooperation of the sending host. Consequently, ROPP has poor accuracy compared to RBPP.
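
To make the distinction concrete, here is a minimal sketch of the underlying idea (our own simplification; nettimer's actual filtering algorithm, described in Section 2.2, is considerably more sophisticated):

    # For each back-to-back pair we know the packet size, the spacing with which
    # the pair was sent, and the spacing with which it arrived at the receiver.
    def packet_pair_estimates(pairs):
        ropp, rbpp = [], []
        for size_bytes, send_gap_s, recv_gap_s in pairs:
            if recv_gap_s <= 0:
                continue
            estimate = size_bytes * 8 / recv_gap_s      # candidate bottleneck b/s
            ropp.append(estimate)                       # ROPP: arrival spacing only
            # RBPP also knows how fast the pair was sent, so it can discard pairs
            # that were not sent fast enough to queue at the bottleneck.
            sent_bw = size_bytes * 8 / send_gap_s if send_gap_s > 0 else float("inf")
            if sent_bw >= estimate:
                rbpp.append(estimate)
        return ropp, rbpp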


Agility

Figure: This graph shows the bandwidth reported by nettimer using RBPP at a particular time for Ethernet 10Mb/s in the down direction along the long path. The Y-axis shows the bandwidth in b/s on a log scale. The X-axis shows the number of seconds since tracing began.

In this section, we examine how quickly nettimer calculates bandwidth when a connection starts. Figure 5 shows the bandwidth that nettimer using RBPP reports at the beginning of a connection. The connection begins 1.88 seconds before the first point on the graph. nettimer initially reports a low bandwidth, then a (correct) high bandwidth, then a low bandwidth, and finally converges to the high bandwidth. The total time from the beginning of the connection to convergence is 3.72 seconds. It takes this long because scp requires several round trips to authenticate and negotiate the encryption.

If we measure from when the data packets begin to flow, nettimer converges when the 8th data packet arrives, 8.4 ms after the first data packet arrives, 10308 bytes into the connection. TCP would have reported the throughput at this point as 22.2Kb/s (roughly 10308 bytes delivered over the 3.72 seconds since the connection began). Converging within 10308 bytes means that an adaptive web server could measure bandwidth using just the text portion of most web pages and then adapt its images based on that measurement.


Resources Consumed


Table: This table shows the CPU overhead consumed by nettimer and the application it is measuring. ``User'' lists the user-level CPU seconds consumed. ``System'' lists the system CPU seconds consumed. ``Elapsed'' lists the elapsed wall-clock seconds that the program was running. ``% CPU'' lists (User + System) / scp Elapsed time.
Name User System Elapsed % CPU
server .31 .43 32.47 4.52%
client 9.28 .15 26.00 57.6%
scp .050 .21 16.37 1.59%

In this section, we quantify the resources consumed by nettimer. In contrast to the other experiments where we took traces and then used nettimer to process the traces, in this experiment, nettimer captured its own packets and calculated the bandwidth as the connection was in progress. We measure the Ethernet 100Mb/s short up path because this requires the most efficient processing. We use scp and copy the same file as before. The distributed packet capture server ran on an otherwise unloaded 366MHz Pentium II while the packet capture client and nettimer processing ran on an otherwise unloaded 266MHz Pentium II.

Table 5 lists the CPU resources consumed by each of the components. The CPU cycles consumed by the distributed packet capture server are negligible, even for a 366MHz processor on a 100Mb/s link. Nettimer itself does consume a substantial number of CPU seconds to classify packets into flows and run the filtering algorithm. However, this was on a relatively old 266MHz machine and this functionality does not need to be collocated with the machine providing the actual service being measured (in this case the scp program).

Transferring the packet headers from the libdpcap server to the client consumed 473926 bytes. Given that the file transferred is 7476723 bytes, the overhead is 6.34%. This is higher than the 5.00% predicted in Section 3.3.2 because 1) scp transfers some extra data for connection setup, 2) some data packets are retransmitted, and most significantly, 3) the libdpcap server captures its own traffic. The server captures its own traffic because it does not distinguish between the scp data packets and its own packet header traffic, so it also captures the headers of the packets that carry previously captured headers, and so on. Fortunately, there is a limit to this recursion, so the net overhead is close to the predicted overhead.
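
A rough way to see why the recursion stays bounded (our own back-of-the-envelope, not a calculation from the paper): if capturing adds a fraction r of whatever traffic it observes, and the capture traffic is itself captured, the total overhead is the geometric series r + r^2 + r^3 + ... = r/(1-r), which stays close to r when r is small.

    r = 0.05            # the 5.00% per-pass overhead predicted in Section 3.3.2
    total = r / (1 - r) # about 0.053, i.e. 5.3% including self-capture
    print(f"overhead including self-capture: {total:.2%}")

Together with the connection setup data and retransmissions, this is consistent with the observed 6.34%.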
