Experiment methodology

For the first data set, we sampled 42,991 LDNS servers from our measurement study. We obtained the second data set by using the dig command to send DNS queries to these 42,991 LDNS servers for the domain name of a Web site that we know to be a customer of a given CDN. Of these LDNS servers, 27,918 do not use access control and hence answered the queries from our machines as if those machines were their clients. To answer our queries, these LDNS servers recursively resolved them with the CDN in question. The server that the CDN selects for such a DNS query is exactly the server that any real client associated with this LDNS would use, as if that client, and not our machine, had initiated the DNS query.⁸
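The probing step described above can be sketched as follows. This is a minimal illustration, not the scripts used in the study: the helper names (`build_dig_command`, `parse_short_answer`, `query_ldns`) and the timeout values are assumptions, and the only tool assumed is the standard `dig` command with its `+short`, `+time`, and `+tries` options.

```python
import ipaddress
import subprocess


def build_dig_command(ldns_ip, domain, timeout_s=2):
    """Build a dig invocation that asks one LDNS server to resolve `domain`."""
    return ["dig", f"@{ldns_ip}", domain, "A", "+short",
            f"+time={timeout_s}", "+tries=1"]


def parse_short_answer(output):
    """Keep only the lines of `dig +short` output that are IPv4 addresses.

    CNAME lines and error messages (e.g. timeouts) are skipped.
    """
    addresses = []
    for line in output.splitlines():
        line = line.strip()
        try:
            ipaddress.IPv4Address(line)
        except ValueError:
            continue
        addresses.append(line)
    return addresses


def query_ldns(ldns_ip, domain):
    """Return the CDN server IPs selected for this LDNS.

    Returns [] if the LDNS refuses the query, times out, or dig is missing.
    """
    try:
        result = subprocess.run(build_dig_command(ldns_ip, domain),
                                capture_output=True, text=True, timeout=10)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return []
    return parse_short_answer(result.stdout)
```

An LDNS that enforces access control would typically yield an empty list here, which corresponds to the servers excluded from the second data set.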

The third data set was obtained in a similar way, except that we added a large number of additional LDNS servers to the 27,918 LDNS servers above, for a total of 41,754 different local DNS servers. The larger set increases the likelihood of finding all servers that a particular CDN uses for a given domain. The extensive list of geographically distributed LDNS servers was obtained from the DNS server logs of a large Web site. The set of servers to which a given CDN resolved queries from these LDNSs represents the servers available in this CDN at the time of the experiment. We obtained our second and third data sets at around the same time each day, so that the third data set reflects the servers available to a CDN at the time it performed its server selection in the second experiment.

Note that our set of available servers is conservative, since we might not have discovered all available CDN servers. However, if a CDN's server selection is suboptimal with respect to a subset of its available servers, it remains suboptimal with respect to the full set: suboptimal means that we have already found a server closer to the client than the one the CDN selected, and that closer server also belongs to any superset of the discovered list.
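The conservativeness argument can be written as a one-line inequality. The notation is introduced here for illustration only: $S$ is the set of servers we discovered, $S' \supseteq S$ is the (unknown) full set of available servers, $s_{\mathrm{CDN}}$ is the server the CDN selected for client $c$, and $d(c, s)$ is any measure of proximity between a client and a server.

```latex
\[
  \min_{s \in S'} d(c, s) \;\le\; \min_{s \in S} d(c, s) \;<\; d(c, s_{\mathrm{CDN}})
\]
```

The strict inequality on the right is what we observe when a selection is suboptimal within $S$; the inequality on the left holds because a minimum taken over a larger set can only be smaller or equal. Hence the selection is also suboptimal within $S'$.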

Many CDNs claim a much larger number of caches than we discovered. However, CDNs do not use all of their servers for every Web site, and many of their locations may contain multiple caches. The statistics we gathered are for a particular domain served by a CDN. For example, when examining several different domain names served by the largest CDN in our study, we found multiple CDN IP address sets of approximately equal size that only partly overlapped. Each unique server IP address we discover may also account for multiple servers.

Table 10 shows the statistics of the CDN server IP addresses of the three CDNs studied for a single domain name obtained on August 7, 2001. These numbers were fairly stable during the course of our study. All three CDNs examined appear to redirect client requests by using DNS, although they may differ in the details of the algorithms. This table lists the total number of CDN servers discovered and the number of AS and network clusters these CDN servers represent. The data in Table 10 confirm our conjecture that CDNs today cover only a small number of all available network clusters for a single domain they serve. While the overall list of LDNSs used for generating the third data set represents 5,788 AS and 21,786 network clusters, the discovered CDN servers represent only a small fraction of these, even in the case of the largest CDN in our study.

**Table 10:** CDN cache servers for a particular domain name

| CDN | # of AS clusters with servers | # of network clusters with servers | # of CDN server IPs |
|---|---|---|---|
| CDN X | 622 | 740 | 1,567 |
| CDN Y | 120 | 152 | 195 |
| CDN Z | 60 | 79 | 154 |
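To make "only a small fraction" concrete, the short sketch below divides the cluster counts in Table 10 by the totals quoted in the text for the LDNS list behind the third data set (5,788 AS clusters and 21,786 network clusters). The function and variable names are illustrative, and the calculation assumes that these totals are the appropriate denominators.

```python
# Totals for the LDNS list used to generate the third data set (from the text).
TOTAL_AS_CLUSTERS = 5788
TOTAL_NETWORK_CLUSTERS = 21786

# Table 10 rows: (AS clusters with servers, network clusters with servers).
TABLE_10 = {
    "CDN X": (622, 740),
    "CDN Y": (120, 152),
    "CDN Z": (60, 79),
}


def coverage_percent(covered, total):
    """Percentage of clusters that contain at least one CDN server."""
    return round(100.0 * covered / total, 1)


def coverage_by_cdn():
    """Map each CDN to its (AS coverage %, network-cluster coverage %)."""
    return {
        cdn: (coverage_percent(as_count, TOTAL_AS_CLUSTERS),
              coverage_percent(net_count, TOTAL_NETWORK_CLUSTERS))
        for cdn, (as_count, net_count) in TABLE_10.items()
    }


if __name__ == "__main__":
    for cdn, (as_pct, net_pct) in coverage_by_cdn().items():
        print(f"{cdn}: {as_pct}% of AS clusters, {net_pct}% of network clusters")
```

Under these assumptions, even the largest CDN covers roughly 11% of the AS clusters and about 3% of the network clusters.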

With the three data sets above, we evaluate the quality of server selection by these CDNs by examining what percentage of clients are actually redirected to servers in their own cluster, among those clients that have at least one server in their cluster.
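The metric described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the evaluation code used in the study: each client is represented as a hypothetical pair `(client_cluster, selected_server_cluster)`, and `server_clusters` is the set of clusters known to contain at least one CDN server.

```python
def misdirection_rate(clients, server_clusters):
    """Fraction of eligible clients sent to a server outside their own cluster.

    clients: iterable of (client_cluster, selected_server_cluster) pairs.
    server_clusters: set of clusters that contain at least one CDN server.
    Only clients whose own cluster contains a CDN server are eligible.
    Returns None when no client is eligible.
    """
    eligible = [(client, selected) for client, selected in clients
                if client in server_clusters]
    if not eligible:
        return None
    misdirected = sum(1 for client, selected in eligible if selected != client)
    return misdirected / len(eligible)


if __name__ == "__main__":
    sample = [("AS1", "AS1"), ("AS1", "AS2"), ("AS3", "AS2"), ("AS2", "AS2")]
    # The client in AS3 is excluded: no CDN server exists in its cluster.
    print(misdirection_rate(sample, {"AS1", "AS2"}))
```

Restricting the denominator to eligible clients matters: a client whose cluster has no CDN server cannot be served from its own cluster, so it should not count against the CDN.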

**Table 11:** The evaluation of server selection according to AS clustering

| CDN | CDN X | CDN Y | CDN Z |
|---|---|---|---|
| Clients w/ CDN server in cluster | 1,679,515 | 1,215,372 | 618,897 |
| Verifiable clients | 1,324,022 | 961,382 | 516,969 |
| Misdirected clients | 809,683 | 752,822 | 434,905 |
| (% verifiable clients) | (60%) | (77%) | (82%) |
| (% clusters occupied) | (92%) | (94%) | (94%) |
| MC w/ LDNS not in client's cluster | 443,394 | 354,928 | 262,713 |
| (% misdirected clients) | (55%) | (47%) | (60%) |