For the first data set, we sampled 42,991 LDNS servers from our measurement study. We obtained the second data set by using the dig command to send each of these 42,991 LDNS servers a DNS query for the domain name of a Web site that we know is a customer of a given CDN. Of these LDNS servers, 27,918 do not use access control and hence answered the queries from our machines as if these machines were their clients. To answer our queries, these LDNSs recursively resolved them with the CDN in question. The server the CDN selects for such a query is therefore exactly the server that any real client associated with this LDNS would be given, as if that client, and not our machine, had initiated the DNS query.8
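The probing step can be sketched as follows. This is a minimal illustration rather than the scripts used in the study: the function names, the two-second timeout, and the single retry are assumptions, and only standard dig options (`@server`, `+short`, `+time`, `+tries`) are used.

```python
import ipaddress
import subprocess


def build_dig_command(ldns_ip, domain, timeout_s=2):
    """Return the argv for one A-record query sent to a given LDNS."""
    return [
        "dig",
        f"@{ldns_ip}",         # query this LDNS instead of the default resolver
        domain,
        "A",
        "+short",              # print only the answer data, one record per line
        f"+time={timeout_s}",  # assumed timeout, not taken from the study
        "+tries=1",            # assumed retry policy, not taken from the study
    ]


def parse_short_answer(output):
    """Keep only IPv4 addresses from `dig +short` output (CNAME lines are skipped)."""
    addresses = []
    for line in output.splitlines():
        token = line.strip()
        try:
            ipaddress.IPv4Address(token)
        except ValueError:
            continue
        addresses.append(token)
    return addresses


def query_ldns(ldns_ip, domain):
    """Run dig against one LDNS; an empty list means refusal, timeout, or no answer."""
    result = subprocess.run(
        build_dig_command(ldns_ip, domain),
        capture_output=True, text=True, check=False,
    )
    return parse_short_answer(result.stdout)
```

An LDNS that enforces access control would typically refuse or ignore the query, which this sketch reports as an empty list.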
The third data set was obtained in a similar way, except that we added a large number of additional LDNS servers to the 27,918 LDNS servers above, for a total of 41,754 different local DNS servers. This increases the likelihood of finding all servers that a particular CDN uses for a given domain. The extensive list of geographically distributed LDNS servers was obtained from the DNS server logs of a large Web site. The set of servers to which a given CDN resolved queries from these LDNSs represents the servers available in this CDN at the time of the experiment. We collected the second and third data sets at around the same time each day, so that the third data set approximates the set of servers available to the CDN when it made the server selections recorded in the second.
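To illustrate how the two data sets relate, the sketch below treats the second data set as a mapping from each LDNS to the server addresses the CDN returned to it, and derives the set of available servers as the union of all answers in the third data set. The function name and the example addresses are hypothetical.

```python
def available_servers(answers_by_ldns):
    """Union of all CDN server addresses returned to any probed LDNS."""
    servers = set()
    for addresses in answers_by_ldns.values():
        servers.update(addresses)
    return servers


# Hypothetical answers: LDNS address -> CDN server addresses returned to it.
third_data_set = {
    "192.0.2.1": ["198.51.100.7", "198.51.100.8"],
    "192.0.2.2": ["198.51.100.8"],
    "192.0.2.3": [],  # this LDNS did not answer
}
discovered = available_servers(third_data_set)
```

Probing more LDNS servers can only enlarge this union, which is why the third data set adds LDNS servers beyond the 27,918 used for the second.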
Note that our set of available servers is conservative, since we may not have discovered every available CDN server. This does not weaken our conclusions. We call a selection suboptimal when we have already found a server closer to the client than the one the CDN selected. Any superset of our server list still contains that closer server, so a selection that is suboptimal among the servers we discovered remains suboptimal among all available servers.
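This argument can be stated compactly. The notation is introduced only for this illustration: $S$ is the set of servers we discovered, $S' \supseteq S$ is the full set of available servers, $d(c, s)$ is an assumed measure of distance between client $c$ and server $s$, and $s^{*}$ is the server the CDN selected for $c$.

```latex
\begin{align*}
\exists\, s \in S:\ d(c, s) < d(c, s^{*})
\;\Longrightarrow\;
\min_{s' \in S'} d(c, s') \;\le\; d(c, s) \;<\; d(c, s^{*})
\qquad \text{since } S \subseteq S'.
\end{align*}
```

In words, a closer server found in the discovered subset is also a witness of suboptimality in any larger set.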
Many CDNs claim a much larger number of caches than we discovered. However, CDNs do not use all of their servers for every Web site, and many of their locations may contain multiple caches. The statistics we gathered are for a particular domain served by a CDN. For example, when we examined several different domain names served by the largest CDN in our study, we found sets of CDN IP addresses of approximately equal size that only partly overlapped. Each unique server IP address we discovered may also account for multiple servers.
Table 10 shows statistics, obtained on August 7, 2001, of the CDN server IP addresses that each of the three CDNs studied uses for a single domain name. These numbers were fairly stable during the course of our study. All three CDNs examined appear to redirect client requests by using DNS, although they may differ in the details of their algorithms. The table lists the total number of CDN servers discovered and the number of AS clusters and network clusters that these servers represent. The data in Table 10 confirm our conjecture that CDNs today cover only a small fraction of all available network clusters for a single domain they serve. While the overall list of LDNSs used to generate the third data set represents 5,788 AS clusters and 21,786 network clusters, the discovered CDN servers represent only a small fraction of these, even in the case of the largest CDN in our study.
Table 10 columns: | # of AS clusters | # of network clusters | # of CDN servers |
Using the three data sets above, we evaluate the quality of each CDN's server selection by examining what percentage of clients are actually redirected to servers in their own cluster, among those clients that have at least one server in their cluster.
| Clients w/ CDN server in cluster | (% verifiable clients) | (% clusters occupied) | MC w/ LDNS not in client's cluster |
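The evaluation described above can be sketched as follows. This is an illustrative reading of the metric, not the analysis code used in the study: the function name, the data layout, and the toy values are assumptions. A client is counted as verifiable when at least one discovered CDN server lies in its cluster, and the server selected for a client is the one the CDN returned to that client's LDNS.

```python
def selection_quality(client_cluster, server_cluster, available, selected):
    """Return (verifiable clients, clients directed to own cluster, percentage).

    client_cluster: client -> cluster id
    server_cluster: CDN server -> cluster id
    available:      set of discovered CDN servers (third data set)
    selected:       client -> CDN server chosen via the client's LDNS (second data set)
    """
    # Clusters that contain at least one discovered CDN server.
    occupied = {server_cluster[s] for s in available if s in server_cluster}
    # Clients for which an in-cluster server exists, so the choice can be checked.
    verifiable = [c for c in selected if client_cluster.get(c) in occupied]
    hits = sum(
        1 for c in verifiable
        if server_cluster.get(selected[c]) == client_cluster[c]
    )
    pct = 100.0 * hits / len(verifiable) if verifiable else 0.0
    return len(verifiable), hits, pct


# Toy data, purely hypothetical.
client_cluster = {"c1": "A", "c2": "A", "c3": "B", "c4": "C"}
server_cluster = {"s1": "A", "s2": "B", "s3": "D"}
available = {"s1", "s2", "s3"}
selected = {"c1": "s1", "c2": "s2", "c3": "s3", "c4": "s1"}
result = selection_quality(client_cluster, server_cluster, available, selected)
```

In the toy data, three clients are verifiable (c4 has no server in its cluster) and only c1 is directed to a server in its own cluster.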