
AS and network clustering


Table 4 shows the aggregate statistics from the data we collected: the number of clusters containing clients, the number of clusters containing local DNS servers (LDNS), and the total number of clusters. Daily routing table analysis from several major ISPs [9] identified up to 12,000 unique ASes in use on November 12, 2001. The theoretical limit on the number of ASes is set by the 16-bit AS identifier, which allows 64K ASes in total. Thus, we observed close to 80% of the ASes identified on November 12, 2001 and close to 15% of all possible ASes. For network clusters, the maximum is 440K, since we used 440K unique prefixes. A one-day extract from the 1998 Winter Olympic Games server log has 9,853 client clusters [13]. Thus, our measurement data contains close to ten times as many client clusters as one day of a popular Web server log, and close to 25% of all possible network clusters. We conclude that the data we collected is extensive and covers a significant number of ASes and network clusters.
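The coverage figures quoted above follow directly from the counts in Table 4. The short sketch below only reproduces that arithmetic; the constant and function names are ours and are introduced purely for illustration.

```python
# Counts quoted in the text and in Table 4.
AS_CLUSTERS_OBSERVED = 9_570          # total AS clusters (Table 4)
ASES_IN_USE = 12_000                  # upper estimate for November 12, 2001 [9]
AS_ID_SPACE = 2 ** 16                 # 16-bit AS identifier, i.e. 64K values
NETWORK_CLUSTERS_OBSERVED = 104_950   # total network clusters (Table 4)
PREFIXES_USED = 440_000               # unique prefixes used for clustering
CLIENT_CLUSTERS_OBSERVED = 98_001     # client network clusters (Table 4)
OLYMPIC_CLIENT_CLUSTERS = 9_853       # one-day 1998 Winter Olympics log [13]


def coverage(observed: int, possible: int) -> float:
    """Return observed/possible as a percentage."""
    return 100.0 * observed / possible


as_in_use_pct = coverage(AS_CLUSTERS_OBSERVED, ASES_IN_USE)
as_id_space_pct = coverage(AS_CLUSTERS_OBSERVED, AS_ID_SPACE)
network_cluster_pct = coverage(NETWORK_CLUSTERS_OBSERVED, PREFIXES_USED)
olympic_ratio = CLIENT_CLUSTERS_OBSERVED / OLYMPIC_CLIENT_CLUSTERS

print(f"ASes in use covered:        {as_in_use_pct:.1f}%")
print(f"AS identifier space used:   {as_id_space_pct:.1f}%")
print(f"Network clusters covered:   {network_cluster_pct:.1f}%")
print(f"Ratio to Olympic log:       {olympic_ratio:.1f}x")
```

The network cluster coverage comes out slightly below 24%, which the text rounds to "close to 25%".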

Table 4: Aggregate statistics of AS/network clustering
Metrics              # of client clusters   # of LDNS clusters   Total # of clusters
AS clustering                       9,215                8,590                 9,570
Network clustering                 98,001               53,321               104,950

Table 5: Percentage of client-LDNS associations sharing the same cluster classified according to the types of domains visited by the clients
                   Client IPs                            HTTP requests
Metrics            educational  commercial  combined     educational  commercial  combined
AS cluster         70%          63%         64%          83%          68%         69%
Network cluster    28%          16%         16%          44%          23%         24%

Table 5 shows the percentage of client-LDNS associations sharing the same cluster for clients visiting educational sites, commercial sites, and all sites in our measurement study. We observe that clients visiting educational sites have better proximity to their local DNS servers under both the network and AS clustering metrics. This is expected, since most of these clients also come from universities, which generally have a denser distribution of local DNS servers and better local DNS configurations than commercial ISPs. Because the majority of our log entries come from hits to the commercial sites, the proximity values for clients visiting all participating sites are very close to those for clients visiting commercial sites alone. Because CDNs are most likely to accelerate commercial sites, we believe our client mix is representative of clients visiting a CDN-accelerated site. In the following discussion, we consider clients visiting all participating sites.

Using AS clustering, 64% of distinct client-LDNS associations share the same AS. Thus, more than half of the clients use a local DNS server in the same AS. This is expected, since it is common for an administrative domain to run its own DNS server. If users configure their DNS settings correctly, they typically use the LDNS in their administrative domain by default. About 69% of the HTTP requests come from clients using an LDNS in the same AS cluster. Because this share is higher than the 64% of associations, clients with an LDNS in the same AS are slightly more active than those that use an LDNS in another AS.

The above results indicate that in about 64% of the cases, CDNs could select appropriate servers using DNS redirection at the granularity of ASes. Put differently, even if a CDN deployed a cache in every AS in the world, it could select the closest cache according to the AS metric in only 64% of the cases. However, AS clustering does not reveal how well redirection works for finer-grained load balancing. An AS can span large geographical regions, so network delays between two hosts within the same AS can be relatively high. For finer-grained load balancing it is therefore important to consider network clustering, which groups together IP addresses that are close together topologically and likely to be under the same administrative domain.
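To make the notion of a network cluster concrete, the sketch below assigns an IP address to the longest matching prefix and checks whether a client and its LDNS fall under the same one. This is a minimal illustration, assuming that clusters are formed by longest-prefix match against routing table prefixes; the prefixes, addresses, and function names are hypothetical and are not taken from our data set.

```python
import ipaddress

# Hypothetical prefixes standing in for routing table entries.
EXAMPLE_PREFIXES = ["10.0.0.0/8", "10.1.0.0/16", "192.0.2.0/24"]


def build_prefix_table(prefixes):
    """Parse prefixes and order them from most to least specific."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return sorted(networks, key=lambda net: net.prefixlen, reverse=True)


def network_cluster(ip, table):
    """Return the longest prefix in `table` containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for net in table:  # most specific first, so the first hit is the longest match
        if addr in net:
            return net
    return None


def same_network_cluster(client_ip, ldns_ip, table):
    """True if both addresses map to the same (non-empty) cluster."""
    client = network_cluster(client_ip, table)
    ldns = network_cluster(ldns_ip, table)
    return client is not None and client == ldns


table = build_prefix_table(EXAMPLE_PREFIXES)
print(network_cluster("10.1.2.3", table))
print(same_network_cluster("10.1.2.3", "10.2.0.1", table))
```

The linear scan keeps the example short. With a table of several hundred thousand prefixes, a binary trie or similar longest-prefix-match structure would be the more practical choice.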

The observations using network clustering are significantly different from the AS clustering results. Only 16% of the client-LDNS associations are in the same network cluster. This shows that most clients are not in the same routing entity as their local DNS servers. If the HTTP request count is taken into account, about 24% of the HTTP requests in our logs originated from clients that used an LDNS in the same network cluster. Again, the difference between these two numbers demonstrates that clients with an LDNS in the same network cluster are more active than those with an LDNS in a different network cluster.
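The two percentages reported for each metric differ only in how associations are weighted: once per distinct client-LDNS association, and once by the number of HTTP requests the client issued. The sketch below shows that distinction on made-up records; the record layout, cluster labels, and request counts are assumptions for illustration and do not reflect our log format.

```python
from typing import NamedTuple


class Association(NamedTuple):
    """One distinct client-LDNS pair with precomputed cluster labels."""
    client_cluster: str
    ldns_cluster: str
    http_requests: int


def same_cluster_shares(associations):
    """Return (% of associations, % of HTTP requests) sharing a cluster."""
    if not associations:
        raise ValueError("no associations given")
    total_requests = sum(a.http_requests for a in associations)
    if total_requests == 0:
        raise ValueError("no HTTP requests recorded")
    same = [a for a in associations if a.client_cluster == a.ldns_cluster]
    by_association = 100.0 * len(same) / len(associations)
    by_requests = 100.0 * sum(a.http_requests for a in same) / total_requests
    return by_association, by_requests


# Made-up sample: one active client shares a cluster with its LDNS,
# three less active clients do not.
sample = [
    Association("A", "A", 30),
    Association("B", "C", 10),
    Association("D", "E", 10),
    Association("F", "G", 10),
]
by_association, by_requests = same_cluster_shares(sample)
print(by_association, by_requests)
```

When the request-weighted share exceeds the per-association share, as in this sample, the clients that share a cluster with their LDNS account for more requests each than the remaining clients do.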

Overall, these results indicate that DNS-based redirection can select appropriate CDN servers with reasonable confidence at the granularity of an AS. However, for CDNs with multiple servers in the same AS, the selection may not be as accurate. If there is a CDN server in each network cluster, then DNS-based redirection will select the CDN server in the same network cluster as the client for only about 24% of HTTP requests (16% of distinct client-LDNS associations).
