Tables 11 and 12 show the results of our server selection evaluation using AS and network clustering, respectively. We collected 3,234,449 distinct client IP addresses in our logs. The first row of each table gives the number of clients that have CDN servers in their clusters, for each CDN considered. Depending on the server density of each CDN, the number of clients with servers in their AS clusters ranges from 19% to 52% of the total clients in the log. This fraction is an order of magnitude lower for network clusters. Thus, according to either metric, most clients have to be served by remote servers. A more interesting question is how many clients that could have been served by local servers are in reality directed to remote ones.
|Clients w/ CDN server in cluster
|(% verifiable clients)
|(% clusters occupied)
|MC w/ LDNS not in client's cluster (% misdirected clients)
To answer this question, we concentrate on clients with servers in their clusters and consider the LDNS-CDN server associations for these clients from the second data set. Unfortunately, not all of these LDNS servers respond to DNS queries from our machines. The second row of the tables gives the number of clients, among those with CDN servers in their clusters, whose LDNS servers responded to our queries. We call these clients verifiable because we could find out which CDN servers a CDN would redirect them to. The third row shows the number of clients that a CDN directed to an external CDN server (one outside the client's cluster) even though a CDN server was available within that cluster. We refer to such clients as misdirected clients (MC), based on the assumption that CDN servers within the cluster are closer than external ones, although we accept that factors other than proximity may have affected the assignment. We see a large number of misdirected clients according to both proximity metrics. To confirm that these misdirected clients are not an anomaly caused by clients belonging to a small number of clusters, we also show in the third row the percentage of clusters occupied by these clients relative to the total number of clusters of verifiable clients. The cluster percentage values are at least as large as the client percentage values. This means that the misdirected clients are spread across many clusters rather than concentrated in a few.
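The row definitions above can be sketched in code. The following is a minimal illustration, not the tooling used for the study: the record layout (`ClientRecord`), the function names, and the use of a plain string as a cluster identifier (an AS number or a network-cluster prefix) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class ClientRecord:
    # Hypothetical record layout; field names are illustrative only.
    client_cluster: str                      # AS or network cluster of the client
    ldns_responded: bool                     # did the client's LDNS answer our queries?
    selected_server_cluster: Optional[str]   # cluster of the CDN server chosen via that LDNS


def pct(part: int, whole: int) -> float:
    """Percentage helper that tolerates an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def summarize(records: Iterable[ClientRecord], server_clusters: Set[str]) -> dict:
    """Compute the first three table rows for one CDN.

    server_clusters: clusters that contain at least one server of this CDN.
    """
    # Row 1: clients with a CDN server in their own cluster.
    local = [r for r in records if r.client_cluster in server_clusters]
    # Row 2: of those, clients whose LDNS responded (verifiable clients).
    verifiable = [r for r in local if r.ldns_responded]
    # Row 3: verifiable clients sent to a server outside their cluster.
    misdirected = [
        r for r in verifiable if r.selected_server_cluster != r.client_cluster
    ]
    verifiable_clusters = {r.client_cluster for r in verifiable}
    misdirected_clusters = {r.client_cluster for r in misdirected}
    return {
        "clients_with_local_server": len(local),
        "verifiable": len(verifiable),
        "misdirected": len(misdirected),
        "pct_misdirected": pct(len(misdirected), len(verifiable)),
        "pct_clusters_occupied": pct(
            len(misdirected_clusters), len(verifiable_clusters)
        ),
    }
```

Comparing `pct_clusters_occupied` with `pct_misdirected` mirrors the check in the text: if the cluster percentage is at least as large as the client percentage, the misdirected clients are not concentrated in a handful of clusters.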
We conjecture that these clients are misdirected because their LDNS servers are topologically distant from them. CDNs select a server close to the LDNS server, so the server selected may be suboptimal from the client's perspective. The last row of the tables shows misdirected clients whose LDNS servers are outside their clusters. This row indicates the number of clients that inherently cannot be directed to the most proximal server using a DNS-based mechanism. According to Table 11, for AS clustering, they represent only half of the misdirected clients. To understand why CDNs choose a CDN server in a different AS than the one containing the client and its LDNS server, we sampled a dozen of these clients and used traceroute, followed by DNS name resolution of the last-hop router IP address, to estimate the geographic locations (footnote 9) of the client, the CDN servers in the client's AS, and the selected CDN servers in a different AS. We found that in most cases, the CDN servers selected by the CDNs are geographically closer to the client than the CDN servers in the same AS. Assuming that peering links between the client's AS and the selected CDN server's AS are not congested, redirecting to a nearby CDN server in a different AS may be a better decision than redirecting to a distant CDN server in the same AS. This observation also confirms our finding that AS clustering is a very coarse-grained metric for evaluating proximity.
For network clustering, the last row of Table 12 indicates that an overwhelming majority of misdirected clients have their LDNS servers in a different network cluster. This supports our hypothesis that such misdirection arises because clients and their LDNS servers are often not proximal. It also shows the usefulness of network clustering as a fine-grained metric for evaluating proximity. We emphasize that we do not know the exact server selection policy used by a commercial CDN, so we cannot fully evaluate the effectiveness of its server selection decisions. However, given the strong correlation between misdirection and an LDNS being in a different cluster, we can infer that server selection accuracy is limited when the LDNS and the client do not belong to the same network cluster.
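The last-row quantity discussed above (misdirected clients whose LDNS lies outside their cluster, as a share of all misdirected clients) can be computed as in the sketch below. The function name and the pair-based input format are assumptions for illustration; cluster identifiers may be AS numbers or network-cluster prefixes, depending on which table is being reproduced.

```python
from typing import Iterable, Tuple


def ldns_outside_share(
    misdirected: Iterable[Tuple[str, str]],
) -> Tuple[int, float]:
    """Count misdirected clients whose LDNS is in a different cluster.

    misdirected: (client_cluster, ldns_cluster) pairs, one per misdirected
    client. Returns the count and its percentage of all misdirected clients.
    """
    pairs = list(misdirected)
    outside = sum(1 for client_c, ldns_c in pairs if client_c != ldns_c)
    share = 100.0 * outside / len(pairs) if pairs else 0.0
    return outside, share
```

A share near 50% corresponds to the AS-clustering observation, while a share close to 100% corresponds to the network-clustering observation.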
Table 13 shows the evaluation of DNS-based server selections according to the traceroute divergence (TD) metric (footnote 10). We performed traceroute from probe site 3 to a sample of clients and local DNS servers from the log and to the CDN cache servers from the third data set. The sample is chosen by randomly selecting one client-LDNS pair from each of the top 1200 client clusters generating the most HTTP requests. We found that over 70% of the clients are directed to a CDN server that is more distant than another available CDN server. Selecting the closest CDN server would have reduced traceroute divergence by as much as 19 hops for some clients.
|Client-LDNS pairs examined
|Clients with CDN servers at smaller TD than ones redirected to
|Median TD of CDN servers clients redirected to
|Median TD of closest CDN servers to clients
|Median TD improvement
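As a rough illustration of how the rows of Table 13 relate to one another, the sketch below derives them from per-client traceroute divergence values. It takes the TD values as given and does not compute them from traceroute output. The input format and function name are assumptions, as is the choice to take the median improvement over all examined pairs rather than only over clients that could improve.

```python
from statistics import median
from typing import List, Sequence, Tuple


def td_summary(pairs: Sequence[Tuple[int, List[int]]]) -> dict:
    """Summarize traceroute divergence (TD) for a sample of clients.

    pairs: one (td_selected, td_candidates) entry per client, where
    td_selected is the TD between the client and the CDN server it was
    redirected to, and td_candidates holds the TD to the other available
    CDN servers.
    """
    if not pairs:
        raise ValueError("at least one client-LDNS pair is required")
    selected, closest, improvement = [], [], []
    closer_exists = 0
    for td_selected, td_candidates in pairs:
        # The best achievable TD is never worse than the selected one.
        best = min([td_selected, *td_candidates])
        if best < td_selected:
            closer_exists += 1
        selected.append(td_selected)
        closest.append(best)
        improvement.append(td_selected - best)
    return {
        "pairs_examined": len(pairs),
        "clients_with_closer_server": closer_exists,
        "median_td_selected": median(selected),
        "median_td_closest": median(closest),
        # Assumption: median taken over all pairs, including zero improvements.
        "median_td_improvement": median(improvement),
        "max_td_improvement": max(improvement),
    }
```

Under this reading, `max_td_improvement` corresponds to the "as much as 19 hops" figure quoted in the text, and `clients_with_closer_server / pairs_examined` to the "over 70%" figure.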
Overall, we conclude that, among the clients we could verify, knowing the client's IP address would allow more accurate server selections in a large number of cases (443,394 for CDN X). The last row of Tables 11 and 12 also indicates the number of CDN server selections that would improve if the client's IP address were known to the CDN. Relative to the total number of clients, this is a small percentage: for CDN X, about 14% (443,394 out of 3,234,449). In general, the number of misdirected clients depends on the server density, placement, and selection algorithms.
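The percentage quoted for CDN X follows directly from the two counts given in the text; a quick check, with variable names chosen here for illustration:

```python
# Counts taken from the text above.
improvable_clients = 443_394   # CDN X clients whose selection could improve
total_clients = 3_234_449      # distinct client IP addresses in the logs

share_pct = 100.0 * improvable_clients / total_clients
print(f"{share_pct:.1f}% of all clients")
```

The exact ratio is slightly under 14%, which the text rounds to the nearest whole percent.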