In this section, we focus on the impact that client-LDNS associations have on DNS-based server selection. We study this impact in detail for three of the largest commercial CDNs, whose names we anonymize to reflect that this work is a research vehicle rather than any form of competitive analysis. All three CDNs deploy caches in multiple networks. We exclude ISP-based CDNs, such as those deployed by AT&T and Qwest, because their caches are located in only one or two ASes. Since a client and its LDNS are usually in the same AS (this was the case for about 69% of HTTP requests in our study), an ISP-based CDN can easily identify a peering link suitable for the AS that contains both of them.⁷ The results described below are representative of all the data we collected and remained stable throughout our study.
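To make the same-AS statistic concrete, the following is a minimal sketch of how one might compute the share of HTTP requests whose client and LDNS fall in the same AS. It assumes each logged request has been reduced to a (client IP, LDNS IP) pair and that addresses are mapped to ASes by longest-prefix match against a prefix table. The table (built from documentation prefixes and ASNs), the function names `origin_as` and `same_as_share`, and the sample log are hypothetical and do not describe the measurement pipeline used in this study. Treating requests with an unmapped address as "not same AS" is also an assumption.

```python
import ipaddress
from typing import Iterable, Optional, Tuple

# Hypothetical prefix-to-AS table using documentation prefixes and ASNs.
# In a real study this would be derived from BGP routing table snapshots.
PREFIX_TO_AS = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
    ipaddress.ip_network("198.51.100.128/25"): 64498,  # more-specific prefix
}


def origin_as(ip: str) -> Optional[int]:
    """Return the AS of the longest matching prefix, or None if unmapped."""
    addr = ipaddress.ip_address(ip)
    best_len, best_as = -1, None
    for net, asn in PREFIX_TO_AS.items():
        if addr in net and net.prefixlen > best_len:
            best_len, best_as = net.prefixlen, asn
    return best_as


def same_as_share(requests: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (client_ip, ldns_ip) requests whose endpoints share an AS.

    Requests with an unmapped client or LDNS count toward the total but
    never as a same-AS match.
    """
    total = same = 0
    for client_ip, ldns_ip in requests:
        total += 1
        client_as, ldns_as = origin_as(client_ip), origin_as(ldns_ip)
        if client_as is not None and client_as == ldns_as:
            same += 1
    return same / total if total else 0.0


log = [
    ("192.0.2.10", "192.0.2.53"),         # same AS 64496
    ("198.51.100.200", "198.51.100.53"),  # 64498 vs 64497
    ("192.0.2.11", "192.0.2.53"),         # same AS 64496
    ("198.51.100.7", "192.0.2.53"),       # 64497 vs 64496
]
print(same_as_share(log))  # 0.5
```

The second sample request shows why longest-prefix matching matters: both addresses fall inside 198.51.100.0/24, but the client also matches the more-specific /25 and is therefore attributed to a different AS.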
Previous work by Johnson et al. has shown that DNS-based CDNs do not always pick the best available server. Here we study whether this is partly due to inherent limitations of DNS-based server selection. The answer depends largely on how close clients are to their local DNS servers and on where CDN servers are located.
Evaluating client-LDNS proximity with the network clustering metric indicates that, if a CDN had a server in every network cluster, about 84% of the selection decisions for the client population in our log could be suboptimal, because only 16% of these clients have their LDNS in their own network cluster. For a client whose LDNS is in a different cluster, the CDN would most likely resolve the query from the client's LDNS to the CDN server in the LDNS's cluster rather than the one in the client's cluster. In reality, as we show below, even the biggest CDN today does not have a server in every network cluster. It is therefore important to examine the impact of DNS-based redirection in an actual commercial content distribution setting.
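The reasoning behind the 84% figure can be expressed as a small simulation. The sketch below assumes, as the thought experiment does, that the CDN has one server in every network cluster and that its DNS server always answers with the server in the LDNS's cluster; a decision then counts as suboptimal whenever the client's cluster differs from its LDNS's cluster. The cluster labels, server names, and the names `Association`, `dns_select`, and `suboptimal_fraction` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Association:
    """One client and its LDNS, each labeled by its network cluster (prefix)."""
    client_cluster: str
    ldns_cluster: str


def dns_select(ldns_cluster: str, servers: Dict[str, str]) -> str:
    # The CDN's DNS server sees only the LDNS address, so it returns the
    # server deployed in the LDNS's cluster.
    return servers[ldns_cluster]


def suboptimal_fraction(assocs: List[Association], servers: Dict[str, str]) -> float:
    """Fraction of DNS-based decisions that miss the server in the client's cluster."""
    if not assocs:
        return 0.0
    misses = sum(
        1 for a in assocs
        if dns_select(a.ldns_cluster, servers) != servers[a.client_cluster]
    )
    return misses / len(assocs)


# Hypothetical deployment: one CDN server in every cluster.
servers = {
    "192.0.2.0/24": "cdn-a",
    "198.51.100.0/24": "cdn-b",
    "203.0.113.0/24": "cdn-c",
}
assocs = [
    Association("192.0.2.0/24", "192.0.2.0/24"),  # LDNS in client's cluster
    Association("198.51.100.0/24", "203.0.113.0/24"),
    Association("203.0.113.0/24", "192.0.2.0/24"),
    Association("198.51.100.0/24", "192.0.2.0/24"),
]
print(suboptimal_fraction(assocs, servers))  # 0.75
```

With one server per cluster, the result is simply the share of clients whose LDNS lies outside their own cluster, which is why the 16% same-cluster figure translates directly into the 84% upper estimate.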
We assume that, on average, a CDN server within the client's AS or network cluster, or at a smaller traceroute divergence (TD) from the client, is closer than one in a different cluster or at a larger TD. For clients that have a CDN server in their own cluster, a CDN decision that selects a server outside that cluster may therefore be suboptimal in terms of proximity. We also assume that CDNs attempt to optimize for proximity in most cases. Network bandwidth is less important, since the content delivered by these CDNs is relatively small. Although CDNs may also try to avoid overloaded servers, we believe our assumption is reasonable because today's CDNs are highly overprovisioned in terms of server capacity. Furthermore, we repeated our experiments on separate dates to guard against skew caused by a flash event, and the results were always similar. One limitation of the results below is that we quantify neither the end-user performance cost of suboptimal server selection nor how far it is from optimal server selection.
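The proximity assumptions above amount to a simple decision rule. The sketch below classifies a single server selection under the cluster metric, skipping clients whose cluster has no CDN server, and separately flags a selection as possibly suboptimal under the TD metric when another candidate server has a smaller TD. It assumes TD values have already been computed for each candidate server; how TD is derived from traceroutes is outside the sketch. The names `Verdict`, `classify_by_cluster`, and `possibly_suboptimal_by_td` and the sample inputs are hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class Verdict(Enum):
    NO_SERVER_IN_CLUSTER = "client's cluster has no CDN server; not evaluated"
    IN_CLUSTER = "selected server is in the client's cluster"
    POSSIBLY_SUBOPTIMAL = "selected server is outside the client's cluster"


def classify_by_cluster(client_cluster: str, selected_cluster: str,
                        deployed_clusters: Set[str]) -> Verdict:
    """Apply the cluster-based proximity assumption to one selection decision."""
    if client_cluster not in deployed_clusters:
        return Verdict.NO_SERVER_IN_CLUSTER
    if selected_cluster == client_cluster:
        return Verdict.IN_CLUSTER
    return Verdict.POSSIBLY_SUBOPTIMAL


def possibly_suboptimal_by_td(selected_server: str, td: Dict[str, int]) -> bool:
    """True if some candidate server has a smaller traceroute divergence (TD).

    `td` maps each candidate CDN server to its TD with respect to the client,
    assumed to be precomputed from traceroutes.
    """
    return td[selected_server] > min(td.values())


deployed = {"192.0.2.0/24", "198.51.100.0/24"}
print(classify_by_cluster("192.0.2.0/24", "198.51.100.0/24", deployed).name)
# POSSIBLY_SUBOPTIMAL
print(possibly_suboptimal_by_td("cdn-b", {"cdn-a": 2, "cdn-b": 5}))  # True
```

As in the text, a flag here means only that the decision may be suboptimal in terms of proximity; it says nothing about end-user performance.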
We first describe our measurement methodology and then use AS/network clustering and traceroute divergence to study how client-LDNS proximity affects DNS-based server selection in three commercial CDNs.