In this paper, we propose a novel technique for finding client and local DNS server associations and potentially hidden load factors in a fast, non-intrusive, and accurate manner. Based on the results, we evaluate the proximity between clients and their LDNS using four metrics: AS clustering, network clustering, traceroute divergence, and round-trip time correlation.
We evaluate the potential effectiveness of DNS-based server selection in CDNs based on these metrics. We conclude that DNS is good for very coarse-grained server selection, since in 64% of the associations the client and its LDNS belong to the same AS. DNS is less useful for finer-grained server selection, since only 16% of clients use a DNS server in the same network-aware cluster. These values could be improved to 88% and 66%, respectively, if clients were configured to use a closer local DNS server. DNS-based server selection thus has an inherent limitation: the proximity correlation between a client and its LDNS can be poor. In practice, however, the impact is currently small, because today's CDNs deploy servers sparsely and are not present in many network-aware clusters.
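As an illustration of how the AS and network-aware cluster fractions quoted above can be computed, the following is a minimal sketch, not the measurement code used in this work. It assumes that a network-aware cluster is identified by the longest matching prefix in a BGP-derived table and that the AS is the origin AS of that prefix. The function names, the table layout, and the sample addresses (documentation ranges and private-use AS numbers) are hypothetical.

```python
import ipaddress


def longest_prefix(ip, table):
    """Return the most specific network in `table` containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net in table:
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best


def proximity_fractions(pairs, prefix_to_as):
    """Compute (same-AS fraction, same-cluster fraction) over client-LDNS pairs.

    pairs:        iterable of (client_ip, ldns_ip) strings
    prefix_to_as: dict mapping prefix strings to origin AS numbers
    Pairs for which either address matches no prefix are skipped (an
    assumption of this sketch).
    """
    table = {ipaddress.ip_network(p): asn for p, asn in prefix_to_as.items()}
    total = same_as = same_cluster = 0
    for client, ldns in pairs:
        c_net = longest_prefix(client, table)
        l_net = longest_prefix(ldns, table)
        if c_net is None or l_net is None:
            continue
        total += 1
        if table[c_net] == table[l_net]:
            same_as += 1
        if c_net == l_net:
            same_cluster += 1
    if total == 0:
        return 0.0, 0.0
    return same_as / total, same_cluster / total


# Hypothetical sample data for illustration only.
SAMPLE_PREFIXES = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64500,
    "203.0.113.0/24": 64501,
}
SAMPLE_PAIRS = [
    ("192.0.2.10", "192.0.2.53"),     # same cluster, same AS
    ("192.0.2.11", "198.51.100.53"),  # same AS, different cluster
    ("192.0.2.12", "203.0.113.53"),   # different AS
    ("198.51.100.7", "203.0.113.53"), # different AS
]
```

The linear scan over prefixes keeps the sketch short; a full BGP table would call for a trie-based longest-prefix match.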
At least one CDN has stated a goal of ultimately placing CDN servers in every edge network. The high fraction of clients using LDNS servers in different network-aware clusters suggests that CDNs may be unable to use DNS request routing for such fine-grained server selection unless DNS itself scales to provide each edge network with a local DNS server that communicates directly with the Internet. From an economic perspective, the inherently limited precision of DNS-based server selection therefore reduces the benefit of deploying CDN servers so densely that the performance of two nearby servers is indistinguishable.
Beyond the proximity evaluation and the novel measurement methodology, our work makes two further contributions toward improving DNS-based CDNs in general. First, we observe that client-LDNS associations are fairly static. CDNs can therefore build a database of such associations and use it to infer the geographic location of the clients associated with an LDNS IP address, improving server selection. Second, based on the URL-rewriting technique in our measurement methodology, CDNs can completely eliminate the originator problem by embedding the client's IP address in the URLs of a Web page when the client initially requests the base page.
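The two suggestions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes the client address is encoded as a hostname label, so that the CDN's authoritative DNS server can recover it from the query name. The zone `cdn.example.net`, the `c` label prefix, and all function names are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Optional

CDN_ZONE = "cdn.example.net"  # hypothetical CDN zone


def rewrite_url(path: str, client_ip: str, zone: str = CDN_ZONE) -> str:
    """Build an embedded-object URL whose hostname encodes the client IPv4 address."""
    label = "c" + str(ipaddress.IPv4Address(client_ip)).replace(".", "-")
    return f"http://{label}.{zone}/{path.lstrip('/')}"


def client_from_query_name(qname: str, zone: str = CDN_ZONE) -> Optional[str]:
    """Recover the client address from a DNS query name, or None if it is absent."""
    qname = qname.rstrip(".").lower()
    suffix = "." + zone
    if not qname.endswith(suffix):
        return None
    label = qname[: -len(suffix)]
    if not label.startswith("c"):
        return None
    try:
        return str(ipaddress.IPv4Address(label[1:].replace("-", ".")))
    except ValueError:
        return None


def record_association(db, ldns_ip: str, qname: str) -> None:
    """Add the client encoded in `qname` to the set of clients seen behind `ldns_ip`."""
    client = client_from_query_name(qname)
    if client is not None:
        db[ldns_ip].add(client)


# Hypothetical usage: the Web server rewrites a URL while serving the base
# page, and the authoritative DNS server later sees the LDNS resolve it.
associations = defaultdict(set)
url = rewrite_url("/img/logo.gif", "192.0.2.17")
record_association(associations, "198.51.100.53", "c192-0-2-17.cdn.example.net.")
```

With the client address available in the query name, the authoritative server could select a CDN server for the client itself rather than for its LDNS, and the accumulated `associations` table plays the role of the client-LDNS database.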