
In Figure 2, we illustrate the distribution of internode round trip times between nodes in the three data sets. The King measurements were limited to a maximum of 800ms. The data exhibit one important characteristic: spread. The applicationlevel, Azureus round trip times spread across four ordersofmagnitude, while the interDNS, King data set spreads across three. In theory, this is not a harbinger of higher embedding error; in practice, however, as Hong et al. have shown, the error between nodes whose distance is near the middle of the latency distribution tends to be the lowest [33]: with longer tails to this distribution, there are more edges to be inaccurate. (We found ICMP measurements exhibit a similarly wide distribution; see Section 7.5.) This wide spread is a warning sign that Azureus will have higher error than a system with a narrower round trip time distribution.

Network coordinate embeddings that use Euclidean distances make the assumption that the triangle inequality is not violated to a great extent by a large fraction of pairs of nodes. The triangle inequality states that for any triangle the length of a given side must be less than the sum of the other two sides but greater than the difference between the two sides, i.e., the sides must be able to form a triangle. When the latencies between node triples cannot form a triangle, they are said to violate the triangle inequality. Nodes with large and frequent violations tend to be the ones with the largest individual prediction error and their existence decreases overall accuracy (see [18] and Section 7.3).
We use a method from Tang and Crovella to examine the severity of
triangle inequality violations [31]. This method
normalizes the severity of each violation, permitting an allpairs
comparison.
For each node pair, we find the shortest path between the two that
passes through a third node. Thus, for all pairs of nodes i and
j, we find the best alternative path through a node and
normalize by the latency between i and j:
We examined the cause of the large fraction of pairs with very low rpl (less than ) in Azureus. We found that only a few nodes were members of many of these low rpl pairs. What distinguished these nodes  and what was the cause of their frequent participation in triangle inequality violations  was that their delay to nonPlanetLab nodes was atypically large, on the order of seconds, while their delay to other PlanetLab nodes remained typical (less than a second). In effect, this extended one side of the triangles these nodes participated in: d(i,j) became large while d(i,k) and d(k,j) did not. Because PlanetLab nodes that exhibited this behavior were colocated, we conjecture that the Azureus traffic to nonPlanetLab sites was being artificially limited at site gateways, while traffic to PlanetLab nodes avoided this traffic shaping. Rather than being a construct of the PlanetLab environment, this effect, leading to bi or multimodal latency distributions, will be the norm for at least some participants in Internetscale applications that use wellknown ports and consume a large amount of bandwidth, such as Azureus, because some sites will limit traffic and some will not. Like the round trip time spread, Azureus' violations foreshadow a higher embedding error.
Network coordinates would be less useful if a large number of dimensions were needed to capture the internode latencies of the Internet. Tang and Crovella used Principal Component Analysis (PCA) to hint at the number of dimensions required to encompass this information for several small data sets [31]. Because we wanted to know if few dimensions would be sufficient for a large, broad spectrum of endpoints, we used the same method to examine the intrinsic dimensionality of Azureus.
PCA is a linear transformation from one coordinate system to a new, orthogonal coordinate system. The new system is chosen such that each subsequent axis captures the maximum possible remaining variance in projections from points in the old system to points in the new: the first new axis captures the most variance, the second less, and so on. While an input system of k elements will produce an output system also of k elements, often only the first several dimensions of the output system will summarize all or part of the same distance information of the original set of points. Singular values are a result of the PCA transformation: each new axis has a corresponding singular value that describes the amount of variance captured by this axis. Thus, if a singular value is very small or zero, this suggests that this axis is unnecessary in describing the variance in a particular data set.
Because PCA requires a full matrix, we first used the following two techniques to fill in the remaining 9 percent of the Azureus matrix and the missing percent of the MIT matrix. We filled half of the missing Azureus values with the King technique [14] (King fails in certain cases, e.g., when the endpoint cannot be resolved). We interpolated the remaining values in both matrices by embedding each matrix and extracting the missing values.
We use a scree plot to illustrate how much variance each new singular value is capturing, which in turn hints at the inherent dimensionality of the underlying data set. The independent variables of a scree plot are the singular values, sorted by their magnitude; the dependent variables are their corresponding magnitudes. At the point where the magnitude of the singular values becomes zero or nearly zero, the relative importance of this and subsequent singular values (i.e., dimensions) is low. Up to this point, these dimensions are necessary to capture the values in the original input matrix, which in this case is made up of internode latency values.
We show the normalized singular values for the King, PlanetLab, and Azureus data sets in Figure 4. For comparison, we created synthetic 5d and 10d systems each containing 250 random points in a unit hypercube and found their singular values. As one would expect, the synthetic 5d and 10d data sets show a sharp knee soon after 5 and 10 singular values, respectively. In contrast, the bulk of the internode latency information from two Internetbased data sets requires very few dimensions. Azureus, in particular, is dominated by a single dimension, and MIT King by two. However, the next several dimensions remain significant for the few nodes that need to navigate around the clusters of nodes that have found good positions. In the data, this is shown by the continued relevance of singular values when compared to synthetic data sets. To lower the error for these nodes, we find 45 dimensions is appropriate for Internetscale network coordinates. While the previous two characteristics, round trip times and violations of the triangle inequality, suggest that the Azureus latency distribution will experience higher error than MIT King, its intrinsic dimensionality does not appear to be an additional impediment.
While the Azureus data set is clearly of low dimensionality, a more concrete way to examine the ``flatness'' of this largescale network is to look at its intercontinental latency distribution. In a way, it is surprising that embedding latencies found on a globe (the Earth) into a Euclidean space works at all. If messages could be routed in any direction of the Earth's surface, using a Euclidean metric would be a poor choice. Previous work on spherical coordinates, however, found they had significantly larger error than Euclidean ones [9]. Anecdotal evidence suggested that the main reason why the Internet embeds into a low dimensional Euclidean space is because the world is flat: traffic between Asia and Europe flows through North America [9].
An examination of our Azureus data set confirms that this traffic flow is indeed the case. We mapped the IP addresses in the data set to countries through their autonomous system record and, in turn, mapped these countries to continents. As Figure 5 illustrates, no messages from Asia to Europe were faster than those from Asia to North America; the same holds in the other direction. All paths between Asia and Europe appear to travel in a line across two oceans. This trend continues until the speed of the connection to ISPs or other coarse delays begin to dominate.
This flatness suggests why hyperbolic coordinates [28] also work well: North America maps to the center of the hyperbolic space. Indeed, research comparing these two methods found neither a Euclidean nor a hyperbolic metric dominates in all cases [19]. Thus, because the distribution of latencies is ``flat''  at least at a high level  using a Euclidean metric is sufficient. In the future, new direct transmission lines between Europe and Asia may change the Internet's shape, perhaps driving a shift to spherical coordinates.
From our experience tuning a network coordinate system on PlanetLab, we developed two techniques that lead to more stable and accurate coordinates on a small ``live'' system [18]. The Azureus and Harvard teams worked together to integrate these techniques into the Azureus code. After confirming that these techniques worked as expected, we found and resolved a new problem: skewed neighbor sets. This problem particularly disrupts largescale, live coordinate systems like Azureus that rely solely on other application communication for maintenance (i.e. they have zero maintenance costs) and has has been suggested as a goal for coordinate systems [9]. Through experimentation with these techniques in simulation and periodic measurement of the live system, we arrived at coordinates that are not perfect, but are a satisfactory start. We include a review of the techniques we developed as part of our previous research in Section 4.1 and describe our new technique, neighbor decay, in Section 4.2.
In previous work, we developed two simple filters that had distinct beneficial effects on a coordinate system running on PlanetLab [18]. The first type, which we call a latency filter, takes the stream of latency measurements from a remote node and turns these into an expected latency value. For a stream of measurements between nodes i and j, the goal of the latency filter is to summarize the measurements, providing a current and stable description of the expected latency between i and j. There were two main considerations affecting the value . First, anomalous measurements, sometimes several ordersofmagnitude larger than the baseline, would appear in the stream of measurements. For example, we would measure a roundtrip time of 1000ms when typical measurements were 200ms. Although we were using applicationlevel UDP measurements, we found these anomalies also occurred with ICMP. Second, the expected value could not be fixed at a single value. Due to congestion and BGP changes, the underlying latency between pairs of nodes changes. We found that using a simple, short, moving median worked as a latency filter compensating for both anomalous measurements and plateau shifts.
The second type of filter we developed on PlanetLab focuses on making coordinates more stable, not more accurate. These update filters tackle a problem shared across many types of applications that use network coordinates: discerning when a coordinate has changed ``enough'' to potentially necessitate an applicationlevel reaction (e.g., a service migration). In an early application we developed that used network coordinates [25], we found it was hard for the application to immediately determine if it should react to coordinate updates, which were occurring several times per minute. A single threshold (``react if moved more than 50ms'') did not work for all nodes because the volume through which each coordinate moved was nodedependent. We developed a generic filtering technique to allow applications to easily determine when to update coordinates. Applications that find all updates useful can bypass the filters.
Update filters make the distinction between constantly evolving ``systemlevel'' coordinates and stable ``applicationlevel'' coordinates, providing a barrier between these two: systemlevel coordinates fine tune the coordinate further with each measurement, while applicationlevel coordinates change only when the underlying coordinate has undergone a significant migration to a new location relative to other coordinates. In our previous work, we examined several heuristics for distinguishing between a systemlevel coordinate that was moving around a single point (not requiring applicationlevel notification) and one that had migrated to a new location (potentially requiring application activity). We found heuristics that compare windows of previous systemlevel coordinates to one another, especially those that augment this comparison with distances to other nodes in the system, perform well. Applications can tune how much these windows may differ before being notified.
Researchers have posited that a network coordinate subsystem could become a useful component of numerous largescale distributed applications, particularly if it could perform its job passively, that is, without generating any extra traffic. In our Azureus implementation, this passivity was forced upon us: we had no control over the selection of which nodes we gossipped with or when we gossipped with them, because the information necessary for a coordinate update was piggybacked on to other applicationlevel messages, e.g., DHT routing table maintenance. Due to this passivity and to churn, nodes did not have fixed sets of neighbors with which they could expect regular exchanges. In fact, nodes would frequently receive 13 updates from a remote node as that node was being tested for entry into the routing table and then never hear from that node again. The net effect of these limited exchanges was that each node's ``working set'' was much smaller than the number of nodes with which it actually communicated. Nodes were having blips of communication with many nodes, but constant communication with few. The goal of neighbor decay is to expand the size of the working set, which in turn improves accuracy.
A standard, gossipbased coordinate update involves taking new information from a single remote node and optimizing the local coordinate with respect to that node. If some set of remote nodes is sampled at approximately the same frequency, a node's coordinate will become optimized with respect to these remote coordinates (which are in turn performing the same process with their neighbors). However, if some remote nodes are sampled at a far greater frequency than others, the local coordinate optimization process will become skewed toward these nodes. In the theoretical limit, the result would be the same, but in practice, these skewed updates  a problem that could be expected in any passive implementation  slow the global optimization process.
Our solution to the problem of skewed neighbor updates is simple.
Instead of refining our coordinate with respect to the remote node
from which we just received new information, we refine it with respect
to all nodes from which we have recently received an update. To
normalize the sum of the forces of this recent neighbor set, we
scale the force of each neighbor by its age: older information
receives less weight. This allows nodes that we hear from only a few
times to have a lasting, smooth effect on our coordinate.
Algorithmically, we set the effect of a neighbor j on the aggregate force
F to be:
This use of an expanded neighbor set that decays slowly over time has two main benefits. First, because the force from each update is effectively sliced up and distributed over time, nodes' coordinates do not jump to locations where they have high error with respect to other members of the neighbor set. Second, by keeping track of recent, but not old, neighbors, neighbor decay acts to increase the effective size of the neighbor set, which in turn leads to higher global accuracy. In our implementation, nodes expired from the recent neighbor set after 30 minutes.
Note the distinct effects of neighbor decay from both latency and update filters. Latency filters generate a current, expected round trip time to a remote node and update filters prevent systemlevel coordinate updates from spuriously affecting application behavior. Neighbor decay, in contrast, handles the problem of skewed updates that can occur when network coordinates are maintained as a passive subsystem. It allows the smooth incorporation of information from a wider range of neighbors, particularly in a system where contact between nodes is highly transient. In simulation, we confirmed that neighbor decay substantially increased stability and moderately improved continuous relative error.
In this section, we review metrics used to evaluate coordinate systems and other latency services.
Relative Error.
Relative error, the most basic and intuitive measure of accuracy, is
the difference between the expected and actual
latencies between two nodes:
Relative error comes in three forms: global, continuous, and neighbor. Global relative error is the accuracy from the viewpoint of an omniscient external viewer: at one instant, the metric is computed for all links. With the simulations that use a latency matrix, this is what we compute because we do indeed have this viewpoint. Continuous error is what a node computes onthefly as it receives new observations from remote notes. This error is added to a statistic, such as an EWMA, as in Vivaldi's confidence. Two disadvantages to continuous error are (a) a single measurement may result in a large change in value and (b) it can become skewed by a handful of remote nodes if the ``working set'' of active gossip is small. Instead of continuous error, we use neighbor error as a proxy for global error when live nodes are performing the computation themselves, e.g., within live Azureus clients. Neighbor error is the distribution of relative errors for a set of recently contacted nodes. With a large number of neighbors, neighbor error generally provides a close approximation of global error.
Stability.
Stable coordinates are particularly important when a coordinate change triggers application
activity.
In our distributed streaming query system, for example, a coordinate
change could initiate a cascade of events, culminating in one or more
heavyweight process migrations [25]. If the systems'
coordinates have not changed significantly, there is no reason to
begin this process.
A stable coordinate system is one in which coordinates are
not changing over time, assuming that the network itself is
unchanging.
We use the rate of coordinate
change
Descriptions of and results from other metrics are included in technical report version of this paper [17].
Using a latency matrix can only tell part of the story of an Internet coordinate system. It helps describe the network's characteristics, e.g., its intrinsic dimensionality, but misses out on problems that may occur only in a running system, such as churn, changes in latencies over time, and measurement anomalies. We used three distinct methods to understand the online performance of Azureus' coordinates: (a) on PlanetLab, we ran instrumented Azureus clients that recorded the entirety of their coordinaterelated behavior (Section 6.2), (b) we crawled approximately ten thousand Azureus clients that internally tracked the performance of their coordinates using statistics we inserted into the Azureus code (Section 6.3), and (c) we ran a benchmark to determine the effectiveness of the coordinates on applicationlevel decisions (Section 6.4).
Because updates to the CVS tree could take weeks to proliferate to a majority of users, changing single variables or techniques was not feasible. Instead we relied on simulations and ongoing measurement to guide the rollout of two major coordinate versions.
Azureus' coordinates originally used two dimensions, height, and none of the three filtering techniques we described in Section 4. We call this version 2D+H. To create version 5D, we incorporated the two techniques from our previous research, latency and update filters, into the code. Based on our ongoing PlanetLab coordinate service, which did not use height and reliably exhibited low error, we also dropped height and added three more dimensions. Unfortunately, removing height proved to be a mistake. Through simulations of the Azureus latency matrix (see Figure 6), we realized we could expect a substantial improvement in accuracy by converting the last dimension of the 5d implementation to height without changing the gossip packet structure. We also found the highly skewed neighbor sets slowed convergence and developed the neighbor decay technique to compensate. We combined these changes and rolled out version 4D+H.
We took snapshots of each version by running clients on approximately 220 PlanetLab nodes. Each snapshot lasted for at least three days, and logged updates with approximately 10,000 Azureus nodes. We collected a snapshot for each of the three versions in March, July, and September 2006, respectively. Note that these instrumented clients never stored or transferred any content that travels over the Azureus network.
We compare data gathered from the different versions in Figure 7. Because the data are aggregated across roughly the same source PlanetLab nodes, the three snapshots provide a reasonable, though imperfect, way to isolate the effects of the different techniques. In all cases, we find 4D+H is more accurate and stable than both the original 2D+H and our initial rollout of 5D.
Our first revision had mixed results. Based on this data and on simulations with and without height, the data convey that the removal of height damaged accuracy more than the filters aided it. In retrospect, given the Azureus round trip time distribution (see Section 3.2), in which 7.6 percent of the node pairs exhibit round trip times greater than or equal to one second, it is not surprising that using height helped many nodes find a low error coordinate. In addition, given that two dimensions are enough to capture much of Azureus' inherent dimensionality, it is also not surprising that the addition of three dimensions did not radically improve accuracy. Although the 5D coordinates are less accurate, they are more than 21/2 ordersofmagnitude more stable because the latency filters prevent anomalous measurements from reaching the update algorithm.
Our second change was more successful. The introduction of neighbor decay and the reintroduction of height in 4D+H create a much more accurate coordinate space than either of the previous two snapshots. This increase in accuracy occurs because neighbor decay enables nodes to triangulate their coordinates with a larger fraction of the network (and their neighbors are doing the same) and because height supplies the numerous nodes on DSL and cable lines with the additional abstract distance over which all their physical communication must travel.
We first evaluated neighbor decay in simulation. To confirm its continued effectiveness in a live system, we performed an experiment where we monitored the convergence of a node with and without neighbor decay enabled as part of the 4D+H coordinate system. In an average of three trials, we found neighbor decay improved median accuracy by 35, 40, and 54 percent at the 15, 30, and 60 minute marks respectively.
The logs from our Azureus clients running on PlanetLab nodes provide a detailed view of a narrow slice of the system. To obtain a picture of the broader system, we inserted online statistics collection into the Azureus CVS tree. Using its recent neighbor set, each node computed its neighbor error and stability statistics on demand when probed. We present results from Azureus endhosts running version 4D+H.
Figure 8 ``live (all)'' illustrates the data from a crawl of 9477 endhosts. We exclude live nodes with fewer than 10 percent of the maximum 512 neighbors because their metrics are skewed to a very small percentage of the network. The data show that the bulk of the Azureus system experiences accuracy similar to clients running on PlanetLab. However, the error on the greater Azureus network has a long tail: at the 95th percentile, its accuracy is 76 percent worse. As we discuss in Section 7.1, we conjecture that the high rate of churn causes much of this difference in the tail.
In order to hint at the exigencies caused by running ``in the wild'' as opposed to safely in the lab, we compared the statistics from live Azureus nodes to those from our simulated embeddings of the Azureus latency matrix. In Figure 8, we compare live and simulated relative error. The data show a significant gap between live and simulated performance. (Prior work using the same simulator found simulations of PlanetLab mirrored live results [18].) The medians of the relative error distributions are 26 percent and 14 percent for live and simulated coordinates, respectively, a difference of 45 percent.
The data suggest that network coordinates have been partially tamed, but can be made substantially more accurate, and, therefore, more useful for distributed applications that would like to make cheap, quick decisions between providers of the same service. We show how the current level of accuracy affects these anycast decisions in the following section.
Accuracy and stability metrics capture applicationindependent, lowlevel behavior. To understand how Internetscale coordinate systems can affect applicationlevel behavior, we also examined how Azureus uses them to make higherlevel, anycast decisions in one of its common tasks: DHT key lookup. Azureus performs this operation for each tracker announcement, torrent rating lookup and publish, and NAT traversal rendezvous lookup and publish (for tunnelling through NATs).
We modified an Azureus client so that it used network coordinates to optimize lookup delay. Our experiment to evaluate the change in lookup delay first stored a set of keys in the DHT, then looked up each key using four distinct node selection methods, recording the time for the lookup operation. For each key, we ran the methods in random order.
Each method selects one node from a small set, i.e., is performing an anycast: all choices will make logical progress toward the target, some have lower latency than others. Azureus uses Kademlia, which defines the logical distance between two DHT keys as the exclusiveor of their bits [20]. Starting with the logically nearest known nodes to the target: XOR picks the logically nearest node, 2D+H picks the node whose latency as predicted by the 2D+H coordinates is smallest, 4D+H picks the lowest latency node as predicted by the 4D+H coordinates, and RANDOM picks randomly from the set. Each node contacted returns its neighbors that are logically close to the target. This repeats until either a node storing the key is found or the lookup fails. Because Azureus performs DHT lookups iteratively, we were able to experiment with the lookup algorithm through code updates on only a single node.
We plot the distribution of delays from storing 250 keys and performing 2500 lookups in Figure 9. Compared to the XOR method, which always chooses the nearest logical node, the data show that 4D+H reduces lookup delay by 33 percent at the 80th percentile. It is 12 percent faster than the early version of the coordinates, 2D+H, also at the 80th percentile. Because no latency prediction information is currently returned to the caller, the optimization only affects the selection of the first hop. In addition, we were not able to predict latencies to 34 percent of nodes due to version incompatibilities. Both of these factors suggest these improvements are conservative. We excluded lookups that timed out due to dropped UDP messages to avoid dependence on a particular timeout handling mechanism. These data show that using network coordinates can provide a substantial improvement to an applicationlevel process.
In this section, we examine five primary causes of the remaining difference between the current live accuracy and what appears to be achievable based on simulation results. The five barriers are: churn, drift, intrinsic error, corruption, and latency variance. We present techniques that address the first three barriers and nonmalicious corruption. However, malicious corruption and latency variance remain unsolved; indeed, the latter requires a fundamentally new approach to latency prediction. Based on our simulation and PlanetLab results and on monitoring Azureus over time, we have added the techniques that address churn, drift, and nonmalicious corruption to the Azureus code. While preliminary experiments suggest they function as expected, we have not yet fully quantified their effects and do not include results for them here.
Distributed network coordinate algorithms traditionally consider churn as part of their network model. Researchers ask the question: given an existing, stable system, how quickly can a new node find a stable, accurate coordinate? Unfortunately, implicit in this question is the assumption that the existing system has converged, and this assumption breaks down in many largescale distributed systems, including Azureus. As Figure 10, illustrates, Azureus follows the longtailed lifetime distribution typical of peertopeer systems [4]. (Azureus clients track uptime using an internal, queryable statistic.)
Because coordinate updates were on the order of tens of seconds or sometimes minutes apart, nodes often did not have much time to settle into a stable position before they exited the system. Using the data from our crawl of the live network, we separated nodes into ones that had been in the system for an hour or more and those that had not. We plot the relative error experienced by these two groups in Figure 11. The data confirm that these shortlived nodes, which make up the majority of the system, are substantially less accurate than longlived ones.
We considered three potential solutions to the problem of sustaining a coordinate system under high churn rates. First, nodes could perform a rapid initial triangulation process before shifting to a lower update rate. However, adjusting the gossip rate over time has two problems: (a) ``passive'' (i.e. maintenancefree) coordinate systems have no control over gossip and (b) in an ``active'' system, it would be a new, complex knob. Second, we considered ``greedy optimization,'' where instead of just stepping once through the update process, nodes would repeat until a (local) minimum had been reached with respect to the currently known neighbors. Unfortunately, we found that this form of optimization does not work well until many neighbors are known, which is not the case early in a node's lifetime. Finally, we found a solution that is both extremely simple and had positive results in simulation: instead of starting from scratch when restarting a client, have it begin where it left off. We performed an experiment where we varied the amount of churn in simulation and toggled whether or not nodes ``remembered'' their coordinate on reentry. In Figure 12, we show the results of this experiment. We found that when nodes started at the origin on reentry, they had a deleterious effect not only on themselves, but on overall system convergence. In contrast, with this simple technique, accuracy remained about the same as when there was no churn. While this technique assumes limited drift (see next section), it appears to be a promising start to resolving the noxious effect of churn on live coordinate systems.
Monitoring our PlanetLabbased coordinate service over several months revealed that coordinates migrated in a fairly constant direction. That is, the centroid of the coordinates did not move in a ``random walk,'' but instead drifted constantly and repeatedly in a vector away from the origin. This was surprising because our previous study, based on a shorter, threeday trace, had not exhibited this pattern [18].
While coordinates are meant to provide relative distance information, absolute coordinates matter too. One problem with drift is that applications that use them often need to make assumptions on maximum distances away from the ``true'' origin. For example, one could use Hilbert functions to map coordinates into a single dimension [5]. This requires an a priori estimate of the maximum volume the coordinates may fill up. Mapping functions like Hilbert require that the current centroid not drift from the origin without bound. As Freedman et al. note, a second, more practical problem with drift is that it limits the amount of time that cached coordinates remain useful [13].
A ``strawman'' solution to drift would be to continuously redefine the
origin as the centroid of the systems coordinate. Unfortunately, this
would require accurate statistical sampling of the coordinate
distribution and a reliable mechanism to advertise the current
centroid.
Our solution to drift is to apply a polynomiallyincreasing
gravity to coordinates as they become farther away from the
true origin. Gravity G is a force vector applied to the node's
coordinate after each update:
Drift does not occur in simulation if one is using a latency matrix and updating nodes randomly, because this form of simulation does not capture timedependent RTT variability. Instead, we used a 24hour trace of our PlanetLab service to simulate the effect of gravity; we show the effect of different strengths of gravity in Table 1. The data show that this simple technique does, in fact, keep the coordinate centroid highly stationary without affecting accuracy.
To confirm the effect of gravity on a live system, we added it to our ongoing PlanetLab service, which had approximately 300 participants. In Figure 13, we compare drift before and after adding gravity over two 18 day periods. The data show that gravity effectively eliminates drift. In addition, it did not reduce accuracy, which, in both cases, had a median of about 10 percent. While gravity does not actively limit rotation, we did not observe a rate greater than one full rotation per three days. Determining the cause of drift is beyond the scope of this work.
Violations of the triangle inequality occur more frequently and to a greater extent on Azureus than either on PlanetLab or for sets of DNS servers (see Section 3.3). We found, perhaps surprisingly, that removing a small number of the worst violators causes a large improvement in global accuracy. Not only do the violations these nodes take part in damage their own coordinates, but the damage they cause continues to reverberate throughout the system.
We performed an experiment where we removed a small percentage of the nodes with the largest triangle violations from the Azureus latency matrix and compared this to removing a random subset of nodes of the same size. We then computed a system of coordinates and found the relative error of each link. As Figure 14 illustrates, removing only the worst 0.5 percent of nodes leads to a 20 percent improvement in global accuracy. This data parallels results from theoretical work that showed how to decrease embedding distortion by sacrificing a small fraction of distances to be arbitrarily distorted [2]. These results show that if a mechanism could prevent these nodes from affecting the rest of the system, it would improve overall accuracy. Two example mechanisms for node selfdetection and removal from the coordinate system are: (a) directly evolving an estimate of the extent of their violations by asking neighbors for latencies to other neighbors, and (b) determining if they are subject to traffic shaping (based on the modality of their latency distribution), and therefore a major cause of triangle violations. Preliminary experiments with selfexclusion based on a simple bimodality test show an improvement in accuracy of 8 percent at the 95th percentile.
Kaafar et al. have begun investigating the more interesting side of the problem of coordinate corruption: malicious behavior [16]. They divide attacks into four classes: disorder, isolation, freeriding, and landmark control. While we did not see any evidence of intentionally corrupt messages, it would be trivial to install a client, or a set of clients, that responded with random values, for example (just as the MPAA runs clients with spurious content advertisements to squelch piracy). As Internetscale coordinate systems come into wider use, they will need to grapple with both oblivious and malicious corruption.
The prior ``barriers to accuracy'' paint a rosy picture; most problems have a fairly simple solution that practitioners can use to build more accurate, live coordinate systems. The existence of wide variation in latency measurements between the same pair of nodes over a short period of time is a harder problem with broad ramifications. If variances are very large what does it actually mean to ``predict'' the latency from one node to another? Using the data from our longest snapshot (5D), we show the standard deviation of latency between each pair of nodes in Figure 15. This problem affects other latency prediction systems as well. A reactive measurement service, such as Meridian, will be more errorprone or have higher overhead if small numbers of pings do not sufficiently measure the latency to a high variance target. In fact, coordinate systems may be in a better position to address this problem because they can retain histories of internode behavior.
As reviewed in Section 4.1, we developed latency filters in previous work. They act as a lowpass filter: anomalies are ignored while a baseline signal passes through. Additionally, they adapt to shifts in the baseline that BGP route changes cause, for example. These filters assign a link a single value that conveys the expected latency of the link. While we found these simple filters worked well on PlanetLab, describing a link with a single value is not appropriate with the enormous variance we observe on some of Azureus' links.
We ran an experiment where we compared ICMP, filtered, and raw latency measurements that were taken at the same time. To determine which destination nodes to use, we started Azureus on three PlanetLab nodes and chose five pingable neighbors after a twentyminute startup period. We then let Azureus continue to run normally for six hours while simultaneously measuring the latency to these nodes with ping. We plot the data in Figures 16 and 17. Figure 16 illustrates a pair similar to our PlanetLab observations: there was raw applicationlevel and ICMP variance, but a consistent baseline that could be described with a single value. In contrast, Figure 17 portrays a high variance pair: while the filter does approximate the median round trip time, it is difficult to say, at any point in time, what the latency is between this pair.
The impact of the dual problems of high latency variance and modifying algorithms to deal with high latency variance is not limited to network coordinate systems. Latency and anycast services deployed ``in the wild'' need to address this problem. While there may exist methods to incorporate this variance into coordinate systems  either through ``uncertainty'' in the latency filters or in the coordinates themselves  resolving this problem is beyond the scope of this paper.
Early work on latency prediction services focused on reducing the intractability of allpairs measurements through clustering. Based on the assumption that nodes in the same cluster would have similar latencies to nodes in another cluster, researchers examined how to create accurate clusters and how to minimize inter and intracluster measurement overhead. Francis et al. created clusters based on IP address prefixes, but found that prediction error was heavily dependent on the initial choice of representatives [12]. Chen et al. addressed this problem through the automatic formation of clusters and representatives; they found the cluster size and, more generally, the amenability of the network to clustering had a large effect on accuracy [6]. Ratnasamy et al. proposed a hybrid approach: nodes that are similar distances away from fixed landmarks place themselves into the same cluster; they also found error was highly dependent on the number of bins [26]. Because all of this clustering involves measurement and lower network layers are already performing much of this measurement, Nakao et al. proposed reducing overhead by tapping into this existing information; unfortunately, this requires a change in the interface of Internet routers [21].
While this research and the work on network coordinates that grew out of it focus on generalized latency prediction  maintaining an infrastructure that works well for most queries  a separate body of work has focused more directly on the problem of finding the nearest of many replicated services. In direct response to an applicationlevel request, Meridian finds the nearest overlay node (i.e., one running Meridian) to an arbitrary point in the Internet through a set of pings that progress logarithmically closer to the target [32]. Freedman et al. developed OASIS, a distributed service explicitly designed to help clients find and choose a ``good'' server out of many [13]. Building on Meridian, OASIS primarily focuses on network locality, but also incorporates liveness and load. OASIS employs a reliable core of hosts to map clients to nearby servers, which are assumed to be longlived. Note the distinct purposes of these anycast services from those of network coordinates: Meridian and OASIS are designed for the case where contact with the service will be frequent and longlived enough to outweigh the high upfront cost of finding the best service. With their current levels of accuracy (good but not perfect) and maintenance (zero), network coordinates fall to the other side of the tradeoff: shortlived, cheap decisions for which finding the exact answer is not worthwhile, but repeatedly finding a good answer leads to aggregate savings. While Meridian (and OASIS) are inherently reactive  acting in response to a query  they too could be more tightly integrated with an application, using its messages to dampen ring maintenance, for example.
There exist two main classes of algorithms for calculating coordinates: landmarkbased schemes, in which overlay nodes use a fixed number of landmark nodes to calculate their coordinates, and simulationbased schemes, which are decentralized and calculate coordinates by modeling nodes as entities in a physical system.
Landmarkbased. In GNP [22], nodes contact multiple landmark nodes to triangulate their coordinates. The drawbacks of this approach are that the accuracy of the coordinates depends on the choice of landmark nodes and landmark nodes may become a bottleneck. Lighthouses [24] addresses this by supporting multiple independent sets of landmarks with their own coordinate systems. These local coordinates map into a global coordinate system. PIC [8] does not use explicit landmarks, incorporating measurements to any node using a simplex optimization algorithm to obtain an uptodate coordinate. These landmarkbased schemes require a reasonably stable infrastructure and, to the best of our knowledge, have not been adopted for widespread use.
Simulationbased. Vivaldi [9] and Big Bang Simulation [27] determine coordinates using springrelaxation and forcefield simulation, respectively. In both, nodes attract and repel each other according to network distance measurements. The lowenergy state of the physical system corresponds to the coordinates with minimum error.
de Launois et al. propose a different method for stabilizing coordinates: asymptotically dampening the effect of each new Vivaldi measurement [10]. While this factor does mitigate oscillations in a fixed network, it prevents the algorithm from adapting to changing network conditions.
We have demonstrated that network coordinates in the wild do behave somewhat differently than do tame coordinates on PlanetLab or in simulation. Fortunately, even these wild coordinates can be tamed. Our analysis of a large, Internetscale coordinate system has convinced us to join the network coordinate supporters camp. While the initial network coordinate implementation illustrated some of the problems that critics often cite, we found that simple, but effective techniques overcame nearly all these issues. In Azureus, network coordinates provide a simple and efficient mechanism for anycast, as part of DHT lookups, and may soon be used to optimize streaming media. In addition to providing a wealth of data and analysis from a live, largescale deployment, we have deployed and evaluated six techniques that improve the accuracy and/or stability of network coordinate systems: latency filters, update filters, neighbor decay, coordinate memory, gravity, and violator exclusion. Together, these yield efficient, accurate, and stable network coordinates in the millionnode Azureus network. In the future, we plan to add the remaining techniques to the Azureus code and monitor their effectiveness.
We wish to thank Peter Pietzuch for early discussions on the myriad potential reasons for the discrepancy between simulated and real network coordinates, Olivier Chalouhi of Aelitis for allowing us to tinker with the Azureus source code, and Michael Parker of UCLA for putting us in touch with Aelitis and for porting our implementation to Java. In addition, we wish to thank our anonymous reviewers and our shepherd, Emin Gün Sirer, who provided extremely detailed constructive criticism.
This document was generated using the LaTeX2HTML translator Version 200221 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html split 0 show_section_numbers local_icons wildweb.tex
The translation was initiated by Jonathan Ledlie on 20070223
Jonathan Ledlie 20070223Last changed: 16 March 2007 ljc 