We pose a grand challenge for anonymity: the development of a network architecture that enables applications to customize routes that tradeoff between anonymity and performance. Towards this challenge, we present the Application-Aware Anonymity (A3) routing service. We envision that A3 will serve as a powerful and flexible anonymous communications layer that will spur the future development of anonymity services.
Today's Internet routing protocols, while arguably robust and efficient, are not designed to support anonymous communication. To be routable, packets must include source and destination addresses, and any forgeries or omissions hinders deliverability and reliability. While there have been several attempts at providing anonymity with the use of application-level overlay networks, these solutions typically come at the expense of network performance, and do not consider the diverse requirements of potential client applications.
For example, in the case of decentralized anonymity techniques in which next-hops are selected by routers (``jondos'') in the network (e.g., Crowds  and Hordes ), there is no intrinsic support for favoring paths that meet certain application constraints. While source-routing techniques such as Tor  and onion routing  allow the application to select a path of anonymous routers, they do not support mechanisms for choosing nodes based on metrics such as latency, bandwidth, network location, AS membership, trust, etc.
On the other hand, to address the rigidity of the current Internet infrastructure, several proposed routing infrastructures such as NIRA , i3-ROSE  and P2  argue for providing end-host control over routing in the Internet. These proposals generally focus solely on performance-driven metrics such as latency, bandwidth, and resilience, and do not address anonymity.
We assert that the fundamental reason why the above proposals optimize for either performance or anonymity, but not both, is the inherent tension between the two: network path diversity leads to increased anonymity at the expense of network performance. To illustrate this tension in practice, we conducted measurements on PlanetLab  that demonstrate the relationship between the number of overlay nodes in a path and its round-trip-time (RTT). Figure 1 shows that an increase in anonymity (as indicated by the number of nodes in the path) corresponds to an increase in path RTT. We observe similar tradeoffs when comparing the number of distinct ASes per path against other performance metrics such as path throughput and jitter.
We argue that there exist several applications that desire not only customizable routing, but also the ability to fine-tune performance and anonymity to meet their specific needs. For example, an anonymous video conferencing system may require high bandwidth and low latency, but be willing to tolerate anonymity that provides only probable innocence1 . In such a case, the anonymous path from source to sink should include only intermediary nodes that provide high bandwidth and low latency. Anonymity goals such as traversing multiple autonomous systems (ASes) are secondary. In contrast, an anonymous email system may require very strong anonymity (i.e., beyond suspicion) while imposing no constraints on bandwidth or latency. Here, maximizing network diversity to reduce the feasibility of omniscient eavesdropping is paramount.
With this in mind, we introduce a grand challenge for anonymity: The development of a network architecture that allows applications to tradeoff between anonymity and performance, enforce application-specific constraints, and efficiently customize routes, without sacrificing either existing network scalability or security.
As a first step towards our challenge, we make the following contributions in this paper. First, we outline the design goals of our grand challenge: scalability, robustness, measurable anonymity and performance, and security. Second, we propose an initial design and implementation of the Application-Aware Anonymity (A3) routing service that attempts to meet our design goals. Third, we propose path selection algorithms that can be tuned to meet different anonymity and performance requirements of applications. These path selection algorithms can in turn be used to support anonymous unicast, multicast, and anycast, and may be optimized for a variety of metrics such as latency, jitter, bandwidth, etc. We then conclude with a discussion of future research directions.
We focus the discussion of A3 at the application layer as an overlay network, with an eye towards feasible deployment over the existing Internet infrastructure. Ultimately, we hope that the lessons learned will provide insights and spark debate into more fundamental security/performance tradeoffs that are inherent in the Internet.
In contrast to mix-based approaches, other techniques such as Crowds  and Hordes  rely on the network to provide anonymity. Here, packets are forwarded to anonymizing routers called jondos. Upon receiving an encrypted message, a jondo flips a weighted coin to determine whether the message should be forwarded to another jondo or to the final recipient. Clearly, such a system provides no policy enforcement mechanism. Since jondo selection occurs beyond the reach of the sender, including application-specified constraints with the initial request offers no guarantees that the requirements will be enforced.
In more recent work, there have been proposals on designing distributed anonymous networks [14,23,18]. Apart from scalability, these decentralized networks have the advantage that any participatory node may function as an anonymizing router, requiring eavesdroppers to observe more disparate network locations in order to break anonymity. These networks typically leverage peer-to-peer overlays such as distributed hash tables (DHT)  for scalable routing. We build upon these proposals by providing support for application-tunable routes.
Scalability: The service should not depend on any centralized authority (e.g. Tor's directory), hence providing graceful scaling in terms of the number of participating nodes.
Robustness: Given that the Internet is dynamic and routes may be transient, anonymized path selection algorithms must be highly adaptable to changing network conditions.
Measurable anonymity and performance: In order to decide the ``most desired'' path, the initiator of communication must be able to measure the anonymity and performance provided by a proposed path of anonymizing routers. For example, anonymity can be quantified by the number of AS crossings and performance in terms of latency, loss rate, bandwidth, etc. The measurements themselves must preserve anonymity of the initiator and all participating nodes.
Security: Malicious nodes should not be able to conduct denial-of-service (DoS) attacks against the system, nor should they be able to break anonymity (for example, by conducting Sybil  attacks). At a minimum, the anonymity service should be ``as secure'' as the current Internet - i.e., it should not introduce any additional security holes, nor should it ease an eavesdropper's task of breaking anonymity.
In the next section, we introduce the A3 routing service that attempts to meet the above goals.
The functions f_agg( ) and f_accept(R) are application-defined functions that are used to decide which nodes to select as intermediate hops in the path. In addition, the functions f_randomNode() and f_metric(S,D) require calls to external services which we describe in Section 4.2.
In all algorithms, denotes the sender's address, and denotes the number of desired nodes in the path.
Intuitively, Random offers a high degree of anonymity since nodes are chosen uniformly at random. However, anonymity comes at a cost - if f_accept is conservative in its acceptance of routes, then the algorithm will take a long time to terminate, resulting in high communication overhead (since, as we show below, f_randomNode and f_metric require anonymous network queries).
Adaptive ensures that during path construction, only hops that do not violate the application constraint will be added. We consider two special cases of Adaptive. The first, EnforcedAvg( ,N), restricts each hop in the path to consume up to an equal share of some total tolerable metric. That is, if is the maximum tolerable measurement (e.g., latency) for the anonymous path, then EnforcedAvg ensures that no hop will have a cost greater than . For example, if f_agg is the summation of non-negative costs, returns true if current aggregate ( ) divided by the current path length is less than or equal to . Note that in general, we can use EnforcedAvg only if the aggregation function f_agg is monotonic.
Intuitively, EnforcedAvg sacrifices some anonymity in favor of lowering communication cost by limiting per-hop cost. If an eavesdropper knows and , she has more information to infer potential next-hops along the path. Furthermore, depending upon the metric of interest, EnforcedAvg may lead to a selection of nodes that reside in a particular area of the Internet. This is especially relevant when the metric is latency. In their analysis of location diversity in the Tor network , Feamster and Dingledine show that the proximity of nodes in the anonymous path may increase vulnerability to traceforward attacks, particularly when the attacker can observe all traffic within an autonomous system. EnforcedAvg is therefore most appropriate when only limited anonymity is necessary and fast path selection is required.
To reduce the likelihood of traceforward attacks, we propose a variant of the Adaptive algorithm called PlusMinus( ,N, ). This algorithm allows some leeway when selecting nodes via the tolerance parameter , as long as the expected per-hop cost between any two hops in that path remains .
PlusMinus is implemented by modifying EnforcedAvg as follows. With probability , the function in PlusMinus returns true if the next potential link (i.e., to ) contributes no more than to the maximum tolerable cost ( ). To compensate for links with potentially greater-than average costs, with probability , returns true iff the candidate link has a cost less than or equal to . As with EnforcedAvg, the technique is applicable when the function f_agg is monotonic.
While the above path selection algorithms are not exhaustive, they serve to illustrate practical approaches in which applications can tradeoff anonymity and performance. In addition, we believe that these algorithms are sufficiently general to support a wide array of application-specified constraints (e.g., bandwidth, trust, loss).
We next describe the supporting functionalities of A3 that enable these algorithms. The functions f_randomNode() and f_metric(S,D) require the implementation of a directory service to retrieve nodes randomly, and a measurement service to measure pair-wise metrics of any two candidate nodes.
To achieve our goal of decentralization, we avoid using any centralized directory services for selecting candidate nodes. Instead, we utilize DHTs as a distributed node location service where nodes are selected randomly from the identifier space. In addition, to conceal the identity of any candidate node, an anonymized version of the basic DHT lookup  primitive must be used.
Once candidate nodes are selected, the f_metric(S,D) function computes the metric between any two nodes S and D. The naïve approach in which the sender asks each candidate node S to perform measurements to another node D introduces a significant communication overhead, requiring the establishment of expensive bidirectional onion routes during the path selection process. Remote measurement, on the other hand, requires only anonymous queries using the DHT.
An interesting direction that we are currently pursuing is the use of Vivaldi , a coordinate-based measurement system. In Vivaldi, each node is assigned a set of coordinates which it periodically adjusts such that the Cartesian distance between any two nodes corresponds to the distance of their chosen metric. The metric of interest is usually latency, but the same approach can be applied using any measurable quantity such as bandwidth, trust, loss rate, reachability, etc. The resulting coordinates are published into the DHT using the node's unique identifier, and retrieved anonymously2 using the same anonymized lookup primitive described above. Each metric (e.g., bandwidth, latency, trust, etc.) requires its own logical coordinate system.
Having presented path selection algorithms, we demonstrate how A3 can utilize these algorithms to provide supports multiple communication primitives including anonymous unicast, multicast, and anycast. In all cases, connections are established in A3 through the following steps:
In the case of multicast communication, the initiator may publish into the DHT additional meta-information concerning the multicast stream along with the IP address of the root of the multicast tree. Interested parties may then query the DHT to obtain the information required to join to the multicast tree3.
To ensure the robustness of constructed paths in a dynamic network, the initiator sends periodic pings (heartbeats) to the last node in the path via the onion route. If there is no response to pings, a new anonymous path is re-established by repeating steps 1-2. In addition, the initiator can monitor the performance of the path over time, and construct a new path if the measurement performance of a path falls below an application-specified threshold. For example, if the roundtrip time (the time between sending the ping and receiving the echo reply) exceeds a threshold, a new path must be selected. Similar measurements can be carried out for other performance metrics such as network jitter and throughput. For added resilience, A3 allows the creation of redundant anonymized routes. When a path is disrupted, a secondary path can be relied upon while a new path is constructed.
We present a preliminary comparison of the path selection algorithms via simulation. The simulator takes as input transit-stub topologies generated using the GT-ITM topology generator . The transit-stub topology consists of 585-590 stub ASes per transit domain, and 3 nodes per stub AS. We increase the number of nodes in the network by varying the number of transit domains from 5 to 10. The latency between transit nodes is set to ms, the latency between a transit and a stub node is ms, and the latency between any two nodes in the same stub is ms. For simplicity, we assume that each node knows the round-trip time between itself and all other peers.
In each iteration of the simulator, a sender is selected at random, and a path selection algorithm is executed to construct a path. In all cases, we enforce an application constraint that all end-to-end path latency have to be less than 700 ms. Fifty iterations are conducted for three different network sizes, and the results are averaged over the 150 total iterations.
Figure 2 shows the performance of the Random, EnforcedAvg, and PlusMinus path selection strategies as measured by their anonymity (left) and network performance (right). Random is the top performer in terms of anonymity, but results in the worst end-to-end latency. In contrast, EnforcedAvg offers the best network performance and the least anonymity. PlusMinus, which uses a tolerance parameter of , constitutes a middle-ground between the two. Although we only evaluated path anonymity and latency, the graphs clearly illustrate that the effectiveness of each algorithm depends on the metric being optimized. Our results suggest that applications can utilize different path selection algorithms (or vary ) to best meet their anonymity and performance requirements.
A3 is our initial attempt at designing an anonymous routing service that is decentralized, scalable, and adaptable in environments with a high rate of node churn. We emphasize that while we propose using DHTs and the Vivaldi coordinate system as building blocks for anonymous node selection and measurements, other services that satisfy our design goals may also be applicable and worth exploring. For example, Vivaldi is one among several synthetic coordinate systems, and there exists several third-party measurement services. The choice of Vivaldi is particularly attractive because it is non-intrusive, lightweight, and it leverages the same DHT infrastructure that we already use to locate nodes. On the downside, the use of DHTs exposes a number of known security vulnerabilities . While there are well-known, orthogonal fixes to these vulnerabilities, exploring alternative technologies is an interesting avenue of future work.
In this paper, we present the case for application-aware anonymous routing, and lay out a grand challenge of developing network architectures that can better adapt to application-specific anonymity and performance constraints. Towards this challenge, we present the Application-Aware Anonymity A3 routing service. We envision that A3 will serve as a powerful and flexible anonymous communications layer that will spur the future development of anonymity services.
Our future work is proceeding along several fronts. First, we are implementing A3 with the eventual goal of deploying on PlanetLab as an anonymous routing service. Second, we plan to explore declarative approaches  for applications to specify their anonymity and performance constraints. Third, we intend to formally analyze the anonymity properties of A3. Finally, we hope that our experiences will lead to fundamental insights into security and performance tradeoffs inherent in the Internet.
This work was supported in part by the NSF under contracts CNS-0524047, CNS-0627579, and NeTS-0721845.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer
Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 -show_section_numbers -local_icons a3.tex
The translation was initiated by Micah Sherr on