NSDI '05 Paper
[NSDI '05 Technical Program]
Sustaining Cooperation in Multi-Hop Wireless Networks
Ratul Mahajan   Maya Rodrig   David Wetherall   John Zahorjan
Selfish behavior is an important design consideration whenever parties with varied interests come together to achieve a common goal. Examples where individual behavior can be at odds with the system goal include free-riding in peer-to-peer file sharing networks (1,13,36,25,41,32), cheating in online games (33,6), ISP competition in Internet routing (12,38), and network congestion control (17,23,28,16,26,37,4). As has been observed in many of these systems, some parties will behave selfishly if there is gain to be had, even to the detriment of others. (Interestingly, ``TCP accelerators'' have been a concern but have not become pervasive because the bottleneck is usually close to the host, implying that there is little to be gained by deviating from the protocol.) A high-level goal in these systems is to design protocols that ensure the system will work well despite selfish behavior.
In this paper, we study the problem of selfish behavior in multi-hop wireless networks. The emergence of these networks is being driven by the rapid deployment of 802.11 networks and the advantages of relaying packets between nodes. In infrastructure rich areas, relaying can reduce dead spots, lower power consumption (31), and increase network capacity (19). In rural or developing areas, multi-hop wireless networks can be deployed more readily and at lower expense than traditional wireless networks. Research examples of multi-hop networks include MIT's Roofnet (35), Microsoft's MUP (2), the Digital Gangetic Plains Project (8), and UCAN (27).
Selfish behavior is a concern in this setting because relaying packets for others consumes bandwidth and energy. Unlike traditional, wired LANs, nodes in these networks are often controlled by independent and potentially competing parties, e.g., nearby apartments (35,2) or villages (8). In the absence of any pressure to behave cooperatively, nodes have an incentive to free-ride by sending their own packets without relaying packets for others. This concentrates traffic through the cooperative nodes, which decreases both individual and system throughput, and may even partition an otherwise connected network.
Deployed routing protocols ignore the issue of free-riding. They simply assume that factors external to the routing protocol cause all nodes to cooperate. This incurs no overhead but unfortunately makes it trivial for a node to free-ride, e.g., by using a simple firewall rule to render itself indistinguishable from a node that lacks the wireless connectivity needed to relay traffic. Moreover, we show experimentally (Section 5.2) that free-riders can obtain substantial benefits. We should reasonably expect free-riding to become prevalent in all but the most benign situations.
Proposed solutions typically incorporate enough mechanism in the routing protocol to eliminate free-riding. This often involves some form of distributed accounting that allows each node to consume no more forwarding service than it provides. These solutions suffer from two serious drawbacks. They require infrastructure that seems unlikely to come about in practice, e.g., centralized clearance services (44,34) or trusted hardware (11). And they impose overly restrictive requirements on the system, e.g., uniform traffic rates among all node pairs (39).
Our goal is to combine the strengths of these two approaches while avoiding their weaknesses. Like deployed protocols, we assume that most (but not all) nodes will behave cooperatively. Like proposed solutions, we do not rely on trust alone but include mechanisms that actively discourage free-riding. The insight underlying this combination is that early users of a system are typically cooperative (as they try to get the system to work at all) while selfish behavior emerges when the user base grows (22). Evolutionary game theory predicts that free-riding will not flourish if discouraged from an early stage (18).
Our solution is called Catch. It uses an existing majority of cooperative nodes to collectively discourage a minority of selfish nodes from free-riding. In game theory parlance, Catch assures that cooperation is an evolutionarily stable strategy. To achieve this, Catch uses novel techniques based on anonymous messages (in which the identity of the sender is hidden) to tackle two critical problems. First, Catch allows a cooperative node to determine whether its neighbors are free-riding, i.e., dropping packets that should be relayed. Second, it enables the cooperative neighbors of a free-rider to disconnect it from the rest of the network. These tasks can be accomplished even when cooperative nodes can communicate with each other only through potential free-riders. The result is that free-riding that previously succeeded is now deterred in a low-cost manner.
We have implemented and evaluated Catch on an in-building 802.11b testbed. This provides a realistic evaluation environment with the complex link quality factors that affect actual wireless systems. Real wireless conditions significantly complicate the implementation of robust mechanisms where nodes monitor the behavior of their neighbors. Yet they have received little attention in earlier work, which to our knowledge is exclusively based on simulation. We find that Catch is able to detect free-riding by individual nodes both quickly and with high accuracy. Its overhead is modest, roughly 24Kbps of control packets per node in our testbed, with no space overhead or cryptographic operations per data packet.
The rest of this paper is organized as follows. We describe our problem setting in Section 2, followed by our approach based on anonymous messages in Section 3. The Catch protocol itself is described in Section 4. Section 5 describes our evaluation based on the 802.11 testbed. We then report simulation results that analyze Catch across a broad range of parameters in Section 6. Finally, we present related work in Section 7 and our conclusions in Section 8.
We focus on selfish behavior, whereby a node gains at the expense of others, rather than malicious behavior, in which a node actively attacks others, e.g., by jamming its radio transmissions. Consider the simple example of a multi-hop wireless network in Figure 1. Here may wish to send a message to , either to communicate with itself or because serves as a gateway to additional nodes. Because and are not in each other's radio range, communication between then must rely on . On the other hand, may be interested in communicating via but uninterested in obtaining any service from . In that case, may want to avoid the costs of forwarding packets for .
can avoid these forwarding loads in two distinct ways: at the forwarding level and at the routing level. At the forwarding level, can simply drop some or all of the data packets it receives for forwarding from . At the routing level, can refuse to send routing messages that acknowledge connectivity with . Consequently, will appear to be a ``dead-end'' from 's perspective and unreachable from 's, and so neither will ever request forwarding of it. This strategy, which we call link concealment, is broadly applicable and, to our knowledge, no existing wireless routing protocol or policing scheme counters it. Our protocol, Catch, prevents from getting away with these selfish behaviors in the case that both and behave cooperatively. would appear to be immune from adverse consequences for free-riding, because at best only is aware of either of these behaviors (and it cannot communicate with except through ), and only can inflict any punishment on . But we will see that this is not so. 20) and imposing a startup cost for new identities.
Catch does not make any assumption regarding the routing protocol, traffic workload, or objectives of the nodes (such as bandwidth maximization or energy conservation). We believe that it works largely unchanged across these variables. We do not directly consider fairness issues but assume that a higher layer protocol decides what fraction of packets a node should relay for others. Catch can then be used to enforce that policy.
At a high-level, our approach is to use cooperative nodes to monitor for the presence of free-riders and to isolate them from the rest of the network. In this way, free-riding is no longer attractive. However, this approach requires us to tackle two problems, each of which is difficult or impossible to solve in the general case:
To distinguish deliberate packet dropping from wireless errors, we compare an estimate of the true connectivity of a node with its observed forwarding behavior. We use a watchdog (29) to observe the forwarding behavior of a testee node that is being tested for selfish behavior from a tester node that is assumed to operate correctly. (We use the terms tester and testee in these roles throughout this paper.) The watchdog relies on the broadcast nature of wireless transmissions. After a node sends a packet to a neighbor for relaying, it can listen to the wireless medium to observe whether the packet is forwarded by the neighbor. It can thereby build up an estimate of the neighbor's forwarding behavior over time.
It is more difficult to remotely estimate the true connectivity of a node. To do so, we develop an anonymous challenge message (ACM) sub-protocol as follows. Observe that even a selfish testee must depend on at least one of its testers to forward its packets if it is to stay connected. Call this tester the gateway. Let the gateway regularly but unpredictably send an anonymous challenge to the testee for it to rebroadcast; the gateway refuses to forward packets for the testee if it does not overhear the rebroadcasts (since it believes the testee is not connected or is free-riding). Now consider that all other testers with connectivity to the testee are also sending it anonymous challenges, requiring that they be rebroadcast. Because the testee cannot differentiate gateway challenges from other challenges, it must rebroadcast them all or risk losing connectivity to the gateway. This allows the other testers to estimate their connectivity to the testee. They then compare this to the observed forwarding behavior and infer deliberate packet dropping if there is a discrepancy. In practice, the estimates of connectivity and forwarding are statistical and only recent estimates are compared to allow for real wireless losses.
The ACM protocol is difficult to undermine even with weak anonymity because the likelihood of correctly handling a series of challenges decreases exponentially over time. Without breaking the protocol, a testee has only two options to avoid being flagged as deliberately dropping packets. First, it can be honest and reveal its true connectivity to its neighbors and forward their packets. This is what we desire. Second, it can selfishly drop both challenges and data packets in equal amounts and appear to be poorly connected to all its neighbors. But this is a counter-productive strategy. Because the challenges are anonymous they will be dropped independently of their source, and so data packets must also be dropped independently of their source to match. This forces the selfish node to drop and retransmit even its own packets, needlessly consuming its own resources. We note that the ACM protocol is compatible with nodes that sleep for power management, effectively dropping all packets. These nodes neither contribute to the network nor consume its resources, which we consider acceptable behavior. The ACM protocol also has the effect of discarding asymmetric links as does the 802.11 MAC.
Once a tester detects free-riding, it informs all other testers of the free-rider, so that they can simultaneously isolate it. This is necessary: if testers independently break connectivity with the free-rider, they only help the free-rider by reducing its forwarding burden while leaving it able to send its own packets through other testers. The challenge is to inform the other testers even though the only path to them might be via the free-rider, who may discard any incriminating information.
We define an anonymous neighbor verification (ANV) sub-protocol to allow a tester to reliably inform the other testers when the testee misbehaves. It operates in two phases. In the first (``ANV Open'') phase, all testers become aware of each other via the testee: each tester sends a cryptographic hash of a randomly generated token to the testee for it to rebroadcast, and other testers take note when the rebroadcast happens. As before, anonymous messages are used to prevent the testee from selectively excluding testers. If the testee does not rebroadcast these messages, the testers assume that it lacks connectivity or is free-riding and do not relay packets for it.
In the second (``ANV Close'') phase, each tester releases its token to the testee only if the testee has behaved well, as determined by the ACM protocol. The testee rebroadcasts this token. If the hash of the received token matches one of the hashes collected during the first phase, other testers know that this particular tester is satisfied; the original token can only be released by the tester who encrypted it because it is computationally hard to invert the hash. If a tester does not eventually hear all of the tokens it expects based on the first phase, it concludes that another tester is signaling the presence of a free-rider by refusing to release its token. The free-rider is then isolated by all testers. Note that it is crucial that failure of the testee be signaled by the absence of a message to prevent the free-rider from blocking the signal, as it could with a more straightforward positive signaling mechanism.
We make two further observations. First, as before, dropping messages in the first phase to exclude particular testers and their data packets is unlikely to succeed. This is because the likelihood of correctly matching anonymous messages to testers decreases exponentially over time. Second, interference in the second phase of the sub-protocol by the testee is clearly unproductive because it can only lead to its isolation.
Catch builds on the anonymous techniques above, adapting them for use in real, wireless networks.
Catch operates as a sequence of protocol epochs run between a testee node and its neighbors, who act as testers. Figure 3 provides two illustrations of the per-epoch protocol steps, one when the testee is cooperating and the other when it is free-riding.
Each tester applies two statistical tests per epoch to determine whether a testee is behaving correctly. Each test is designed to be sensitive to distinct selfish strategies. The key challenge in both is to avoid mistaking volatile wireless conditions for misbehavior.
One selfish strategy is to drop packets from a particular tester in the hope that the consensus across neighbors will be that the free-rider has passed the epoch, since all other testers should find its behavior acceptable. To detect this, each tester compares observed forwarding and true connectivity estimates for the last three epochs using the z test (30). We found that high confidence levels (99% and above) coupled with using measurements from multiple epochs provides a good balance between quick detection of free-riding and a low rate of false positives.
The second selfish strategy is to uniformly drop some fraction of the packets received from each tester, making it hard for any one of them to conclude that free-riding has taken place. To detect this, we employ the sign test (30) using the sign bits exchanged by all testers. This test is based on the idea that the perceived forwarding and connectivity rates should have identical means if the testee is not deliberately dropping packets. Thus, random fluctuations in each epoch should yield about as many results in which one exceeds the other as the opposite. Each tester accumulates the one-bit results for all epochs in which it has participated, and applies the sign test to decide if the balance is reasonable.
Isolation of a testee is decided by all testers in parallel. Each maintains a small history of per-epoch test results, represented as a three state finite state automaton (FSA) that moves to the right when an epoch fails and the left when an epoch passes. If the FSA falls off the right edge, the testee is isolated.
While it might seem that this scheme allows a node to free-ride for at least half of the epochs, the fact that the per-epoch test results depend on packet accounting data aggregated over the previous three epochs prevents this: free-riding in any one epoch impacts the tests for three consecutive epochs, and is likely to lead to multiple failed tests. We more fully explore this issue in Section 6.3.
Because Catch is designed to operate when some nodes act in a selfish manner, we are as concerned about what happens when the protocol is not followed as when it is. In Appendix A we provide a short analysis by message type that shows that selfish nodes cannot undermine the protocol in the absence of collusion.
This section describes our experiments with Catch on an 802.11b testbed. This allows us to test how well Catch works in wireless environments that exhibit complex packet loss behaviors (24).
Our testbed is composed of 15 PCs equipped with 802.11b that run Linux 2.4.26. We use NetGear MA311 PCI network adapters (Prism 2.5 chipset), operating in the ad-hoc mode on channel 1 using the hostap driver. Each node also has a wired Ethernet interface to facilitate remote management of the experiments.
Our system exhibits well-known characteristics of wireless networks, including error rates that are not a simple function of distance, that are strongly asymmetric, and that vary widely over time. Figure 5 gives a static summary of these effects. It shows the average one-way delivery rate in each direction for each pair of nodes that were able to communicate at all. To compute these rates, each node broadcast 500 1000-byte packets over two minutes. The other nodes counted how many of those packets they received. The figure shows a wide range of delivery rates rather than a binary state of connectedness, which is consistent with prior results (43,3). The diameter of our network is between 3 and 5 hops, depending on the threshold of link quality at which two nodes are considered connected.
We implemented Catch at user-level using the Linux netfilter framework to monitor and manipulate the packets sent, received, and forwarded by a node. The watchdog component of Catch also needs to overhear all packets sent by the node's neighbors regardless of their intended destination. To capture these packets, we operate our wireless network adapters in promiscuous mode and use the Linux pcap framework. The Catch protocol itself is written in ruby and is completely independent of the underlying routing protocol.
One complication is that the watchdog mechanism needs to account for 802.11 MAC-level retransmissions. To see this, consider a tester judging whether the testee forwarded a particular data packet. The quality of the link between the testee and the recipient determines the number of retransmissions done by a cooperative testee. This in turn changes the probability that the tester will overhear the transmission. To correct for this recipient-based variation, we measure the data forwarding rate using only the first transmission as indicated by a bit in the 802.11 MAC header. A complete implementation of Catch would also check that retransmissions are handled consistently to close a secondary loophole. We have not done so yet.
We use the following parameters values for our experiments; simulations suggest that Catch is not highly sensitive to the exact choices. The length of an epoch is set to one minute. The confidence interval for the z test is 99.999%, and that for the sign test is 99.995%. (Both experiments and simple analysis showed that very high confidence values are most effective.) There are fifteen anonymous ACM messages per epoch, each of which is 1500 bytes, the MTU (maximum transmission unit) size of our network adapters. The loss rate for smaller data packets (such as TCP acknowledgements) can be less than that of the ACM messages. To verify forwarding behavior, our implementation checks that the loss rate for data packets is less than that for ACM messages.
We first show the potential benefit of relaying packets by comparing the performance of a single, centrally located access point (AP) setup to that of multi-hop routes. To do this we transfer a large file from one node, which acts as the AP (node 8 in Figure 4), to four client nodes (nodes 4, 6, 9 and 14). Each client downloads a 600KB file ten times. In one set of experiments, the clients communicate directly with the AP. In the other, they use multi-hop routes via a single intermediary node, over paths 4:5:8, 6:7:8, 14:10:8, and 9:10:8. We use static routes between nodes to factor out effects that stem mainly from the routing protocols; wireless routing protocols are an open area of research (14).
We now consider the performance impact of free-riding, both as benefits to the free-riders and as costs to the cooperative nodes. We do this by contrasting the per-node throughput achieved in a fully cooperative network with those achieved when some nodes are allowed to free-ride.
In this experiment, we randomly selected 3 nodes as free-riders. All nodes were trying to download randomly selected files from randomly selected servers. Figure 7 illustrates the average amount of data transferred under the two scenarios: ``Free-riding Discouraged,'' which results in all nodes behaving cooperatively, and ``Free-riding Ignored,'' where free-riders simply do not relay packets for cooperative nodes. Both scenarios were run for 35 minutes. The two bars in each scenario average the per-node results for twelve nodes that acted cooperatively and for the three free-riders. The data illustrates two key points. First, there is a very large incentive to free-ride: the free-riders improve their throughput by 400% relative to when they are forced to cooperate. This indicates that there is considerable potential motivation for nodes to behave selfishly in these environments if they can do so without retribution. Second, the improved situation for the free-riders comes at the expense of cooperative nodes. The performance of the cooperative nodes is decreased by 25% when 20% of their fellow nodes selfishly misbehave. While this is only a single example, it clearly demonstrates the need to incorporate protection against free-riding in routing protocols.
In this section we evaluate the effectiveness of Catch.
Our first experiment measures the speed with which Catch detects free-riding. To construct a base case, we selected triplets of nodes such that both the first and the third node had a reasonable (75%) delivery rate to the second node. The second node was configured to act as a free-rider that randomly dropped a fraction of the packets it received for forwarding. We experimented with different drop rates; Drop rates less than 100% mimic a situation in which the free-rider tries to evade detection by appearing to be a cooperative but poorly connected node. The first node downloaded randomly selected files ranging from 1KB to 3MB in size from the third node. The request and response traffic was relayed through the second node. Five download sessions ran in parallel so that even in the presence of a high drop rate and TCP backoff dynamics, a minimum amount of traffic (roughly ten packets per epoch) is generated for the statistical tests.
The curve ``Drop packets from one'' shows the results for the case where the free-rider dropped packets only for the client. This evaluates whether a single victim can cause the free-rider to be isolated. We find that for high drop rates the detection speed is just as fast as the previous case. It is slower at lower drop rates, but even at the low drop rate of 10% the average detection time is less than 30 epochs. Thus, a free-rider that persistently drops packets of just one neighbor at a very low rate is eventually caught and punished.
We next check that the rapid detection of free-riders does not come at the cost of falsely accusing cooperative nodes of free-riding. We ran two five hour experiments in which all nodes were cooperative. Each node repeatedly downloaded files (as before) from randomly chosen servers. This workload is high enough to saturate our network, stressing the accuracy of inference and increasing the probability of false accusations. We observed no false positives in the first experiment and a single false positive in the second. It is difficult to estimate the true rate of false accusations from this because they are so rare, but nevertheless we find it encouraging.
We randomly selected three (20%) nodes as free-riders that dropped all the packets they received for forwarding. All nodes executed a workload similar to the one in the previous section with the exception that nodes only selected the cooperative nodes as file servers. We then measured the throughput obtained by the free-riders. It should be zero if coordinated isolation was successful. Figure 9 plots the average throughput obtained by the free-riding and cooperative nodes. It shows that the cooperative nodes successfully shut out the free-riders. Roughly eight minutes into the experiment, all the free-riders were identified and isolated. Though not shown in the graph, the spread of time over which different neighbors of a free-rider started isolating it was two minutes. The free-riders were allowed to send traffic again after the punishment interval of 30 minutes. The average throughput of the free-riders appears to recover before 30 minutes because different free-riders were isolated and released at different times.
We report on the overhead of Catch in this section. We have made no attempt to optimize the protocol because its requirements are already modest.
Consider the activity for a pair of neighboring nodes in an epoch, both playing the role of tester and testee. The packet overhead of Catch comes from its messages, which have different sizes and frequencies: StartEpoch (40 bytes), ACM challenges and responses (1500 bytes, 15 times per epoch), ANV open and close (100 bytes), and sign exchanges (40 bytes). These packets come to a total of 0.6 packets or 758 bytes per neighbor per second. Our testbed has fewer than four well-connected neighbors per node on average, which means that the protocol overhead is less than 2.4 packets per second or 3 KBps per node. This is 3% of the 100 KBps that the honest nodes got on average in Figure 9. The overhead would be even lower for the newer and faster 802.11a/g.
We found the processor consumption of Catch to also be very reasonable. Informally observed using top during our experiments, it took at most 10% of the CPU on Pentium-IV 3 GHz nodes. Much of this is an artifact of our user-level implementation. Each packet that passes through the local machine or is promiscuously overheard crosses the user-kernel boundary at least once. In fact, before moving to a PC-based testbed for OS reliability reasons, we had successfully experimented with Catch on a testbed composed of 10 iPAQs.
In this section we study the potential leverage of signal strength attacks on anonymity. We show that even in its present form Catch is useful in protecting the cooperative nodes and is by far preferable to doing nothing. Taking specific steps in Catch to discourage signal strength based cheats is the subject of future work.
At the MAC level, anonymity is a reasonable assumption, since it is possible to send packets with an arbitrary source address and contents using commonly available 802.11 hardware (7). At the physical level, however, strong anonymity cannot be guaranteed against a determined adversary: the source of a packet might be estimated, or at least classified, from the wireless signal strength or direction.
Catch provides protection against such cheats because the received signal strength from an individual neighbor varies over a range of values. When the ranges of multiple neighbors overlap it becomes impossible to accurately distinguish among them. Empirical reports of wireless network conditions (42,43,24) and localization schemes based on received signal strength (5) illustrate the difficulties of using signal strengths. As examples, Figure 10 shows the spread of received signal strength at three nodes in our testbed.
To better understand the overall threat, we experimented with a cheater that uses signal strength to differentiate among its neighbors. The cheater listens to data packets for a short period of time, measuring their signal strengths and sources. It then chooses a signal strength threshold at which to drop incoming packets. It relays packets and appears cooperative to neighbors whose packets arrive with strengths above the threshold. It drops packets below the threshold to appear to be a legitimate non-neighbor to all other nodes. (In theory, the cheater can pick an arbitrary signal strength range rather than limiting itself to the top end. But our measurements show that the degree of overlap among neighbors in the middle and bottom part of the range would preclude this behavior. Additionally, better signal strength roughly translates to better connectivity, providing an incentive to pick such neighbors.) Using this procedure, a cheater may end up cooperating with between just one and all of its legitimate neighbors. Of the nodes in Figure 10, Node 4 is forced to cooperate with all of its neighbors, Node 9 with only two of them, and Node 15 with only one of them. (Peripheral nodes that can uniquely identify a neighbor do not present a major threat as such nodes are not expected to relay packets.)
We now extend our analysis of Catch using simulation.
To assess the effectiveness of Catch, we use Average Time to Isolation (ATI) as the metric. ATI is measured in units of epochs. An ideal policy would exhibit ATI values of one for nodes that free-ride (at any rate), and infinite ATI values for those that do not.
We first evaluate Catch's robustness to two characteristics of the physical environment: packet loss and network density. To model free-riding, we use a straightforward strategy in which the free-rider drops packets randomly with fixed probability. Because the packet losses due to the wireless network are also modeled as a random process, this drop strategy is arguably difficult for our statistical tests to detect.
We would expect Catch to perform better in denser networks because larger neighborhoods are more likely to make correct statistical decisions. Figure 14 examines the impact of the number of neighbors on detection and false accusation times. We show results for a cooperative node (the top line) as well as for free-riders at drop rates from 10-50%. Increasing the number of neighbors from six to ten yields a small decrease in the time to detect free-riders, as might be expected: already at 6 neighbors there is little room for improvement. More surprisingly, reducing the number of neighbors by a factor of three, to only two, increases detection time by only a few epochs. Additionally, the rate at which cooperative nodes are falsely accused is essentially unaffected over the entire range. Thus, Catch seems to be robust, working well in both high and low density networks.
One variation is targeted free-riding, in which the free-rider drops packets from a time-varying subset of neighbors, rather than uniformly from all. This stresses the test in Catch, whereas we know that the basic free-rider is most often detected by the sign test. We call this approach ``rotation.'' A second variation attacks the isolation decision process. Since three consecutive failed epoch tests are required to isolate a node, a free-rider may attempt to escape isolation by dropping packets on, say, alternate epochs. We call this the ``on-off'' strategy. Finally, both attacks may be used at once.
As another variation of the basic free-riding model, we experimented with free-riders that drop packets in a deterministic pattern, rather than randomly. The threat here is that the reduction in variance will help free-riders avoid detection. In fact, the opposite happened: Catch was more effective.
The Detection Oracle hears all packet transmissions everywhere in the network, without loss, and so has reliable knowledge of all externally visible events. Additionally, it retains infinite history information, enabling it to apply the Catch statistical tests over this maximal pool of data. In contrast, the nodes in any real system have only imprecise information (due to losses), each one is directly aware of only a subset of the global information, and history information must be devalued due to the changing environment.
Figure 16 compares the Detection Oracle with Catch. It suggests that Catch does nearly as well as possible. The oracle's advantage exceeds a five epoch reduction in detection time only in the case of high network loss rate (50%) and relatively low (5-25%) drop rates.
Anonymous broadcast was first used as a protocol building block in the Cocaine protocol for auction between mistrustful parties (40). In a manner similar to Catch, Cocaine combines this building block with one-way hash functions. We apply this approach in a different and practical setting, and our work also hints at the generality of the building block and the approach.
Catch belongs to the class of enforcement-based mechanisms that discourage free-riding through the fear of punishment. The watchdog part of our detection mechanism was originally proposed by Marti et al. (29). It is our use of it in real networks and in conjunction with anonymity to detect misbehavior that is novel. Existing enforcement-based protocols (9,29,10) rely on reputation spreading to deal with cheating nodes. This requires global flooding, while Catch limits information spread to single-hop neighborhoods. Moreover, simple flooding requires network redundancy as selfish nodes will not forward incriminating reputation packets. Catch uses anonymity and one-way hash functions to reliably communicate with the neighbors of free-riders. Our use of one-way hash functions is similar to Hu et al.'s work on secure routing in wireless networks (20,21).
Incentive-based approaches discourage free-riding by making cooperation more attractive. Nodes accumulate virtual currency by forwarding for others, which they can then use for sending their own packets. Examples include Nuglets (11), Sprite (44) and priority forwarding (34). These schemes rely on a trusted central authority or tamper-proof hardware to ensure the integrity of the currency, and to redistribute wealth so that even nodes that are not in a position to forward for others can send their packets. In contrast, the operation of Catch is completely distributed. Incentives also fail to encourage nodes with very little data of their own to send. This can lead to a disconnected network when light-senders are located at strategic points in the topology.
Finally, game-theoretic approaches formulate the forwarding decision such that forwarding at a certain rate becomes the Nash equilibrium (18) for the network. This means that deviation from the recommended forwarding behavior can only result in situations that are worse for the deviant node. Generous Tit-for-Tat (GTFT) is an example of such an approach (39). Like GTFT, Catch relies on the mechanics of Tit-for-Tat by assuming cooperation and punishing free-riders. However, while GTFT requires knowledge about the utilities of all the nodes in the network, Catch relies only on information collected in the one-hop neighborhood of individual nodes.
We have presented Catch, a protocol to sustain cooperation in multi-hop wireless networks comprised of autonomous nodes. Catch is much more widely applicable than other proposed solutions, needing no central authority and placing no restrictions on workloads, routing protocols or node objectives. It uses novel strategies based on anonymous messages and statistical tests to detect free-riders with high likelihood and punish them with periods of isolation. Anonymous challenge messages are used to estimate true loss rates, even when dealing with untrusted and uncooperative nodes. Anonymous neighbor verification is used to compel a node to forward packets, even when the data being carried is contrary to its interests. While our application of anonymity and neighborhood watch are specific to the wireless domain, we expect that these techniques are general enough to be applicable in other domains.
We implemented Catch in Linux and performed what to our knowledge is the first evaluation of cooperative routing protocols in an 802.11 wireless testbed. We showed that Catch works well despite volatile wireless conditions and requires little bandwidth overhead (and negligible CPU overhead). In our experiments, free-riders are quickly isolated from the network (and more rapidly for more egregious drop strategies) and cooperative nodes are rarely accused of misbehaving. Simulations confirm this finding over a wide range of conditions. We quantified the impact of free-riding by showing that the presence of even a few free-riders can partition the network. In one experiment, their presence led to a 25% overall performance degradation for the cooperative nodes. We also explored the leverage of signal strength cheats, and found that even without any measure to actively thwart such cheats, Catch provides worthwhile protection. Extending Catch to defeat these strategies is part of our future work.
We thank Krishna Gummadi, Ed Lazowska, Hank Levy, Mike Swift, the anonymous reviewers, and Amin Vahdat, our shepherd, for their comments on earlier versions of this paper. Brian Youngstrom provided valuable assistance with setting up the testbed.
This work was supported in part by NSF (Grant No. ANI-0133495), Atheros Communications, and Microsoft Research.
We briefly consider each step of Catch in light of possible, intentional violations by a free-rider. Our goal is to show that a free-rider cannot defeat the protocol by manipulating messages in unanticipated ways.
Epoch-Start Each node must periodically send EpochStart messages or it is deemed uncooperative by its neighbors and is ignored.
Packet Forwarding and Accounting The testee can drop some or all of the challenges. However, because the challenges are anonymous it: cannot selectively inflate the loss rate on some of the links and has to waste its own resources if it chooses to uniformly inflate the loss rate on all links. (Section 3.1)
Anonymous Neighbor Verification Open (ANV1) The testee can drop some fraction of the ANV1 messages. However, this will be detected in a reasonably short time because of anonymity. (Section 3.2)
Tester Information Exchange The testee is unable to interfere with the exchange because it relies on all the testers to release their tokens.
Epoch Evaluation and ANV Close (ANV2) It is in the testee's interest to forward these messages since they are required for it to pass the epoch evaluation.
Isolation Decision Testers drop the free-rider's data packets to isolate it. To prevent this punishment from being circumvented, we require that some unforgeable notion of identity transmitted with data packets.
Deliberate False Accusations A different style of attack is for a tester to falsely accuse a cooperative testee and cause it to be isolated. The tester is then no longer required to relay packets for this testee. To discourage this, a cooperative testee retaliates by isolating its accuser, or all of its neighbors, if the identity of the accuser is unknown, i.e., mutually-assured-destruction.
Dropping specific data packets A free-rider can use application-level knowledge to throttle data flow if encryption is not used. For instance, it could selectively drop TCP SYN packets at a higher rate to curb data packet generation. We can detect such behavior by looking for statistical differences in the forwarding rate of such special packets.
Blocking control packets Another possibility is for a node to target specific protocol packets sent by other nodes by interfering with their transmission. This is not plausible because we send protocol packets at randomized times.
Reducing transmission power A free-rider can reduce its relaying responsibilities by reducing its transmission power. This requires the node to be topologically well-placed such that there exists a power level at which it has good connectivity to one other node and almost no connectivity to others. Catch does not counter this strategy, as we view power management to be a legitimate strategy for minimizing co-channel interference.
This paper was originally published in the
Proceedings of the 2nd Symposium on Networked Systems Design and Implementation,
May 24, 2005, Boston, MA, USA
Last changed: 2 May 2005 aw