

Multiple repositories

We now describe a ``heavy-duty'' solution for the corrupt repository problem. Instead of using a single alert repository, envision multiple repositories, operated by different owners and distributed throughout the Internet (e.g., open-source code for setting up a repository may be made available to anyone who wishes to operate one). We do not require the repositories to synchronize their alert datasets, so the additional complexity is low. Information about available repositories is compiled into a periodically published list. An organization that wants to take advantage of the alert sharing infrastructure chooses one or more repositories in any way it sees fit -- randomly, on the basis of previously established trust, or using a reputation mechanism such as [9,12].

In this setting, it is insufficient for the attacker to gain control of just one repository to launch a probe-response attack because the victim may report his alert to a different repository. The costs for the attacker increase linearly with the number of repositories. The costs for alert producers do not increase at all, since the amount of processing per alert does not depend on the number of repositories.

While spreading alerts over several repositories decreases opportunities for collaborative analysis, real-time detection of high-volume events is still feasible. If multiple systems are under simultaneous attack, their alerts are likely to be reported to different repositories in sufficient numbers to pass the ``hot list'' threshold and trigger publication. By monitoring a sufficiently large subset of the repositories for simultaneous spikes of similar alerts, it is possible to detect an attack in progress and adopt an appropriate defensive posture. Repositories may also engage in periodic or on-demand exchanges of significant perturbations in incoming alert patterns. This could further strengthen the aggregate detection capability, especially as the number of participating repositories grows.
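The cross-repository spike detection described above can be sketched as follows. This is an illustrative Python simplification, not the paper's implementation: the function name, the per-window alert-signature lists, and the threshold values are all assumptions made for the example.

```python
from collections import Counter

HOT_LIST_THRESHOLD = 50  # hypothetical per-repository publication threshold

def detect_attack(repositories, threshold=HOT_LIST_THRESHOLD, min_repos=2):
    """Flag alert signatures that spike simultaneously at several repositories.

    `repositories` maps a repository name to the list of alert signatures it
    received in the current time window (an illustrative simplification of
    the ``hot list'' mechanism described in the text).
    """
    spiking = Counter()
    for alerts in repositories.values():
        counts = Counter(alerts)
        for sig, count in counts.items():
            if count >= threshold:
                spiking[sig] += 1  # this repository's hot list includes sig
    # Report signatures hot at min_repos or more repositories at once
    return {sig for sig, repos in spiking.items() if repos >= min_repos}
```

A signature that crosses the threshold at several repositories in the same window is evidence of a widespread attack, even though no single repository sees all the alerts.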

Randomized alert routing. For better privacy, we propose to deploy an overlay protocol for randomized peer-to-peer routing of alerts in the spirit of Crowds [27] or Onion routing [33]. Each alert producer and repository sets up a simple alert router outside its firewall. The routers form a network. When a batch of alerts is ready for release, the producer chooses one of the other routers at random and sends the batch to it. After receiving the batch, a router flips a biased coin and, with probability $p$ (a parameter of the system), forwards the batch to the next randomly selected router, or, with probability $1-p$, deposits it into a randomly selected repository. The alert producer may also specify the desired repository as part of the alert batch.
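The coin-flipping forwarding rule can be sketched in a few lines of Python. The data structures (routers as an address list, repositories as lists of deposited batches) are placeholders chosen for the example; the paper does not prescribe concrete representations.

```python
import random

def route_batch(batch, routers, repositories, forward_prob=0.5):
    """Route one alert batch through the overlay until a biased coin
    (heads with probability forward_prob = p) says to deposit it.

    Returns the total number of hops the batch traveled, which should
    average m = 2 + p/(1-p) over many batches.
    """
    hops = 1                                  # producer's initial send
    current = random.choice(routers)          # producer picks a random router
    while random.random() < forward_prob:     # with probability p: forward
        current = random.choice(routers)
        hops += 1
    repo = random.choice(repositories)        # with probability 1-p: deposit
    repo.append(batch)
    return hops + 1                           # final hop into the repository
```

Note that each router makes its decision independently and statelessly: no per-alert bookkeeping, no cryptography, just a coin flip and a table of peer addresses.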

Such a network is very simple to set up since, in contrast to full-blown anonymous communication systems such as Onion routing, there is no need to establish return paths or permanent channels. The routers do not need to maintain per-alert state or use any cryptography. All they need to do is randomly forward the alerts they receive and periodically update their tables of the addresses of other routers in the network.

When an alert enters the network, all origin data is lost after the first hop. Even if the attacker controls some of the routers and repositories, he cannot be sure whether an alert was generated by its apparent source or routed on behalf of another producer. This provides probabilistic anonymity for alert sources, which we quantify below. The disadvantage is the communication overhead and increased latency before alerts arrive at the repository (note that there is no cryptographic overhead).

Anonymity estimates. To quantify the anonymity that alert contributors will enjoy if the repositories and producers form a randomized alert routing network, we compute the increase in attacker workload as a function of the average routing path length. If $p$ is the probability of forwarding at each hop, then the average path length is $m=2+\frac{p}{1-p}$: the first hop from the producer and the final hop into the repository always occur, and the expected number of intermediate forwards is $\frac{p}{1-p}$. Solving for $p$, the forwarding probability must equal $\frac{m-2}{m-1}$ to achieve an average path length of $m$.
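The relationship between $p$ and $m$ is easy to check numerically; the following helper functions (names are ours, chosen for the example) implement the two formulas from the text.

```python
def average_path_length(p):
    """Average path length m = 2 + p/(1-p) for forwarding probability p."""
    return 2 + p / (1 - p)

def forwarding_probability(m):
    """Forwarding probability p = (m-2)/(m-1) that yields average length m."""
    return (m - 2) / (m - 1)
```

The two functions are inverses of each other: for any target path length $m > 2$, tuning $p$ to $(m-2)/(m-1)$ recovers $m$.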

Suppose the network contains $n$ routers, of which $c$ are controlled by the attacker. The probability that a random path contains a router controlled by the attacker is $\frac{c(n-np+cp+p)}{n^2-np(n-c)}$ [27]. For large $n$, this value is close to $\frac{c}{n}$, which means that a fraction of nearly $1-\frac{c}{n}$ of all alerts will never be observed by the attacker and thus remain completely anonymous.
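The large-$n$ approximation can be verified directly by evaluating the exact expression. This is a sketch under our own naming; the formula itself is the one quoted from [27] in the text.

```python
def prob_path_observed(n, c, p):
    """Exact probability that a random path contains an attacker-controlled
    router, per the expression from the text: c(n - np + cp + p) / (n^2 - np(n - c)).
    n = total routers, c = corrupt routers, p = forwarding probability."""
    return c * (n - n * p + c * p + p) / (n ** 2 - n * p * (n - c))
```

For a large network the exact value converges to the simple ratio $c/n$, confirming that roughly $1-\frac{c}{n}$ of the alerts pass entirely through honest routers.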

For each of the roughly $\frac{c}{n}$ fraction of alerts that are observed by the attacker, the probability that its apparent source (the site from which an attacker-controlled router received it) is the actual source is $\frac{n-p(n-c-1)}{n}$ [27]. We interpret the inverse of this probability as the attacker workload. For example, if there is only a 25% chance that the observed alert was produced by its apparent source, the attacker must perform four times as much testing to determine whether the apparent source is the true origin. As expected, higher values of the forwarding probability $p$ provide better anonymity at the cost of increased latency (modeled as the increase in the average number of hops an alert travels before arriving at the repository). This relationship is plotted (assuming $n=100$ routers) in figure 3.
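The workload calculation behind the plot can be reproduced with two small functions (our names, for illustration); the workload is simply the reciprocal of the apparent-source probability from the text.

```python
def apparent_source_probability(n, c, p):
    """Probability (n - p(n - c - 1)) / n that an observed alert's apparent
    source is its true source, per the expression quoted from [27]."""
    return (n - p * (n - c - 1)) / n

def attacker_workload(n, c, p):
    """Multiplicative increase in the attacker's probing effort: the inverse
    of the apparent-source probability."""
    return 1.0 / apparent_source_probability(n, c, p)
```

Evaluating this for a fixed network size shows the trade-off the text describes: raising $p$ (longer average paths, higher latency) drives the workload up.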

Figure 3: Estimated anonymity provided by randomized alert routing.


Vitaly Shmatikov 2004-05-18