Next: Conclusions Up: Privacy-Preserving Sharing and Correlation Previous: Event-driven analyses

Performance

As illustrated in figure 2, large volumes of alert data are being generated, and alert production among members of the contributor pool can vary greatly. Security services can produce floods of security alerts when they are the target of a denial-of-service attack, or when there is a widespread outbreak of a virulent worm or virus. During such periods of stress, alert production and processing can place a significant burden on sensors, repositories, and analysts, and thus limit the utility of the alerts. This is a significant motivator for work on alert reduction methods [35,10], and it places constraints on the acceptable cost of alert sanitization.

As we show below, the cost of providing privacy to alert producers in our scheme is very low: there is a small impact on the performance of alert producers, and virtually no impact on the performance of supported analyses (of course, some analyses are disabled due to data sanitization). We argue that our scheme provides a sensible three-way tradeoff between utility of alert analysis, performance of the alert sharing infrastructure, and privacy of alert producers.

Performance of alert producers. To understand the CPU impact of alert sanitization, we benchmarked IP hashing on large alert corpora under the scheme proposed in section 6.2, using SHA-1 on external IP addresses (primarily Source_IP) and HMAC on internal IP addresses (primarily Dest_IP).
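The hashing step being benchmarked can be pictured with the following minimal Python sketch. It is not the implementation used in the experiment: the trusted network INTERNAL_NET, the key SITE_KEY, the choice of SHA-1 as the hash inside HMAC, and hashing the packed binary form of the address are all assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical trusted domain and site-local key (illustrative values only).
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
SITE_KEY = b"example-key-kept-at-the-producer-site"


def sanitize_ip(ip: str) -> str:
    """Return a deterministic hex digest that stands in for the IP address."""
    addr = ipaddress.ip_address(ip)
    raw = addr.packed  # assumption: hash the packed binary form of the address
    if addr in INTERNAL_NET:
        # Internal address: keyed hash (HMAC) under a key known only to the producer.
        return hmac.new(SITE_KEY, raw, hashlib.sha1).hexdigest()
    # External address: unkeyed SHA-1, so every producer derives the same value.
    return hashlib.sha1(raw).hexdigest()
```

Because both branches are deterministic, the same address always maps to the same digest, which is what allows sanitized alerts to be compared later.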

The experiment was conducted on a FreeBSD workstation with a 1.4 GHz Intel Pentium III, using Mark Shellor's free software implementation of SHA and HMAC (see footnote 4). We employed two large alert repositories. One repository, produced from our laboratory firewall, consisted of 4,224,122 records collected over a three-hour period during an intense exposure to the Kuang 2 virus [16]. The second repository consisted of 19,146,346 records collected over a 24-hour period by DShield.


Table 5: CPU Impact of IP Hashing (CPU seconds per 1 million alerts).

Corpus        baseline   hashed   delta (hashed)   cached-8   delta (cached-8)
DShield.org      29.81    64.16            34.35      56.84              27.02
Laboratory       75.80   110.34            34.54     106.20              30.40


Table 5 presents the results of the IP address hiding scheme on the DShield and laboratory alert corpora, reported in CPU seconds per 1 million records. The baseline is the CPU time required to read the alerts from secondary storage. The hashed and cached-8 times are the CPU time required to report the alerts with SHA and HMAC hashing applied to the Source_IP and Dest_IP fields. Each delta column is the difference between the baseline alert reporting performance and the corresponding sanitized alert reporting performance.

Cached-8 represents a moderately optimized implementation with a very small cache holding the last 8 encountered IP addresses. Because our sanitization scheme is deterministic, we can use the previously hashed IP addresses from the cache. Caching makes sense in two cases:

For the IP addresses not in the trusted domain (to which SHA is applied), caching achieved savings of about 65%.
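The cached-8 variant can be pictured with a sketch along the following lines. The class name SmallHashCache, the least-recently-used eviction policy, and hashing the textual form of the address are assumptions; the text above only states that the cache holds the last 8 encountered IP addresses.

```python
import hashlib
from collections import OrderedDict


class SmallHashCache:
    """Tiny cache of recently hashed IP addresses (hypothetical helper)."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.entries = OrderedDict()  # ip string -> hex digest
        self.hits = 0
        self.misses = 0

    def digest(self, ip: str) -> str:
        if ip in self.entries:
            # Deterministic sanitization: the stored digest is still valid.
            self.hits += 1
            self.entries.move_to_end(ip)
            return self.entries[ip]
        self.misses += 1
        value = hashlib.sha1(ip.encode("ascii")).hexdigest()
        self.entries[ip] = value
        if len(self.entries) > self.capacity:
            # Assumed policy: evict the least recently used address.
            self.entries.popitem(last=False)
        return value
```

A cache this small only pays off when the same address recurs within a short window of alerts, for example during a burst from a single source.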

The results reveal that the performance impact is modest, less than the cost of I/O in our implementation. For a sensor producing 1 million alerts per hour, the additional hashing expense is roughly 30 seconds of CPU time per hour. This overhead should be considered in the context of the much larger task of alert caching and periodic batched transmission to a remote alert repository. Key management is relatively cheap in our case: there is no need for PKI and keys are never distributed outside the producer's site.
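The arithmetic behind this estimate can be checked directly from Table 5. The short Python sketch below copies the table's figures; the function names and the default rate of 1 million alerts per hour (taken from the example above) are illustrative.

```python
# CPU seconds per 1 million alerts, copied from Table 5.
TABLE_5 = {
    "DShield.org": {"baseline": 29.81, "hashed": 64.16, "cached-8": 56.84},
    "Laboratory": {"baseline": 75.80, "hashed": 110.34, "cached-8": 106.20},
}


def overhead(corpus: str, variant: str) -> float:
    """Extra CPU seconds per 1 million alerts relative to the baseline."""
    row = TABLE_5[corpus]
    return row[variant] - row["baseline"]


def cpu_share(corpus: str, variant: str, alerts_per_hour: int = 1_000_000) -> float:
    """Fraction of one CPU-hour spent on hashing at the given alert rate."""
    extra_seconds = overhead(corpus, variant) * alerts_per_hour / 1_000_000
    return extra_seconds / 3600.0
```

At 1 million alerts per hour, every variant in the table costs less than 1% of one CPU.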

The expected cost of randomized routing to anonymize alert sources depends on the parameters of the routing network such as the forwarding probability and is roughly linear in the number of hops. There is no cryptographic processing and alert routers are stateless (see section 6.3).
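To make the dependence on the forwarding probability concrete, the sketch below assumes a simple model that is not taken from section 6.3: each router independently forwards the alert to another router with probability p_forward and otherwise delivers it to the repository. Under that assumption the hop count is geometrically distributed with mean 1 / (1 - p_forward); the actual routing scheme may differ.

```python
import random


def expected_hops(p_forward: float) -> float:
    """Mean hop count under the assumed model (requires 0 <= p_forward < 1)."""
    return 1.0 / (1.0 - p_forward)


def simulate_hops(p_forward: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the mean hop count by simulating the coin flips."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hops = 1  # the final hop that delivers the alert to the repository
        while rng.random() < p_forward:
            hops += 1  # one more router-to-router forwarding step
        total += hops
    return total / trials
```

If the per-hop cost is roughly constant, the expected routing cost in this model scales with expected_hops(p_forward).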

Performance of analysis. To achieve the balance between privacy and utility, our sanitization methods have been designed to have minimal or no effect on the performance of primary analyses. In particular, sanitized IP addresses are mapped into the same size record as the original IP addresses, and cross-alert comparisons can be carried out at the repository without any network interaction. Comparing hashes for equality takes the same time as comparing IP addresses, so there is zero impact on performance.
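As a small illustration of why equality-based analyses are unaffected, the Python sketch below counts alerts per source on raw and on sanitized records. The record layout, the helper names, and the use of unkeyed SHA-1 over the textual address are assumptions made for the example.

```python
import hashlib
from collections import Counter


def sha1_hex(ip: str) -> str:
    """Stand-in sanitizer for external addresses (illustrative)."""
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def source_counts(alerts):
    """Count alerts per source field; works on raw or sanitized values alike."""
    return Counter(alert["src"] for alert in alerts)


raw_alerts = [
    {"src": "192.0.2.10", "port": 80},
    {"src": "192.0.2.10", "port": 443},
    {"src": "198.51.100.5", "port": 22},
]
# Sanitized copies: only the source field is replaced by its digest.
sanitized_alerts = [dict(alert, src=sha1_hex(alert["src"])) for alert in raw_alerts]

raw_counts = source_counts(raw_alerts)
sanitized_counts = source_counts(sanitized_alerts)
```

The two counters hold the same counts under different labels, since deterministic hashing preserves equality between source fields.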

When a troublesome source IP address is identified, this information may need to be propagated back to the producer (this is infeasible in the randomized-routing setting due to the high overhead of maintaining a return path for each alert). The producer may opt to reveal the actual IP address of the offender. In the case of a widespread attack, many sensors may complain about a single IP address, and any of the victims may choose to reveal the source of the threat, to enable defensive filters to be tuned appropriately. Measuring the costs of such selective revelation is beyond the scope of this paper.


Vitaly Shmatikov 2004-05-18