

Alert Sharing Infrastructure

To enable open collaborative analysis of security alerts and real-time attack detection, we propose to establish alert repositories that receive alerts from many sensors, some of them public and located at visible network nodes, others hidden on corporate networks deep behind firewalls. Achieving this requires a robust architecture for information dissemination, ideally with no single point of failure (to provide higher reliability in the face of random faults and outages), no single point of trust (to provide stronger privacy guarantees against insider misuse in any one organization), and few if any leverage points for attackers.

The core of the proposed system is a set of repositories where alerts are stored and accessed during analysis. Each repository is very simple: it accepts alerts from anywhere, strips out source information, and publishes them immediately or after some delay. There is no cryptographic processing and no key management (unless the repository performs re-keying; see section 6.2). As described in section 6.3, multiple repositories make it more difficult for an attacker to infer the source of sanitized alerts. The repositories may share alerts, but they are not required to be synchronized, so not every alert will be visible to every analysis engine. For performance reasons, analysis engines normally interact with a single repository or mirror site.
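To make the repository's role concrete, the following Python sketch is purely illustrative; the paper does not prescribe an implementation, and the field names (submitter_ip, submitter_id) and the publication delay value are assumptions made for the example. The repository accepts alerts from anywhere, drops submitter-identifying transport metadata, and makes alerts visible to analysis engines after a configurable delay.

import time
from collections import deque

PUBLISH_DELAY_SECONDS = 60   # hypothetical publication delay

class Repository:
    """Minimal alert repository: accept alerts, strip source info, publish after a delay."""

    def __init__(self, delay=PUBLISH_DELAY_SECONDS):
        self.delay = delay
        self._pending = deque()      # (arrival_time, alert) awaiting publication
        self._published = []         # alerts visible to analysis engines

    def submit(self, alert):
        """Accept an alert from anywhere; drop transport-level source metadata."""
        sanitized = {k: v for k, v in alert.items()
                     if k not in ("submitter_ip", "submitter_id")}
        self._pending.append((time.time(), sanitized))

    def publish_ready(self):
        """Move alerts whose delay has elapsed into the published set."""
        now = time.time()
        while self._pending and now - self._pending[0][0] >= self.delay:
            _, alert = self._pending.popleft()
            self._published.append(alert)

    def query(self):
        """Analysis engines read whatever has been published so far."""
        return list(self._published)

An analysis engine would simply call query() on a single repository or mirror site, matching the interaction pattern described above.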

Figure 1 shows the major data flows among a small set of sensors, producers, repositories, and analysis engines. The sensor trapezoids represent firewalls, intrusion detection systems, antivirus software, and possibly other security alert generators. The producer boxes represent local collection points for an enterprise or part of an enterprise. These boxes perform the sanitization steps such as hashing IP addresses, and are controlled by the reporting organization. The repository cylinders represent public or semi-public databases containing reported data. A repository may be controlled by a producer or by an analysis organization. The analysis diamonds represent analysis services which process the published alerts for historical trends, event frequency changes, and other aggregation or correlation functions.

Figure 1: Data flows in alert processing.
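To illustrate the producer's role, the sketch below shows one way a local collection point might sanitize an alert before forwarding it to a repository. Keyed hashing of IP addresses is discussed in section 6; the field names, the per-organization key, and the use of HMAC-SHA256 here are illustrative assumptions rather than the paper's prescribed format.

import hmac
import hashlib

# Hypothetical per-organization secret key used for keyed hashing of IP addresses.
SANITIZATION_KEY = b"example-secret-key"

def hash_ip(ip_address, key=SANITIZATION_KEY):
    """Replace an IP address with a keyed hash: the address cannot be recovered
    without the key, but equal addresses still map to equal hashes."""
    return hmac.new(key, ip_address.encode(), hashlib.sha256).hexdigest()

def sanitize_alert(raw_alert):
    """Producer-side sanitization before submission to a repository (illustrative)."""
    return {
        "timestamp": raw_alert["timestamp"],
        "signature": raw_alert["signature"],        # e.g. IDS rule or port number
        "src_ip":    hash_ip(raw_alert["src_ip"]),  # attacker address, hashed
        "dst_ip":    hash_ip(raw_alert["dst_ip"]),  # victim address, hashed
    }

# Example: a firewall alert as it might be forwarded to a repository.
alert = sanitize_alert({
    "timestamp": "2004-05-18T12:00:00Z",
    "signature": "TCP/445 probe",
    "src_ip": "203.0.113.7",
    "dst_ip": "192.0.2.15",
})

Because the keyed hash is deterministic, repeated activity from the same hashed address can still be correlated across alerts without revealing the address itself.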

An enterprise (such as a major research lab famed for computer security research) may be sensitive to public disclosure of possible attacks, and may wish to keep private even the volume of alerts it generates. As described in section 6.3, the repositories can optionally form a randomized alert routing network. Although we have not implemented this feature, randomized routing can provide strong anonymity guarantees for alert sources. A repository may also be configured so that only events whose volume exceeds a certain threshold are published. This will have relatively little impact on historical and inflection analysis (see section 7), but may prevent identification of stealth attacks associated with low alert volumes.
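A threshold-publication policy of this kind is easy to express. The sketch below is again illustrative; grouping alerts by their signature field and the particular threshold value are assumptions for the example. It counts pending alerts per event type and releases only those types whose volume meets the threshold.

from collections import Counter

VOLUME_THRESHOLD = 100   # hypothetical: publish only event types seen at least this often

def threshold_publish(pending_alerts, threshold=VOLUME_THRESHOLD):
    """Return only alerts belonging to event types whose volume meets the threshold.

    Low-volume event types stay unpublished, which hides stealthy probes but
    preserves the high-volume trends used for historical and inflection analysis.
    """
    counts = Counter(alert["signature"] for alert in pending_alerts)
    return [alert for alert in pending_alerts
            if counts[alert["signature"]] >= threshold]

Raising the threshold strengthens the volume-hiding property at the cost of suppressing more low-volume events.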

As shown in figure 2, sensors vary greatly in the volume of alerts they produce in a given day, but the total alert volume is substantial. This graph depicts the number of alerts produced on a single day by 1,416 sensors reporting to DShield. At the high end, over 7 million alerts were produced by a single firewall that was apparently experiencing a DoS-like attack. Several other sensors reported near or above a million alerts each. The median sensor produced only 177 alerts.

Figure 2: Alert volume per sensor (semi-log scale). Data courtesy of DShield.

The total volume of 19,147,322 alerts reported on that day, across 1,416 different sensors from many organizations spread over a wide geographic area, constrains practical implementation choices. In particular, secure multiparty computation (SMC) approaches (see section 2.3) and many privacy-preserving data mining techniques add impractical levels of overhead to alert analysis. With over a thousand reporting sensors, naive SMC approaches would require tremendous network bandwidth and unsupportable CPU or cryptographic coprocessor performance for even moderate levels of analysis query traffic. It is possible that special-purpose SMC schemes developed specifically for this problem would prove more practical. In this paper, we propose simple solutions that enable, on sanitized alerts, a broad set of analyses that would normally require raw alert data.

