Evaluation Strategy and Trace Selection

Table 1: The Scalable Performance of Cluster Bro on the 500x multiplied HTTP trace

Number of Sensor Nodes	Maximum Processing Rate	Speedup
Standalone Bro	3600 pps
2 Sensors	6800 pps	1.8 $\times$
4 Sensors	14400 pps	4.0 $\times$
8 Sensors	35400 pps	9.8 $\times$

One significant problem in evaluating intrusion detection systems is obtaining representative traces. Especially for systems that perform deep packet inspection, traces can be highly sensitive and cannot easily be exported, even to a different secure facility, due to privacy and confidentiality concerns.

Since our work attempts to find bottlenecks and performance artifacts, rather than to compare multiple intrusion detection systems or to perform a site-specific evaluation, we can sometimes use synthetic traces. To create a synthetic trace, we begin with a small trace that triggers one or more known alerts, allowing us to verify correctness.

We then amplify this trace using a small Click module. Amplification duplicates the packet stream a number of times given by an amplification factor. In each duplicate stream, the SRC and DST addresses are hashed to new values. The resulting streams are interleaved, slightly staggered to keep them out of phase, and written to a new file.
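The amplification step can be sketched in a few lines of Python. This is an illustration only, not the Click module itself: the `Packet` record, the SHA-256-based `remap_address`, and the `stagger` parameter are all assumptions made for the example, and packets are held in memory rather than read from a capture file. The sketch also makes no attempt to avoid reserved or multicast address ranges.

```python
import hashlib
import ipaddress
from typing import List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical in-memory packet record used only for this sketch."""
    ts: float       # timestamp in seconds
    src: str        # IPv4 source address
    dst: str        # IPv4 destination address
    payload: bytes


def remap_address(addr: str, copy_index: int) -> str:
    """Deterministically hash an address to a new one for a given copy.

    The same (address, copy) pair always maps to the same result, so both
    directions of a connection stay consistent within one copy.
    """
    digest = hashlib.sha256(f"{copy_index}:{addr}".encode()).digest()
    return str(ipaddress.IPv4Address(int.from_bytes(digest[:4], "big")))


def amplify(trace: List[Packet], factor: int, stagger: float) -> List[Packet]:
    """Return `factor` remapped copies of `trace`, interleaved by time.

    Copy k is shifted by k * stagger seconds so the copies are out of phase.
    """
    out: List[Packet] = []
    for k in range(factor):
        for p in trace:
            out.append(Packet(p.ts + k * stagger,
                              remap_address(p.src, k),
                              remap_address(p.dst, k),
                              p.payload))
    # Python's sort is stable, so packets within one copy keep their order.
    out.sort(key=lambda p: p.ts)
    return out
```

Because the hash depends on the copy index, each copy appears to come from a different set of hosts, while the payloads, and therefore the alerts they trigger, are unchanged.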

To run the tests themselves, we modified our Click load balancer to read from an amplified trace file, subject to a programmable rate limit expressed in packets per second. By creating a smooth packet flow, we keep small buffering errors and packet bursts from affecting the end hosts, so that the test focuses solely on the system resources used to process the traffic.
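A smooth, rate-limited replay of this kind can be sketched as follows. The function name `paced` and the injectable `clock` and `sleep` parameters are assumptions for illustration; the actual load balancer is a Click element, and this Python sketch only shows the pacing idea of scheduling packet i at i / rate seconds after the start.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced(packets: Iterable[T],
          rate_pps: float,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield packets at a steady rate of `rate_pps` packets per second.

    Each packet has a fixed target time relative to the start, so a late
    packet does not push back the schedule of the packets after it.
    """
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    interval = 1.0 / rate_pps
    start = clock()
    for i, pkt in enumerate(packets):
        delay = start + i * interval - clock()
        if delay > 0:
            sleep(delay)
        yield pkt
```

Scheduling against absolute target times, rather than sleeping a fixed interval after each packet, keeps timing error from accumulating over a long replay.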

We also created some custom shell scripts to manage the cluster, launch experiments, and collect results.

This technique is suitable for discovering and evaluating bottlenecks in our system, but it is NOT sufficient to know whether our system will scale without bottlenecks on real traffic. Without access to real traffic traces containing full packets, we can only state that we have removed known bottlenecks, not that our system will be free of bottlenecks when deployed in the real world.

Our initial testing for this work used an HTTP trace. This trace, collected on a user's local system, contains 1215 packets totaling 653930 bytes. It is composed almost entirely of HTTP sessions, which can be particularly stressful for Bro to analyze. The trace includes two ``attacks'', suspicious URL requests for ../../../../etc/motd and ../../../etc/passwd, along with several benign requests. We can therefore easily check whether the trace was processed correctly: count the total number of sensitive URL alarms and make sure it matches the amplification factor. If Bro drops packets, the resulting content-gap errors add to the alarm count, so a matching count lets us quickly verify that Bro correctly processed the packet stream.
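The correctness check described above amounts to comparing an alarm count against the amplification factor. A minimal sketch, assuming the alarms have already been extracted from Bro's output as a sequence of kind labels, is shown below. The labels `sensitive_url` and `content_gap` and the `alarms_per_copy` parameter are placeholders invented for this example, not Bro's actual notice names or log format.

```python
from collections import Counter
from typing import Iterable


def run_is_clean(alarm_kinds: Iterable[str],
                 factor: int,
                 alarms_per_copy: int = 1,
                 sensitive_kind: str = "sensitive_url") -> bool:
    """Return True if the alarms match what an amplified trace should raise.

    `alarm_kinds` holds one label per alarm. A clean run has exactly
    factor * alarms_per_copy sensitive-URL alarms and no other alarms;
    any extra alarm (for example a content gap caused by dropped
    packets) makes the total too high and fails the check.
    """
    counts = Counter(alarm_kinds)
    expected = factor * alarms_per_copy
    total = sum(counts.values())
    return counts[sensitive_kind] == expected and total == expected
```

For a 500-fold amplified trace, `run_is_clean(kinds, 500)` would fail both when alarms are missing and when drop-related alarms inflate the total.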

The cluster itself used one manager node, one proxy node, and a variable number of sensor nodes. Additionally, we performed tests using a single Bro sensor without the cluster. We used an almost-complete set of Bro analyzers for the cluster, including scan detection, dynamic protocol detection, and HTTP request and reply analysis. Our particular trace heavily stresses the HTTP reply analyzer: removing this analyzer improves Bro's performance by more than a factor of 2.