Both the monitoring and the response components of pH use ideas introduced in our earlier work on system-call-based anomaly detection. What follows is a description of our original testing methodology, with which we gathered on-line data for off-line analysis. Subsequent sections explain how these techniques were modified to create pH.
To review, we monitored all the system calls (without arguments) made by an executing program on a per-process basis. That is, each time a process was invoked, we began a new trace, logging all the system calls for that process. Thus, for every process the trace consists of an ordered list (a time-series) of the system calls it made during its execution. For commonly executed programs, especially those that run with privilege, we collected such traces over many invocations of the program, when it was behaving normally. We then used the collection of all such traces (for one program) to develop an empirical model of its normal behavior.
Once the system had been trained on a sufficient number of normal program executions, the model was tested on subsequent invocations of the program. The hope was that the model would recognize most normal behavior as ``normal'' and most attacks as ``abnormal.'' Our method thus falls into the category of anomaly intrusion detection.
Given a collection of system call traces, how do we use them to construct a model? This is an active area of research in the field of machine learning, and there are literally hundreds of good methods available to choose from, including hidden Markov models, decision trees, neural networks, and a variety of methods based on deterministic finite automata (DFAs). We chose the simplest method we could think of within the following constraints. First, the method must be suitable for on-line training and testing. That is, we must be able to construct the model ``on the fly'' in one pass over the data, and both training and testing must be efficient enough to be performed in real time. Next, the method must be suitable for large alphabet sizes. Our alphabet consists of all the different system calls--typically about 200 for UNIX systems. Finally, the method must create models that are sensitive to common forms of intrusion. Traces of intrusions are often 99% the same as normal traces, with very small, temporally clumped deviations from normal behavior. In the following, we describe a simple method, which we call ``time-delay embedding.'' Warrender et al. compared time-delay embedding with several other common machine learning algorithms and found that it is remarkably accurate and efficient in this domain.
We define normal behavior in terms of short n-grams of system calls. Conceptually, we define a small, fixed-size window and ``slide'' it over each trace, recording which calls precede the current call within the sliding window. The current call and a call at a fixed preceding window position form a ``pair,'' with the contents of a window of length w being represented by w - 1 pairs. The collection of unique pairs over all the traces for a single program constitutes our model of normal behavior for the program.1
More formally, let w be the window size, let C be the alphabet of possible system calls, and let T = s_1, s_2, ..., s_tau (s_i in C) be a trace of system calls. The pairs recorded for T are

P(T) = { (d, s_{i-d}, s_i) : 1 <= d < w, d < i <= tau },

and the normal profile for a program is the union of P(T) over all of its normal traces.
For example, suppose we had as normal the following sequence of calls:
execve, brk, open, fstat, mmap, close, open, mmap, munmap, with a window size of 4. We slide the window across the sequence, and for each call we encounter, we record which call precedes it at each position within the window, numbering the positions from 0 to w - 1, with 0 being the current system call. So, for this trace, we get the following windows:
| current | position 1  | position 2   | position 3 |
|---------|-------------|--------------|------------|
| execve  |             |              |            |
| brk     | execve      |              |            |
| open    | brk, close  | execve, mmap | fstat      |
| fstat   | open        | brk          | execve     |
| mmap    | fstat, open | open, close  | brk, mmap  |
| close   | mmap        | fstat        | open       |
| munmap  | mmap        | open         | close      |
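The pair table above can be reproduced with a short sketch (our own Python, not the authors' implementation), representing each lookahead pair as a (position, preceding call, current call) tuple:

```python
def build_profile(trace, w):
    """Return the set of lookahead pairs for one trace with window size w."""
    profile = set()
    for i, current in enumerate(trace):
        # Positions 1..w-1 look back at the calls preceding the current one.
        for d in range(1, w):
            if i - d >= 0:
                profile.add((d, trace[i - d], current))
    return profile

trace = ["execve", "brk", "open", "fstat", "mmap",
         "close", "open", "mmap", "munmap"]
profile = build_profile(trace, 4)

# The entries for current call "open" match the table above:
assert (1, "brk", "open") in profile and (1, "close", "open") in profile
assert (2, "execve", "open") in profile and (2, "mmap", "open") in profile
assert (3, "fstat", "open") in profile
```

In a real monitor the profile would be accumulated incrementally as calls arrive, one pass over the data, which is what makes the method suitable for on-line training.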
This table can be stored using a fixed-size bit array. If A is the size of the alphabet (the number of different possible system calls) and w is the window size, then we can store the complete model in a bit array of size A x A x (w - 1). Because w is small (6 is our standard default), our current implementation uses a byte array, with masks to access the individual bits.
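One plausible layout for this byte array is sketched below (the names and the round alphabet size of 256 are our assumptions): each byte holds the window-position bits for one (preceding call, current call) combination, so the w - 1 positions of a window of size 6 fit comfortably in a single byte.

```python
NUM_CALLS = 256   # assumed alphabet size A; UNIX systems have roughly 200 calls
W = 6             # window size, the paper's standard default

# A x A bytes; bit d-1 of entry (prev, cur) is set when call `prev` has
# been seen at window position d before call `cur`.
profile = bytearray(NUM_CALLS * NUM_CALLS)

def set_pair(prev, cur, d):
    profile[prev * NUM_CALLS + cur] |= 1 << (d - 1)

def has_pair(prev, cur, d):
    return bool(profile[prev * NUM_CALLS + cur] & (1 << (d - 1)))

set_pair(5, 45, 3)       # e.g. call 5 seen three positions before call 45
assert has_pair(5, 45, 3)
assert not has_pair(5, 45, 2)
```

With these numbers the whole model is 64 KB per program, and both training (setting a bit) and testing (checking a bit) are constant-time operations, which satisfies the real-time constraint above.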
At testing time, system call pairs from test traces are compared against those in the normal profile. Any system call pair (the current call and a preceding call within the current window) not present in the normal profile is called a mismatch. Any individual mismatch could indicate anomalous behavior (a true positive), or it could be a sequence that was not included in the normal training data (a false positive). The current system call is defined as anomalous if there are any mismatches within its window.
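Testing can then be sketched as follows (a simplified, set-based version of the lookup, with a toy normal profile of our own invention):

```python
# Toy normal profile: pairs of (position, preceding call, current call).
normal = {(1, "open", "read"), (1, "read", "close"), (2, "open", "close")}

def mismatches(window, profile):
    """Count pairs formed by the current call (last entry of `window`)
    that are absent from `profile`; the current call is anomalous if
    the count is nonzero."""
    current = window[-1]
    return sum(
        1
        for d in range(1, len(window))
        if (d, window[-1 - d], current) not in profile
    )

# "write" was never seen one position before "close" during training,
# so that pair mismatches; the position-2 pair (open, close) is normal.
assert mismatches(["open", "read", "close"], normal) == 0
assert mismatches(["open", "write", "close"], normal) == 1
```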
To date, all of the intrusions we have studied produce anomalous sequences in temporally local clusters. To facilitate the detection of these clusters, we record recent anomalous system calls in a fixed-size circular array, which we refer to as a locality frame. More precisely, let n be the size of our locality frame, and let lf_i be the i-th entry of the locality frame array, with 0 <= i < n and lf_i in {0, 1}. Then, for system call s_j (j >= 0) with m_j mismatches, lf_{j mod n} = 1 iff m_j > 0, and is 0 otherwise. Thus, the locality frame implicitly stores the number of the past n system calls which were anomalous. We call this total of recent anomalies, the sum of the lf_i, the locality frame count (LFC).2 For the experiments described below, we used a locality frame of size 128.
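The circular locality frame can be sketched as follows (class and method names are ours); a tiny frame of size 4 makes the wrap-around behavior easy to see:

```python
class LocalityFrame:
    def __init__(self, size=128):
        self.lf = [0] * size   # lf[j mod size] records call j's status
        self.j = 0             # index of the next system call
        self.lfc = 0           # running locality frame count

    def record(self, num_mismatches):
        """Record one system call's mismatch count; return the LFC."""
        slot = self.j % len(self.lf)
        self.lfc -= self.lf[slot]              # drop the overwritten entry
        self.lf[slot] = 1 if num_mismatches > 0 else 0
        self.lfc += self.lf[slot]
        self.j += 1
        return self.lfc

frame = LocalityFrame(size=4)
counts = [frame.record(m) for m in [0, 1, 1, 0, 0, 2]]
# After six calls, only the last four remain in the frame
# (statuses 1, 0, 0, 1), so the LFC ends at 2.
assert counts == [0, 1, 2, 2, 2, 2]
```

Maintaining the count incrementally, rather than re-summing the array, keeps the per-call cost constant regardless of the frame size.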