Next: Implementation Up: Automated Response Using System-Call Previous: Background

pH Design

pH performs two important functions: It monitors individual processes at the system-call level, and it automatically responds to anomalous behavior by either slowing down or aborting system calls. Normal behavior is determined by the currently running binary program; response, however, is determined on a per-process basis.

To minimize I/O requirements and maximize efficiency, stability, and security, we have implemented most of pH in kernel space. We considered several alternative approaches, including audit packages, system-call tracing utilities (such as strace), and instrumented libraries. However, each of these approaches has serious drawbacks. Audit packages generate voluminous log files, which are expensive to create and even more expensive to analyze. Additionally, they do not routinely record every system call. User-space tracing utilities are too slow for our application, and in some cases they interfere with privileged daemons to the extent that the daemons behave incorrectly. Instrumented libraries cannot detect every system call, because not every system call comes through a library function (e.g., code injected by a buffer overflow attack can invoke system calls directly). In addition, a kernel implementation allows us to put our monitoring and response mechanisms exactly where they are needed, in the system call dispatcher, and allows the implementation to be as secure as the kernel.

For each running executable, pH maintains two arrays of pair data: a training array and a testing array. The training array is continuously updated with new pairs as they appear; the testing array is used to detect anomalies, and is never modified except by replacing it with a copy of the training array. Put another way, the testing array is the current normal profile for a program, while the training array is a candidate future normal profile.
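The relationship between the two arrays can be illustrated with a minimal sketch. It is not pH's kernel code: the class name `Profile`, its methods, and the use of Python sets holding hashable pairs (instead of fixed arrays) are all assumptions made for illustration.

```python
class Profile:
    """Illustrative per-executable profile (hypothetical names, not pH's code)."""

    def __init__(self):
        self.training = set()  # candidate future normal profile
        self.testing = None    # current normal profile; None until first installed

    def observe(self, pair):
        """Add a pair to the training data and report whether it is anomalous.

        A pair is anomalous only if a testing profile exists and lacks it.
        The testing data itself is never modified here.
        """
        self.training.add(pair)
        return self.testing is not None and pair not in self.testing

    def install_normal(self):
        """Replace the testing data with a copy of the training data."""
        self.testing = set(self.training)
```

Note that an anomalous pair still enters the training data, so it becomes part of the normal profile only if the training data is later installed.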

A new "normal" is installed by replacing the testing array with the current state of the training array. The replacement occurs under any of three conditions: (1) the user explicitly signals via a special system call (sys_pH) that a profile's training data is valid; (2) the profile's anomaly count exceeds the parameter $anomaly\_limit$; or (3) the training formula is satisfied. When an anomaly is detected, the current system call is delayed according to a simple formula. Details of these conditions and actions are given in the next several paragraphs.

The training-to-testing copy can occur automatically, based on the state of the following training statistics:

$\begin{array}{rl} train\_count: & \mbox{\# calls s}\ldots \\ \ldots l\_count = & train\_count - last\_mod\_count \end{array}$

When the training array meets all of the following conditions, it is copied onto the testing array (note: this is the normal mechanism for initiating anomaly detection in the system):

$\begin{array}{rcl} last\_mod\_count & > & mod\_minimum \\ & \vdots & \\ \frac{\ldots\_count}{normal\_count} & > & normal\_ratio \end{array}$

The three parameters on the right are user-defined and can be set at runtime.
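A check of this kind might be sketched as follows. The displayed formulas are truncated in this copy of the text, so several details are assumptions rather than the authors' definitions: that the quantity $train\_count - last\_mod\_count$ is the $normal\_count$ appearing in the ratio; that the elided middle condition compares $normal\_count$ with a parameter here called `normal_minimum` (a hypothetical name); and that the numerator of the ratio is $train\_count$.

```python
from dataclasses import dataclass


@dataclass
class TrainStats:
    train_count: int      # calls counted during training (description truncated in source)
    last_mod_count: int   # definition elided in source


def ready_to_install(stats, mod_minimum, normal_minimum, normal_ratio):
    """Return True if the training data should be copied to the testing data.

    ASSUMPTIONS: the middle condition and the ratio's numerator are elided
    in the source; `normal_minimum` is a hypothetical parameter name.
    """
    normal_count = stats.train_count - stats.last_mod_count
    if normal_count <= 0:
        return False  # avoid dividing by zero; treated as "not ready"
    return (
        stats.last_mod_count > mod_minimum
        and normal_count > normal_minimum
        and stats.train_count / normal_count > normal_ratio
    )
```

For instance, with `train_count = 1000` and `last_mod_count = 600`, the sketch computes `normal_count = 400` and a ratio of 2.5, so thresholds of 500, 300, and 2 would all be exceeded.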

As we mentioned earlier, pH responds to anomalies by delaying system call execution. The amount of delay is an exponential function of the current LFC, whether or not the current call is anomalous. The unscaled delay for a system call is $d = 2^{\mbox{\tiny LFC}}$. The effective delay for a system call is $d \times delay\_factor$, where $delay\_factor$ is another user-defined parameter. Note that delays may be disabled by setting $delay\_factor$ to 0. If the LFC ever exceeds the $tolerization\_limit$ parameter (which is 12 for the experiments described below), the training array is reset, preventing truly anomalous behavior from being incorporated into the testing array.
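The delay rule is simple enough to state as a short sketch. The function names are hypothetical, and the time unit of the delay is not given in this section, so the result is left unitless.

```python
def effective_delay(lfc, delay_factor):
    """Effective delay: the unscaled delay 2**LFC scaled by delay_factor."""
    return (2 ** lfc) * delay_factor


def should_reset_training(lfc, tolerization_limit=12):
    """True when the LFC exceeds tolerization_limit (12 in the experiments)."""
    return lfc > tolerization_limit
```

Because the delay doubles with each increment of the LFC, an LFC of 5 with a $delay\_factor$ of 2 gives $2^5 \times 2 = 64$, while an LFC of 0 still gives a delay of $delay\_factor$ itself; only a $delay\_factor$ of 0 removes the delay entirely.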

Because pH monitors process behavior based on the executable that is currently running, the execve system call causes a new profile to be loaded. Thus, if an attacker were able to subvert a process and cause it to make an execve call, pH might be tricked into treating the current process as normal, based on the data for the newly loaded executable. To avoid this possibility, the maximum LFC (maxLFC) for a process is recorded. If maxLFC exceeds the $abort\_execve$ threshold, then all execve calls are aborted for the anomalous process.
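As a rough sketch of this safeguard, the per-process state can track the largest LFC seen so far and consult it before allowing an execve. The class and function names are hypothetical, and how pH actually stores this state in the kernel is not described here.

```python
class ProcessState:
    """Illustrative per-process response state (hypothetical names)."""

    def __init__(self):
        self.lfc = 0
        self.max_lfc = 0  # largest LFC ever observed for this process

    def update_lfc(self, lfc):
        """Record the current LFC and keep the running maximum."""
        self.lfc = lfc
        self.max_lfc = max(self.max_lfc, lfc)


def allow_execve(proc, abort_execve):
    """Permit execve only while maxLFC has not exceeded the threshold."""
    return proc.max_lfc <= abort_execve
```

Tracking the maximum rather than the current LFC means that a process cannot regain the ability to call execve merely because its LFC later falls.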

pH also keeps a count of the raw number of anomalies each profile has seen. This count can be seen as a measure of ongoing, non-clustered abnormal behavior. If this number exceeds the parameter $anomaly\_limit$, pH automatically copies the training array to the testing array, causing pH to treat similar future behavior as normal. Borrowing from immunology, we refer to this process as tolerization. Low values of $anomaly\_limit$ allow pH to automatically tolerize most novel behavior, while higher values inhibit tolerization. When a system is initially set up, automatically created normal profiles may contain too little normal behavior. To reduce the number of reported anomalies, $anomaly\_limit$ should initially be set to a small value (less than 10). Then, once the system has stabilized, $anomaly\_limit$ should be set to at least 20 to prevent pH from automatically learning the behavior of attacks.
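Tolerization can be sketched as a counter check that triggers the training-to-testing copy. The dictionary layout and the function name are assumptions made for illustration. The text does not say whether the anomaly count is reset after tolerization, so this sketch leaves it unchanged.

```python
def record_anomaly(profile, anomaly_limit):
    """Count one anomaly; tolerize if the count now exceeds anomaly_limit.

    `profile` is a hypothetical dict with keys "training", "testing"
    (sets of pairs) and "anomalies" (int). Returns True if tolerization
    occurred. ASSUMPTION: the count is not reset afterwards.
    """
    profile["anomalies"] += 1
    if profile["anomalies"] > anomaly_limit:
        profile["testing"] = set(profile["training"])  # copy, not alias
        return True
    return False
```

With a small limit, the copy is triggered after only a few anomalies, which matches the advice to start with a low $anomaly\_limit$ and raise it once profiles have stabilized.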


Anil B. Somayaji 2000-06-14