Home About USENIX Events Membership Publications Students
Security '03 Paper    [Security '03 Technical Program]

 

High Coverage Detection of Input-Related Security Faults

Eric Larson and Todd Austin

 

Advanced Computer Architecture Laboratory

University of Michigan

Ann Arbor, MI 48105

larsone@eecs.umich.edu, austin@eecs.umich.edu

 

Abstract

Improperly bounded program inputs present a major class of program defects. In secure applications, these bugs can be exploited by malicious users, allowing them to overwrite buff­ers and execute harmful code. In this paper, we present a high coverage dynamic technique for detecting software faults caused by improperly bounded program inputs. Our approach is novel in that it retains the advantages of dynamic bug detec­tion, scope and precision; while at the same time, relaxing the requirement that the user specify the input that exposes the bug. To implement our approach, inputs are shadowed by additional state that characterize the allowed bounds of input-derived variables. Program operations and decision points may alter the shadowed state associated with input variables. Potentially hazardous program sites, such as an array refer­ences and string functions, are checked against the entire range of values that the user might specify. The approach found several bugs including two high-risk security bugs in a recent version of OpenSSH.

1.  Introduction

Bugs in software can have a devastating effect in today’s world. Computer viruses can exploit software bugs in order to run malicious code or gain access to restricted data.  A highly visible and damaging example of this type of defect includes improperly bounded checks on network data. A common example of this class of defect is the buffer overflow. Input data is obtained from the network without checking to see if it will fit within a program buffer. Malicious users can exploit this bug by overwriting stack buffers in a way that overwrites the function return address to direct control to arbitrary code.

To prevent a buffer overflow exploit, it is necessary for the program to check input data to ensure it does not exceed the bounds of any buffer it may be used to reference. However, many programs either fail to check input data or check the data incorrectly. Such cases are often hard to find. For example, the code sequence in Figure 1 contains an off-by-one error. Such an error may be difficult to find if the programmer writing the code to check the reference is not aware that the index is incre­mented before it is referenced.

 

 

 

 

 

 

5

6

unsigned int x;

int array[5];

scanf(“%d”, &x);

if (x > 4) fatal(“Index out of bounds”);

x++;

a = array[x];

Figure 1: Example of an array bounds error. This code segment will overflow the array if x is 4.

Another common source of security bugs is improper use of the string library functions in C. Since the functions provide no checking, the responsibility resides with the programmer. To complicate matters, there is little consistency on how the different string functions operate. For instance, the strcpy command always copies a null character but strncpy will not copy the null character unless one is present within the speci­fied limit. An example of a bug involving strings is shown in Figure 2. In this case, there is a check to filter out strings that are greater than 16 characters. However, the strlen com­mand does not count the null character. If the source string is exactly 16 characters (not including the null character), it will pass the check though it contains 17 characters, including the null character. As a result, the null character does not get cop­ied by the strncpy command, creating a potentially danger­ous strcpy because the source is not null terminated. This type of problem is difficult to catch during testing since it requires a source string of exactly 16 characters. Also, the bug may not manifest in an error in the output when such an input is presented; this is likely the case if the character after the temp array happens to be a null.

 

 

 

 

 

 

5

 

 

 

 

10

11

char *bad_string_copy(char *src)

{

  char *dest;

  char temp[16];

 

  if (strlen(src) > 16) return NULL;

  strncpy(temp, src, 16);

  dest = (char *) malloc(16);

  strcpy(dest, temp);

  return dest;

}

Figure 2: Fault due to improper use of string library functions.  If src has 16 characters (not including the null character), it will get copied into dest without a null character causing a problem in the subsequent strcpy function.

One approach to prevent security exploits is to add run-time support that will prevent malicious behavior. For example, a technique created by Lhee and Chapin [20] append type infor­mation to arrays and intercepts string functions to ensure the bounds of the array are not exceeded. The added run-time sup­port remains with the program after it has been deployed resulting in a performance penalty. To avoid this penalty, soft­ware teams will sometimes employ testing efforts to detect errors at design-time, thereby reducing the need for run-time checking once the software has been deployed. Often, dynamic bug detection tools are used to aid validators by find­ing bugs that do not necessarily manifest in an error in the pro­gram output. These tools use run-time knowledge of all variables including pointers and variables that reside on the heap. Dynamic techniques can find defects over a wide scope of the program including bugs that span multiple function boundaries, library functions, or even process boundaries. However, they are limited to exposing only those defects that testers can expose. For example, the bug in Figure 1 would only be detected when the input value of four is provided. Without extensive testing, dynamic techniques are best used to expose defects in common-use scenarios.

In this work, we introduce a dynamic high coverage approach to detecting security faults caused by improperly bounded inputs. Our technique possesses the scope and program knowl­edge of a run-time technique while relaxing the requirement that the validator specify a set of inputs that exposes the defect. We implement our approach by shadowing all input values (and variables derived from input) with a state variable.  State variables are introduced into the program when external inputs are read. External inputs encompass a variety of sources: command line arguments, input files, environment variables, and network data.

Integers are shadowed by an interval constraint variable that stores the lower and upper bounds of the range of values that the given variable may hold. They have an initial value indi­cating the input is unbounded with maximum range that can be represented by the data type of the variable.  During execution, control tests and operators may narrow input interval con­straints. Finally, at potentially dangerous uses of inputs, such as array references and trusted system calls, the entire range of an input value is validated using the computed interval con­straint. As a result, all input-related faults are exposed for a given control path, even if the user-specified input did not directly expose the fault.

In Figure 3, we show how our technique can find the off-by-one error in the code segment from the example in Figure 1. In a conventional dynamic bug detection implementation, an error will not be detected unless x is four. In our correctness model, we improve defect coverage by extending input values with interval constraints. When the value is first read from input, it is given a range to span all possible values. At the con­trol points (after the if statement in the example), the interval constraint can be narrowed because the value of x is now known to be <= 4. When the value of x is incremented, the interval is adjusted up by one on both ends. When the array access occurs, the interval of x is compared to the size of the array. Even though x has the legal value of two, an error is flagged since it is possible for the input to be five, which exceeds the bounds of the array.

 

Code Segment

Value of x

Interval constraint on x

unsigned int x;

int array[5];

scanf(“%d”, &x);

if (x > 4) fatal(“Index out of bounds”);

x++;

a = array[x];

 

 

2

2

2

3

 

 

0 <= x <= INF 

0 <= x <= INF

0 <= x <= 4

1 <= x <= 5 --> ERROR!

Figure 3: Detecting an array bounds error. The error is detected even though the input value of 2 does not directly cause an error.

Input strings are shadowed by state variables that hold the maximum possible size of the string and a flag that indicates if the string is known to contain a null character. Like integers, strings from external input sources are considered to have an unbounded maximum size. Control predicates that test the length of an input string can decrease the maximum string length. The null flag is initially set on input strings and is ini­tially clear on uninitialized arrays. In order to set the null flag, a string must be copied in a manner that guarantees that null is set. String functions are checked to ensure that all strings passed as a parameter are null terminated and there is suffi­cient room in the destination string for copy operations. Our approach will verify these functions for all possible string lengths up to the maximum size making it unnecessary for a test to have the exact string length that can trigger an error.

The string example is revisited in Figure 4. Assume that the input to the function is a null terminated string consisting of 8 characters taken directly from input. Upon entry to the function, its maximum size will be unbounded. Though the user entered an 8 character string, it could have an entered a string of any length. Since the string has 8 characters it passes the check but its maximum size is reduced to 17. (We count the null character in our definition of string size.) In the strncpy function, the maximum size of 17 can exceed the available destination size limit of 16. Consequently, there is no guarantee that the null character is copied, and the null flag remains off for the array temp. This leads to an error when temp is used in the strcpy function since it may not be null terminated. Even though the input string of 8 characters does not expose the error, our approach still detects it because all possible string lengths are verified. It is also worth noting that our approach would detect an error if the string src were directly copied into dest in the strcpy instead of using the temp array. In this case, the error would get signalled because the maximum size of the source (17 characters) is greater than the destination (16 characters).

 

Code Segment

State for src

State for temp

State for dest

char *bad_string_copy(char *src)

{

  char *dest;

  char temp[16];

 

  if (strlen(src) > 16) return NULL;

  strncpy(temp, src, 16);

  dest = (char *) malloc(16);

  strcpy(dest, temp);

  return dest;

}

max_sz: , known_null: T

 

 

 

 

max_sz: 17, known_null: T

 

 

 

max_sz: 16, known_null: F

 

 

max_sz: 16, known_null: F

 

 

 

 

 

 

 

max_sz: 16, known_null: F

ERROR (temp may not be null terminated)

Figure 4: Detecting a string copy error. The error is detected because it is possible for the strcpy function to execute with a source that is not null terminated.

Our approach is generic and can be applied to all programs. We have applied it to several programs and found a number of bugs including two major security bugs in a recent release of OpenSSH. It is portable and does not require any modifica­tions to the source code.  Unlike techniques that are designed to prevent malicious behavior while running, our technique is intended to be used to find faults before software is released. Given the degree of our analysis, the run-time performance impact of our technique is fairly high, making our approach only appropriate for the testing phase of development. Another limitation of our approach is that defect detection is control path sensitive. Good testing, however, can mitigate this effect by covering all of the interesting paths of a program.

The remainder of the paper is organized as follows. The next section describes our method for detecting input-related faults. Section 3 describes our methodology and our dynamic bug detection tool. Section 4 shows the types of bugs we have  found using our approach and compares the run-time perfor­mance to an uninstrumented version of the program. Section 5 outlines related work, and Section 6 gives conclusions.

2. High coverage detection of software faults

Our high coverage scheme for detecting software faults cen­ters on verifying all possible input values at dangerous points. At array references, the entire range of possible index values are checked to ensure the array bounds are not exceeded. This procedure is described in Section 2.1. Unsafe string functions such as strncpy are checked by assuming that input strings can have arbitrary length initially. String operations are checked to ensure there is sufficient room to store the largest possible string. Section 2.2 details our technique. Section 2.3 lists other situations we detect that lead to software faults.

2.1 Detecting dangerous array references

In order for an array reference to be considered safe, the index must be checked to determine if it is possible to exceed the bounds of the array. This is accomplished by attaching interval constraint information be to every variable that contains pro­gram input. To keep the maintenance of bounds information manageable, we identify at run-time which variables contain program input and only attach interval constraints to these variables. We start with a model of tainted data that is similar to that used in [1] and is summarized in Figure 5. Data that comes from unbounded input is considered tainted. Examples of unbounded input include environment variables, command line inputs, data read from files, and network packets. Tainted data is shadowed with interval constraint variables that track the lower and upper bounds for the variable. When an access to an array occurs, the bounds of the array index are compared with the run-time size of the referenced array. An error is declared if there is an index that can exceed the bounds of the array. When a variable is assigned a value that is not depen­dent on input (untainted), the destination variable is reset to the untainted state. Since only tainted variables need interval constraints, any variable transition from tainted to untainted will release shadow state.

 

Figure 5: Program states for integer variables.

Since arrays may only be indexed by an integer, only variables with integer type (char, int, unsigned int, etc.) can become tainted. When an integer variable becomes tainted, it is assigned upper and lower bounds based on the precision of the type. For unsigned integers, the lower bound is zero. Oth­erwise, it is the most negative value the variable can hold, based on type. Similarly, the upper bound is the largest possi­ble value.

As tainted variables are operated upon or tested with control predicates, their interval constraints must be adjusted accord­ingly. Table 1 shows a list of representative operations and their effect on the upper and lower bounds of an interval con­straint. In the table, ticked variables a’, x’, and y’ refer to tainted variables while y represents an untainted variable. The notation x’.lb represents the lower bounds of tainted variable x’. The expressions MIN_VAL(a) and MAX_VAL(a) refer to the minimum and maximum values that a can have based on its type precision.

Table 1: Sample integer rules and their effect on the bounds. x’, y’, and a’ are tainted and y is untainted.

Rule

Operation

Input Interval Constraint

1

a’ = x’

a’.lb = max(MIN_VAL(a’), x’.lb)

a’.ub = min(MAX_VAL(a’), x’.ub)

2

a’ = x’ + y

a’.lb = max(MIN_VAL(a’), x’.lb + y)

a’.ub = min(MAX_VAL(a’), x’.ub + y)

3

a’ = x’ + y’

a’.lb = max(MIN_VAL(a’), x’.lb + y’.lb)

a’.ub = min(MAX_VAL(a’), x’.ub + y’.ub)

4

a’ = x’ % y’

a’.lb = 0, a’.ub = max(abs(y’.lb), abs(y’.ub))

5

if (x’ < y)

if (x’ < y):  x’.lb = x’.lb,         x’.ub = min(x’.ub, y - 1)

else:         x’.lb = max(x’.lb, y), x’.ub = x’.ub

6

if (x’ < y’)

if (x’ < y’): x’.lb = x’.lb, x’.ub = min(x’.ub, y’.ub - 1)

              y’.lb = max(y’.lb, x’.lb + 1), y’.ub = y’.ub

else:         x’.lb = max(x’.lb, y’.lb), x’.ub = x’.ub

              y’.lb = y’.lb, y’.ub = min(y’.ub, x’.ub)

7

if (x’ == y)

if (x’ == y):         x’.lb = y,     x’.ub = y

else if (x’.lb == y): x’.lb = y + 1, x’.ub = x’.ub

else if (x’.ub == y): x’.lb = x’.lb, x’.ub = y - 1

else:                 x’.lb = x’.lb, x’.ub = x’.ub

8

if (x’ == y’)

if (x’ == y’):  x’.lb = y’.lb = max(x’.lb, y’.lb)

                x’.ub = y’.ub = min(x’.ub, x’.lb)

else:           x’.lb = x’.lb, x’.ub = x’.ub

                y’.lb = y’.lb, y’.ub = y’.ub

9

while (x’ < y)

in loop:      x’.lb = x’.lb,         x.ub = min(x’.ub, y - 1)

after loop:   x’.lb = max(x’.lb, y), x.ub = x’.ub

For simple assignment operations (Rule 1), the bounds are copied into the assigned value in most cases. However, it may be necessary to restrict the bounds if the size of the destination type is smaller than the size of the source type. This may also occur when assigning a signed value into an unsigned value. State propagations are also required when integers are passed at function calls, returned from functions, assigned within structures, or when copied by system functions such as mem­cpy.

Addition and other arithmetic operations adjust the bounds of the destination variable. In the first addition pattern (Rule 2), a tainted value is added to an untainted value. The bounds of the destination variable are computed by adding the run-time value of the untainted variable to the bounds of the tainted variable. If both variables are tainted (Rule 3), the bounds are added together to form the new worst-case bounds. Rule 4 sin­gles out the modulus operator because the new range is strictly dependent on the value of the second operand. We also detect overflow situations; this is mentioned in more detail in Section 2.3.

Rules 5-9 narrow the input interval constraint based on knowl­edge gained from control predicates. In Rule 5, if the if con­dition is true, the upper bound is reduced to y-1 unless the existing upper bound is already lower than y-1. If the condi­tion is false, the lower bound must be at least y. Rule 6 refers to a situation where two tainted variables are compared to one another. If x’ < y’ is true, then no change is necessary to the lower bound of x’ and the upper bound of y’. The upper bound of x’ must be at least one less than the upper bound of y’ in order to make the equality true. x’ also cannot exceed its own upper bound. Similarly, the lower bound of y’ is the max­imum of the lower bound of x’+1 and the lower bound of y’ before the statement.

 

The equality test to an untainted variable (Rule 7) will set both bounds to y if the tainted value is indeed equal to the value. If the tainted value x’ is not equal to y, there is no change unless y happens to equal one of the bounds. In this case, the bound is adjusted accordingly. While in this case it would be possible to split the interval, this is not necessary as only the lower and upper bounds are needed to validate array accesses.  In Rule 8, two tainted variables are compared for equality.  If they are equal, each variable will have an identical new range that is formed and by taking the highest lower bound and lowest upper bound.  If they are not equal, no change is made[1].

The effect of a while loop comparison is shown in Rule 9. When the body of the loop is entered, the condition x’ < y is true and the bounds are updated appropriately. Upon exiting the loop, the bounds are updated to reflect that the condition is now false. for loops and do loops are handled in the same manner except that the bounds are not updated during the first pass in a do loop since the condition is not tested until the end of the loop. The case where two tainted values are compared as a loop condition is analogous to the if statement.

Notable omissions from the list are the logical-or (||) and log­ical-and (&&) operators. A simplification phase, discussed in Section 3, converts these short-circuited operators into the appropriate if-then-else constructs.

To perform interval bounds checks, it is necessary to keep track of the sizes of all arrays in the program. The size of glo­bally and locally declared arrays are known at compile-time and are straightforward to process. Dynamic variables pose an interesting challenge in our target language C. Since all dynamic memory allocations are considered to be untyped, we consider all dynamic allocations to be a single array with a size equal to that of the memory allocation.

2.1.1 Array reference example

To illustrate our approach, we will describe a bug that was dis­covered in OpenSSH. It occurred in the channel code (chan­nel.c). The relevant code is shown in Figure 6. In function channels_new, the channels array is a dynamically growing array with a size equal to channels_alloc. The starting size of the array is 10.

 

 

 

 

 

5

 

 

 

 

10

 

 

 

 

15

 

 

 

 

20

 

 

 

 

25

 

 

 

 

30

 

 

 

 

35

 

 

 

 

40

 

 

 

 

45

 

 

 

 

50

 

52

/* Pointer to an array containing all allocated channels. The

 * array is dynamically extended as needed. */

static Channel **channels = NULL;

 

/* Size of the channel array. */

static int channels_alloc = 0;

 

Channel *

channel_new(...)

{

  int i, found;

  Channel *c;

 

  /* Do initial allocation if this is the first call. */

  if (channels_alloc == 0) {

    channels_alloc = 10;

    channels = malloc(channels_alloc * sizeof(Channel *));

    ...

  }

  ...

  if (found == -1) {

    channels_alloc += 10;

    channels = realloc(channels, channels_alloc * sizeof(Channel *));

    ...

  }

  ...

}

 

void

channel_input_data(int type, int plen, void *ctxt)

{

  int id;

  Channel *c;