
How to Insert RAD Code

To avoid disturbing the original binary's address space, we create a separate new code section, not present in the original PE binary (the PE format is described in [25]), and append it to the end of the binary to hold the additional prologue and epilogue code for each function. This new section is mapped to a non-interfering portion of the address space and is marked read-only. Thus the application cannot corrupt the RAD code, and the RAD code cannot corrupt the application.

To redirect control to the inserted code at a function's prologue and epilogue, we replace some instructions at the function prologue and epilogue with a JMP to the corresponding RAD code. When an instrumented function is invoked, the JMP that replaces the prologue transfers control to the new section, which first runs the RAD prologue code, then executes the original prologue instructions, and finally jumps back to the original function to continue from the instruction immediately after the original prologue. Epilogue instructions are replaced in a similar manner, but the order differs: the JMP transfers control to the epilogue code in the new section, which first executes the original epilogue instructions up to the RET, then runs the RAD epilogue checking code, and returns if no problem is found.

Because an unconditional near JMP instruction with a 32-bit displacement occupies 5 bytes, we need at least 5 bytes' worth of instructions to overwrite. Instructions that are targets of existing branch instructions cannot be replaced. A function prologue that needs to allocate stack space for local variables typically comprises 3 instructions:
  1.      
        push ebp  // save old frame ptr 
                  // (1 byte instruction)
    
  2.      
        mov ebp, esp // set the top of 
                     // the stack as the
                     // current frame ptr 
                     // (2 byte inst)
    
  3.      
        sub esp, x  // allocate x bytes on 
                    // the stack for local 
                    // variables (3 to 6 
                    // byte instruction)
           or
        add esp, -x
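
The redirection described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the RAD implementation: the function names, the addresses, the one-byte `RAD_STUB` placeholder standing in for the real RAD prologue code, and the choice to pad leftover prologue bytes with NOP are all hypothetical. It shows how a 5-byte `JMP rel32` is encoded and how the displaced prologue bytes can be replayed in the new section before jumping back.

```python
import struct

JMP_REL32 = 0xE9  # opcode of the 5-byte near JMP with a 32-bit displacement
NOP = 0x90        # padding byte (an assumption; the paper does not specify padding)


def encode_jmp(src: int, dst: int) -> bytes:
    """Encode `jmp dst` located at address src.

    The displacement is relative to the end of the JMP, i.e. src + 5.
    """
    return bytes([JMP_REL32]) + struct.pack("<i", dst - (src + 5))


def redirect_prologue(text: bytearray, text_base: int, func_addr: int,
                      prologue_len: int, rad_stub: bytes, rad_base: int) -> bytes:
    """Patch a prologue in place and return the code for the new section."""
    if prologue_len < 5:
        raise ValueError("prologue too short to hold a 5-byte JMP")
    off = func_addr - text_base
    original = bytes(text[off:off + prologue_len])
    # New-section code: RAD stub, then the original prologue, then a JMP back
    # to the instruction immediately after the original prologue.
    back_src = rad_base + len(rad_stub) + len(original)
    trampoline = rad_stub + original + encode_jmp(back_src, func_addr + prologue_len)
    # Overwrite the prologue with a JMP to the new section, padding the rest.
    patch = encode_jmp(func_addr, rad_base) + bytes([NOP]) * (prologue_len - 5)
    text[off:off + prologue_len] = patch
    return trampoline


# push ebp (55); mov ebp, esp (8B EC); sub esp, 0x10 (83 EC 10): 6 bytes in total
PROLOGUE = bytes.fromhex("558bec83ec10")
RAD_STUB = b"\x90"  # hypothetical placeholder for the RAD prologue code
text = bytearray(PROLOGUE + b"\xc3")  # function body reduced to a single ret
trampoline = redirect_prologue(text, 0x401000, 0x401000, len(PROLOGUE),
                               RAD_STUB, 0x500000)
print(text.hex(), trampoline.hex())
```

The three prologue instructions can be copied verbatim because none of them has an operand that depends on its own address; an instruction with a relative operand would need its displacement adjusted when moved.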
    
Alternatively, the frame could be set up with the ENTER instruction, but most compilers do not use ENTER for stack frame allocation. Thus an 'interesting' function prologue includes at least 6 bytes' worth of instructions, and we can comfortably instrument it to redirect control to the RAD prologue code using a 5-byte JMP instruction. On the other hand, a typical stack frame deallocation instruction sequence looks like one of the following three cases:
  1.     
        add esp, x   // dealloc. stack 
                     // space, x bytes 
                     // were allocated 
                     // (3-6 byte inst)
        pop ebp      // restore caller's 
                     // frame ptr (1 byte)
        ret          // return (1 byte)
    
  2.     
        mov esp, ebp  // dealloc. stack 
                      // space, any
                      // number of bytes 
                      // allocated on the 
                      // stack (2 byte 
                      // instruction)
        pop ebp       // restore caller's 
                      // frame ptr (1 
                      // byte)
        ret           // return (1 byte)
    
  3.     
        leave         // dealloc. stack 
                      // frame & restores 
                      // old frame ptr 
                      // (1 byte)
        ret           // return (1 byte)
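
The byte counts of the three sequences above can be tallied from their standard IA-32 encodings. The sketch below is illustrative only: it assumes the short 8-bit-immediate form of `add esp, x` (with x = 0x10 chosen arbitrarily), and the names `EPILOGUES` and `shortfall` are hypothetical.

```python
JMP_LEN = 5  # size of a near JMP with a 32-bit displacement

# Machine code of each epilogue, one bytes object per instruction.
EPILOGUES = {
    "add esp, x / pop ebp / ret": [bytes.fromhex("83c410"), b"\x5d", b"\xc3"],
    "mov esp, ebp / pop ebp / ret": [bytes.fromhex("8be5"), b"\x5d", b"\xc3"],
    "leave / ret": [b"\xc9", b"\xc3"],
}


def size(instrs):
    """Total size in bytes of an instruction sequence."""
    return sum(len(i) for i in instrs)


def shortfall(instrs):
    """Bytes that must be taken from preceding instructions to fit a JMP."""
    return max(0, JMP_LEN - size(instrs))


for name, instrs in EPILOGUES.items():
    print(f"{name}: {size(instrs)} bytes, shortfall {shortfall(instrs)}")
```
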
    
In cases 2) and 3), the deallocation sequence including the RET occupies only 4 and 2 bytes respectively, which is less than the 5 bytes a JMP requires. So we need to replace some additional instructions preceding the stack frame deallocation instructions to make room for the JMP. In most cases, we do find enough space this way. However, it is possible that the first instruction of the stack frame deallocation sequence is a jump target, e.g.:

    jne x
      :
x:  leave	
    ret
In this case, if we replace instructions prior to LEAVE, then the jump target x would be disturbed. In our experience, failing to find 5 bytes' worth of replaceable instructions at a function's epilogue does occur in practice but is relatively rare. For it to occur, two conditions need to be met:
a)
The compiler's stack checking options are disabled. Most development environments on Windows set, by default, compilation options that generate calls to stack checking code prior to stack frame deallocation; this code checks adherence to the calling convention (which dictates the duties of caller and callee regarding function frame initialization and cleanup). Such checks are desirable because functions may be called through function pointers and because of calls to library functions. If these options are disabled, the compiler does not generate the stack checking calls, and therefore no extra bytes precede the stack frame deallocation.
b)
The high-level code contains a sequence like:

    goto label;
        :
        :
label: 	
    return;
In such rare scenarios (typically 0.03-3% of all functions in our experiments; see Section 5.2.2, Table 9), we use a simple although expensive approach. When not enough instructions are available, we replace the first byte of the instruction prior to RET with an int 3 (breakpoint) instruction, which raises a software interrupt, and install a corresponding exception handler. When an int 3 instruction is executed, it generates a Debugger Breakpoint Exception, and the handler gains control to perform the return address check. Because this exception handler executes in user space, control transfer to our handler is similar to an intra-privilege-level far call, which means that there is no stack switching and the exception handler can access the return address on the stack. For details on how the stack evolves during the execution of a software interrupt handler, please refer to [17]. We chose the debugger breakpoint exception because it is not normally used unless the program is being debugged. However, when the program runs under a debugger, control is transferred to the debugger when an int 3 instruction is executed, and our exception handler will not be executed.
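
The choice between the two epilogue strategies can be sketched as a small planning routine. This is a hedged illustration rather than the authors' algorithm: the name `plan_epilogue_patch`, the representation of the function tail as (address, length) pairs, and the addresses are hypothetical, and it assumes that a patch may begin at a branch target but may not extend backward across one.

```python
from typing import List, Set, Tuple

JMP_LEN = 5  # near JMP with a 32-bit displacement
INT3 = 0xCC  # one-byte breakpoint instruction


def plan_epilogue_patch(tail: List[Tuple[int, int]],
                        targets: Set[int]) -> Tuple[str, int, int]:
    """Decide how to instrument an epilogue.

    tail    -- (address, length) of the last instructions of the function,
               in address order, ending with the ret.
    targets -- addresses that existing branch instructions jump to.
    Returns (kind, start address, number of bytes overwritten).
    """
    covered = 0
    for addr, length in reversed(tail):
        covered += length
        if covered >= JMP_LEN:
            return ("jmp", addr, covered)
        if addr in targets:
            # Going further back would bury this branch target inside the JMP.
            break
    if len(tail) < 2:
        raise ValueError("no instruction before ret to replace with int 3")
    # Fallback: overwrite the first byte of the instruction prior to ret.
    return ("int3", tail[-2][0], 1)


# A 3-byte instruction, then leave (1 byte), then ret (1 byte).
tail = [(0x401020, 3), (0x401023, 1), (0x401024, 1)]
print(plan_epilogue_patch(tail, set()))        # enough room for a JMP
print(plan_epilogue_patch(tail, {0x401023}))   # leave is a branch target
```

In the second call the LEAVE is a branch target, so the routine falls back to the int 3 approach, mirroring the `jne x` example earlier in this section.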
Manish Prasad
2003-04-05