
Security Weaknesses Due to Disassembly Limitations

Two aspects of disassembly relate to sources of false security alarms and security loopholes: false negatives and false positives. We look at each of these in turn and evaluate when and how it could result in false alarms or missed attacks. There is a trade-off between security and program correctness. While we make every effort to seal all known security holes, we consider preserving the original program semantics a more important goal than attempting perfect security.
False Negatives
Missed functions (if any) are mainly callback functions (with no explicit CALL to them within the code section) and/or functions invoked using Position-Independent Code (PIC) sequences (for which there are no absolute address references to the function within the code section). Pure control flow analysis does not cover these. Even so, typically only a fraction of such functions are missed, as the following representative scenarios should make clear. Functions missed by the control flow analysis step could be:
a) partly/fully misidentified as data, or
b) identified fully as code
If the start or the end of a function is misidentified as data, we might miss an interesting prologue or epilogue, respectively. Either case results in an unprotected return, which is a security loophole if that particular function has a buffer overflow vulnerability. The same holds when a function is fully misidentified as data. If a piece of code somewhere in the middle of a function is misidentified as data, the function is divided into two; all returns beyond the dividing point are then treated as part of an uninteresting function, are left uninstrumented, and could miss an attack.
A function whose body is fully identified as code could still be missed during control flow analysis if its unidentified entry point is preceded either by data or by an unconditional branch instruction from the previous function. In either case, we would still mark the function entry point (last step (5) of the disassembly engine, Section 3.1.2). However, when the data preceding the entry point of such a function aligns with the code bytes to form a legitimate instruction sequence, an originally interesting prologue could become uninteresting, thus exposing an attack opportunity.
In all the cases presented so far, program semantics are not jeopardized. But if data misidentified as code turns an uninteresting function prologue into an interesting one, it might generate a false alarm if the epilogue happens to be interesting. Another false alarm scenario arises if the function entry point is preceded by some data and the first identified code byte happens to be a jump target (which happens with inter-procedural jumps), in which case the two functions get merged into one. However, inter-procedural jumps occur only in handcrafted assembly or in cases such as setjmp()/longjmp().
Apart from functions, jump targets reached through PIC jump tables could be missed. This could affect program correctness if these targets happen to lie within instrumented prologues or epilogues, though that is a very unlikely scenario.
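The mid-function split described above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the rewriting tool's logic: the function name instrumented_returns, the addresses, and the rule that a return is checked exactly when the nearest preceding marked entry point has an interesting prologue are all hypothetical (in particular, the sketch ignores whether the epilogue itself is interesting).

```python
from bisect import bisect_right


def instrumented_returns(entry_points, interesting_entries, returns):
    """Return the ret addresses that would receive an epilogue check.

    entry_points        -- sorted addresses marked as function entries
    interesting_entries -- subset of entries with an 'interesting' prologue
    returns             -- addresses of ret instructions

    Assumption of this sketch: a ret belongs to the nearest marked entry
    point at or below it, and is instrumented only if that entry's
    prologue is interesting.
    """
    covered = []
    for ret in returns:
        idx = bisect_right(entry_points, ret) - 1
        if idx >= 0 and entry_points[idx] in interesting_entries:
            covered.append(ret)
    return covered


# One real function at 0x1000 with two returns (hypothetical addresses).
correct = instrumented_returns([0x1000], {0x1000}, [0x1040, 0x1080])

# Bytes in the middle are misidentified as data, so the code that resumes
# at 0x1060 is marked as a new entry point without an interesting prologue.
split = instrumented_returns([0x1000, 0x1060], {0x1000}, [0x1040, 0x1080])
```

In this model the return at 0x1080 is covered when the function is identified as a whole, but drops out of the covered set once the function is split at 0x1060.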
False Positives
Functions with multiple entry points are treated as separate functions. Targets of PIC jump tables, which cannot be discovered statically, could get marked as function entry points if they lie immediately after an unconditional branch or a sequence of data bytes (last step (5) of the disassembly engine, Section 3.1.2). Code section addresses that appear as immediate (imm32) operands to mov r32, imm32 or push imm32 could be identified as function entry points even if they are targets of an indirect jump (non-PIC jump table targets are, however, treated specially and identified).
Function boundary identification helps prevent scenarios where the prologue is instrumented but the epilogue is not, and vice versa. The latter case could cause false alarms: the epilogue checking code would try to find a match for the return address on the stack, but since that address was never saved (there is no prologue instrumentation code), it would not be found in the Return Address Repository (RAR), and a false exception would be flagged. We want to avoid that altogether, which can be achieved by "optimistic" identification of functions.
Over-identification, however, could result in a function having an instrumented prologue but an uninstrumented epilogue. If such a function is called frequently and exits through the uninstrumented epilogue, the RAR will eventually overflow, since there is no code to pop the return address off the RAR. Another potential problem due to over-identification is missed attacks. If over-identification causes an "entry point" to be inserted within a function body, the single function gets divided into two. The "second" function won't have an interesting prologue, hence all subsequent returns in this function will be missed.
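The RAR overflow caused by an instrumented prologue paired with an uninstrumented epilogue can be seen in a toy model. This is a sketch under stated assumptions: the RAR is modeled as a bounded Python list, and the capacity, the return address, and the helper name calls_until_overflow are hypothetical, not values or names taken from the actual implementation.

```python
def calls_until_overflow(capacity, epilogue_instrumented, max_calls=None):
    """Count how many calls complete before the toy RAR is full.

    Returns None if no overflow occurs within max_calls calls.
    The capacity and the return address below are hypothetical.
    """
    if max_calls is None:
        max_calls = 10 * capacity
    rar = []
    calls = 0
    while calls < max_calls:
        if len(rar) >= capacity:
            return calls  # the next prologue push would overflow the RAR
        rar.append(0x401000)  # instrumented prologue saves the return address
        if epilogue_instrumented:
            rar.pop()  # instrumented epilogue removes the saved address
        calls += 1
    return None


leaky = calls_until_overflow(capacity=4, epilogue_instrumented=False)
balanced = calls_until_overflow(capacity=4, epilogue_instrumented=True)
```

With a capacity of four entries, the unbalanced function fills the toy RAR after four calls, while the balanced one never does within the call budget.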
However, "over-identification" never jeopardizes program correctness unless an entire chunk of data misidentified as code forms a function with both an interesting prologue and an interesting epilogue, which is an extremely unlikely scenario.
In summary, PIC, indirect branches and callback functions could leave some security loopholes in the input programs unprotected. Empirical results show that indirect branches are typically 5-8% of all branch instructions (Section 5.2.1, Table 5). Only a fraction of these (if any) could possibly result in a missed attack. As for false alarms, they could arise from handcrafted assembly code, mostly with inter-procedural jumps and/or entry and exit points in different functions. Here is an example of such a case, which we observed after rewriting the application Microsoft Access:

Fn1:  // no 'interesting' prologue
        :
     jne label
        :                
     ret // no 'interesting' epilogue

Fn2:  // 'interesting' prologue
        :
label: 
        :         
     ret // 'interesting' epilogue
In this case, control jumps from Fn1 (whose prologue is not instrumented, meaning its return address is not saved in the RAR) to label, which is in Fn2, and exits from Fn2 (whose epilogue is instrumented, meaning a return address check is done). The RAD epilogue of Fn2 will flag an exception, since it cannot find the on-stack return address in the RAR; this is a false alarm. One way to avoid such false alarms (not currently implemented) could be to assign an ID to every function and allow the RAD epilogue check to proceed only when its function ID matches the function ID at the top of the RAR stack (stored by the RAD prologue code). Other recipes for false alarms include data misidentified as code that looks exactly like an interesting prologue, or an entire chunk of data that appears to be an interesting function, both of which are rather uncommon.
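The Fn1/Fn2 false alarm and the proposed function-ID check can be sketched with a toy RAR. Since the ID scheme is described above as not implemented, everything here is an assumption made for illustration: the class name RAR, the string function IDs, the addresses, the "ok"/"alarm"/"skipped" results, and the simplified search-and-pop epilogue are hypothetical and do not reproduce the RAD instrumentation code.

```python
class RAR:
    """Toy Return Address Repository of (function_id, return_address) pairs."""

    def __init__(self, check_function_ids):
        self.stack = []
        self.check_function_ids = check_function_ids

    def prologue(self, function_id, return_address):
        # Instrumented prologue: save the return address with the function ID.
        self.stack.append((function_id, return_address))

    def epilogue(self, function_id, return_address):
        # Proposed fix: only check when the top entry was stored by the
        # prologue of the same function.
        if self.check_function_ids:
            if not self.stack or self.stack[-1][0] != function_id:
                return "skipped"
        # Simplified check: search from the top for the on-stack address.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][1] == return_address:
                del self.stack[i:]
                return "ok"
        return "alarm"


def fn1_jumps_into_fn2(check_function_ids):
    rar = RAR(check_function_ids)
    rar.prologue("main", 0x1000)  # an instrumented caller further up
    # Fn1 has no instrumented prologue, so nothing is saved for its call.
    # Control then jumps to label inside Fn2 and leaves via Fn2's epilogue.
    return rar.epilogue("Fn2", 0x2000)


def ordinary_call_to_fn2(check_function_ids, overwritten):
    rar = RAR(check_function_ids)
    rar.prologue("Fn2", 0x3000)
    on_stack = 0xDEADBEEF if overwritten else 0x3000
    return rar.epilogue("Fn2", on_stack)
```

Without the ID test, fn1_jumps_into_fn2 raises an alarm; with it, the check is skipped, while an ordinary call to Fn2 is still verified and an overwritten return address is still flagged. In this sketch a skipped check means that particular return goes unverified, so the ID test trades the false alarm for an unchecked return on that path.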
Manish Prasad
2003-04-05