The C and C++ languages are popular primarily because of the precise
low-level control they provide over system resources, including memory.
Unfortunately, this control is more than most programmers can handle, as
evidenced by the host of memory-related programming errors that plague
software written in these languages and continue to be discovered every
day. Attacks that exploit memory errors, such as buffer overflows,
constitute the largest class of attacks reported by organizations such as
the CERT Coordination Center, and pose a serious threat to the computing
infrastructure.
To date, a number of attacks which exploit memory errors have been
developed. The earliest of these to achieve widespread popularity was the
stack smashing attack [31, 27], in which a stack-allocated
buffer is intentionally overflowed so that a return address stored on the
stack is overwritten with the address of injected malicious code.
(See Figure 1.) To thwart such attacks, several approaches have been
developed that, in one way or another, prevent undetected modifications
to a function's return address. They include the StackGuard approach of
putting canary values around the return address, so that stack smashing
is detected when the canary value is clobbered; saving a second copy of
the return address elsewhere [9, 6]; and others.
Figure 1: A buffer overflow in which the current function's return address
is replaced with a pointer to injected code. (In the figure, the stack
grows in the direction opposite to increasing addresses.)
The difficulty with the above approaches is that while they are effective
against stack-smashing attacks, they can be defeated by attacks that
modify code pointers in the static or heap areas. In addition, attacks
where control flow is not changed, but security-critical data, such as
the arguments to a chmod or execve system call, is changed, are not
addressed. Recently, several new classes of vulnerabilities, such as
integer overflows (reported in Snort, and earlier in sshd), heap
overflows, and double-free vulnerabilities, have emerged. These
developments lead us to conclude that additional ways to exploit
the lack of memory safety in C/C++ programs
will continue to be discovered in the future.
Thus, it is important to develop approaches that provide systematic
protection against all foreseeable memory error exploitations.
As a first step towards developing more comprehensive solutions against
memory exploits, we observe that such exploits require an attacker to
possess a detailed understanding of the victim program, and have precise
knowledge of the organization of data and code within the victim program
memory. Code obfuscation is a general technique that attempts to
secure programs by making them hard to understand. It is typically
implemented using a set of randomized, semantics-preserving program
transformations [38, 10, 4]. While code obfuscation is
concerned primarily with preventing the understanding and reverse
engineering of binary code, our interest lies in obfuscations which modify
the internal runtime behavior of programs in ways that don't affect the
observable semantics, but do create unpredictability which makes it
difficult to successfully craft attacks which exploit memory errors.
Forrest et al. suggested the use of
randomized program transformations as a way to introduce diversity
into applications. Such diversity makes it necessary for attackers to
analyze each copy of the victim program independently, thereby greatly
increasing the cost of developing attacks. They presented a prototype
implementation that performed one particular kind of randomization:
the randomization of the addresses of stack-resident data.
Their implementation modified the gcc
compiler to insert a random amount of padding into each stack frame. Our
paper extends this basic idea, and presents a systematic study of the
range of address randomizations that can be achieved using program
transformations.
Address obfuscation is a program transformation technique in which a
program's code is modified so that each time the transformed code is
executed, the virtual addresses of the code and data of the program are
randomized. As we will show, this makes the effect of most memory-error exploits
non-deterministic, with only a very small chance of success. Attackers
are forced to make many attempts on average before an attack succeeds,
with each unsuccessful attack causing the target program to crash,
increasing the likelihood that the attack will be detected. Moreover, an
attack that succeeds against one victim will not succeed against another
victim, or even for a second time against the same victim. This aspect
makes it particularly effective against large-scale attacks such as Code
Red, since each infection attempt requires significantly more resources,
thereby greatly reducing the propagation rate of such attacks.
The PaX project
has also developed an approach for randomizing the memory regions occupied
by program code and data, called Address Space Layout Randomization (ASLR)
(See http://pageexec.virtualave.net for documentation on the PaX
project.) Rather than viewing address obfuscation as a program
transformation, they view it as an operating system feature. In
particular, they have modified the Linux kernel so that it randomizes the
base address of different sections of memory, such as the stack, heap,
code, and memory-mapped segments. A key benefit of this approach is that
it requires no changes to individual applications (other than having the
compiler generate position-independent code). However, since the approach
incorporates no analysis of the applications, it is difficult to perform
address randomizations beyond changes to the base addresses of different
memory segments. In contrast, a program transformation approach will
permit randomization of the locations of individual variables and routines
within these memory sections. Such randomization makes it difficult to
carry out attacks that rely on relative distances between variables to
modify critical data, e.g., a string used as an argument to execve.
Moreover, it introduces significant additional diversity into the program,
as it is no longer possible to craft attacks by knowing just the offsets
in the base address of various memory segments. (These offsets can
potentially be learned by exploiting vulnerabilities that may allow
attackers to read contents of victim program memory without crashing it.)
The current generation of compilers and application binary interfaces
limit how much randomization is possible, and at what stage (compile-time,
link-time, load-time or runtime) such randomization can be performed. Our
implementation focuses on techniques that can be smoothly integrated into
existing OS environments. The key contribution of this paper is to develop
and analyze the range of address obfuscations that can be implemented
effectively with low runtime overheads. The principal benefits of this
approach are summarized below.
Figure 2: Targets of memory error exploits, and effectiveness of defenses
against them. (The defenses compared include Libverify, RAD, and Etoh and
Yoda's approach.) Notes:
1. Protects only against buffer-overflow attacks; no protection from other attacks.
2. Limited protection provided.
3. Possible in principle, but not currently implemented.
4. Susceptible to attacks that simultaneously corrupt the return address
and another location (the second copy of the return address).
5. Some susceptibility to attacks that corrupt the return address and
another stack-resident pointer.
- It systematically protects against a wide range of attacks that exploit
memory programming errors, including stack smashing, heap overflows,
integer overflows, and typical format-string attacks.
- It can be easily applied to existing legacy code without modifying the
source code or the underlying operating system. Moreover, it can be
applied selectively to protect security-critical applications without
needing to change the rest of the system.
- The transformation is fast and introduces only a low runtime overhead.
Applicability to legacy code, without source-code or operating-system
changes, provides an easy migration path for existing systems to adopt
address obfuscation. Such a solution can also be ported more easily to
proprietary operating systems. Finally, the approach can be easily
combined with existing techniques, such as StackGuard and FormatGuard,
to provide additional protection.
1.1 Overview of Address Obfuscation and How It Works.
We start with the observation that the goal of an attacker is to cause the
target program to execute attack-effecting code. This code itself may be
provided by the attacker (injected code), or it may already
be a part of the program (existing code). A direct way to
force execution of such code is through a change to the control flow of
the program. This requires the attacker to change a code pointer stored
somewhere in memory, so that it points to the code of their choice. In
such a case, when the corrupted code pointer is used as the target of a
jump or call instruction, the program ends up executing the code
chosen by the attacker.
Some natural choices for such code pointers include the return
address (stored on the stack), function pointers (stored on the stack,
static area or the heap), the global offset table (GOT) that is used in the
context of dynamic linking, and buffers storing longjmp data. An
indirect way to force execution of attack-effecting code is to change
security-critical data that is used by the program in its normal course of
execution. Examples of such data include the arguments to a chmod or
execve system call, and variables holding security-critical data,
such as a flag indicating whether a user has successfully authenticated.
There are essentially two means by which an attacker can
exploit a memory error:
by overwriting a pointer value, or by overwriting non-pointer data.
Since code sections cannot be overwritten in most
modern operating systems, there are three possible
combinations of goals and
means: corrupting a code-pointer, corrupting a data-pointer, or
corrupting non-pointer data. Of these,
the two pointer-corrupting attacks involve
overwriting a pointer with the address of data or code chosen by
the attacker. These two kinds of attacks require the attacker to
know the absolute address of such data or code, and hence we call them
absolute address-dependent attacks. The third kind of attack is
called relative address-dependent,
because it does not overwrite pointers, and requires only relative address
information --- in particular, an attacker needs to know the relative
distance between a buffer (which is overrun) and the location of the
data item to be corrupted. Figure 2 shows these three
classes of attacks, further subdivided based on the pointer or data value
that is targeted by an attack. It shows which of today's protection
schemes (including ours) protect against them. As it shows, Stackguard,
Libverify and RAD protect against buffer overrun attacks that overwrite
the return address. PointGuard is an approach that encrypts stored
pointer values (by xor-ing them with a random number). It can be thought
of as obfuscating pointer values, as opposed to the addresses they point
to. The benefit of this approach is that the probability of an attack
making a successful guess is smaller than with address obfuscation. A
drawback is that it does not provide protection against attacks that
modify non-pointer values, e.g., attacks that modify critical data or
integer subscripts. A concrete example of such an attack is the recent
integer overflow exploit, against which address obfuscation does protect.
The PaX project's ASLR approach provides
protection against pointer-based attacks in much the same way as address
obfuscation, but not against data attacks that exploit relative distances
between variables. A more detailed comparison of our approach with these
approaches can be found in Sections 3 and 5.
1.2 Organization of the Paper.
The rest of this paper is organized as follows. In Section 2,
we describe several possible obfuscating transformations, and describe
our implementation approach. Section 3 discusses the
effectiveness of our approach against different attacks, and analyzes
the probability of mounting successful attacks. Runtime overheads
introduced by our approach are discussed in Section 4,
followed by a discussion of related work in Section 5.
Finally, Section 6 provides a summary and discusses future work.