Our first stage payload initiates communication by negotiating a shared key and establishing a secure channel to the payload delivery server. The secure channel is established by the first stage payload in order to hamper an active adversary desiring to identify and interfere with payload communications. This is done in the first stage because later stage communications may be easier to identify and tamper with than the initial transmission of the actual code injection exploit. For example, consider that browser-based exploits may be delivered over SSL and that exploit payloads typically employ polymorphic self-decoders that may be difficult to identify by signature-based network intrusion detection (IDS) and prevention (IPS) systems. In both cases, identifying and intercepting the delivery of the initial exploit is difficult in principle and impossible using currently available commercial IPS technology. While technically possible, intercepting, decoding, and modifying a polymorphic payload within an arbitrary exploit in transit is a very challenging task.
The secure payload delivery protocol was designed to minimize required code space for the first stage payload through the use of simple yet secure cryptographic operations and algorithms. In addition, custom cryptographic routines were chosen to maximize portability across target operating systems. Our first stage payload establishes the secure channel and must do so given the size constraints of first stage code injection exploit payloads. The system uses ElGamal key agreement (alternatively referred to as Half-Certified Diffie Hellman [14]) to securely establish a session key with the payload server and the RC4 stream cipher to encrypt communications. ElGamal key agreement was chosen because it requires only a single arbitrary-precision integer operation, modular exponentiation. RC4 is also a very simple stream cipher, generating a keystream by swapping elements in its internal S-box. The use of public-key cryptography was chosen over traditional symmetric cryptography in order to prevent a passive attacker from analyzing the initial exploit, recovering the key, and decrypting the subsequent communications. Using public-key cryptography to secure all communication prevents post-analysis by an adversary who has recorded all network traffic.
Both of these cryptosystems were also chosen for their ease of implementation in assembly language or low-level position-independent C. Modular exponentiation, the only mathematical operation required for ElGamal key agreement, can be implemented compactly using classical methods or Montgomery exponentiation[14] for more efficiency. To estimate the code space required for modular exponentiation, we modified an open source fast fixed precision math library [15] to use some more compact algorithms for modular exponentiation and multiplication. The compact modular exponentiation function fp_exptmod_small uses right-to-left binary exponentiation[14]. The compact modular multiplication function fp_mulmod_small uses repeated subtractions rather than division for the modular reduction step. The total size of the code required to implement modular exponentiation is around 1200 bytes. The exact code sizes of the necessary routines are listed in table 3.1.
The RC4 stream cipher, using a single S-box, may be implemented very
compactly. Most stream ciphers are optimized for hardware
implementation, but RC4 requires the least number of operations on a
general-purpose processor. Additional space required for the use of
these cryptographic algorithms is dedicated to the size of the ElGamal
public key, roughly twice the bit length of the prime modulus
.
Using an elliptic curve cryptography equivalent of ElGamal key
agreement would save space for the key at the expense of increased
implementation complexity and may actually require more code space.
Elliptic-curve cryptography is typically used on devices with low
processor power and little memory so the increase in code space in
return for smaller keys and thus less computationally intensive
operations is a reasonable trade-off. The target systems in our case
are assumed to be modern computers with processor power and memory
sufficient for basic cryptographic operations. The total payload
size, including basic socket networking, ElGamal key agreement, keys,
and RC4 routines is estimated to be under two kilobytes, assuming a
straightforward x86 assembly language implementation and 1024-bit
ElGamal.
The second stage is downloaded over the secure channel created by the first stage and is executed in-place within the vulnerable process' address space. The second stage executes free of code space constraints, but may be executing in a damaged address space. This impediment is overcome by downloading and executing a remote agent executable, described in the next section. Each stage of the payload is described in more detail in the following sections below.