In a subterfuge attack, an attacker attempts to mislead the monitor as to the meaning of the traffic it analyzes. These attacks are particularly difficult to defend against, because (1) unlike overload and crash attacks, if successful they do not leave any traces that they have occurred, and (2) the attacks can be quite subtle. Access to the monitor's source code particularly aids with devising subterfuge attacks.
We briefly discussed an example of a subterfuge attack in § 3.1, in which the attacker sends text with an embedded NUL in the hope that the monitor will miss the text after the NUL. Another form of subterfuge attack is using fragmented IP datagrams in an attempt to elude monitors that fail to reassemble IP fragments (an attack well known to the firewall community). The key principle is to find a traffic pattern that the monitor interprets differently than the receiving endpoint does.
To thwart subterfuge attacks, as we developed Bro we attempted at each stage to analyze the explicit and implicit assumptions made by the system, and how, by violating them, an attack might successfully elude detection. This can be a difficult process, though, and we make no claims to have found them all! In the remainder of this section, we focus on subterfuge attacks on the integrity of the byte stream monitored for a TCP connection. Then, in § 6.4, we look at subterfuge attacks aimed at hiding keywords in interactive text.
To analyze a TCP connection at the application level requires extracting the payload data from each TCP packet and reassembling it into its proper sequence. We now consider a spectrum of approaches to this problem, ranging from simplest and easiest to defeat, to increasingly resilient.
Scanning the data in individual packets without remembering any connection state, while easiest, obviously suffers from major problems: any time the text of interest happens to straddle the boundary between the end of one packet and the beginning of the next, the text will go unobserved. Such a split can happen simply by accident, and certainly by malicious intent.
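To make the boundary problem concrete, the following minimal Python sketch contrasts stateless per-packet scanning with scanning the concatenated stream. The function names and the sample payloads are hypothetical illustrations, not part of Bro, and the stream scan assumes the payloads arrive in order with no loss.

```python
def scan_packets_statelessly(payloads, keyword):
    """Return True only if keyword lies wholly inside a single payload."""
    return any(keyword in payload for payload in payloads)


def scan_stream(payloads, keyword):
    """Return True if keyword appears anywhere in the concatenated payloads.

    Assumes the payloads are already in order and none are missing.
    """
    return keyword in b"".join(payloads)


# The keyword "root" straddles the boundary between the two packets.
payloads = [b"USER ro", b"ot\r\n"]
missed = not scan_packets_statelessly(payloads, b"root")
found = scan_stream(payloads, b"root")
```

Here the stateless scan misses the keyword while the stream scan finds it, even though both see exactly the same bytes.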
Some systems address this problem by remembering previously-seen text up to a certain degree (perhaps from the beginning of the current line). This approach fails as soon as a sequence ``hole'' appears: whenever a packet is missing, whether due to loss or out-of-order delivery, the resulting discontinuity in the data stream can again mask key text that is only partially present.
The next step is to fully reassemble the TCP data stream, based on the sequence numbers associated with each packet. Doing so requires maintaining a list of contiguous data blocks received so far, and fitting the data from new packets into the blocks, merging now-adjacent blocks when possible. At any given moment, one can then scan the text from the beginning of the connection to the highest in-sequence byte received.
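The block bookkeeping described above can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions, not Bro's implementation: the class and method names are hypothetical, sequence numbers are treated as plain integers (wraparound is ignored), and on overlap the sketch simply keeps the bytes it stored first.

```python
class StreamReassembler:
    """Keeps a sorted list of non-adjacent, non-overlapping data blocks."""

    def __init__(self, initial_seq=0):
        self.initial_seq = initial_seq
        self.blocks = []  # list of (start_seq, bytearray), sorted by start_seq

    def add(self, seq, data):
        """Fit new data into the block list, merging now-adjacent blocks."""
        if not data:
            return
        start, end = seq, seq + len(data)
        merged = bytearray(data)
        kept = []
        for b_start, b_data in self.blocks:
            b_end = b_start + len(b_data)
            if b_end < start or b_start > end:
                kept.append((b_start, b_data))  # neither overlapping nor adjacent
                continue
            # Overlapping or adjacent: build one block covering both ranges.
            new_start, new_end = min(start, b_start), max(end, b_end)
            buf = bytearray(new_end - new_start)
            buf[start - new_start:end - new_start] = merged
            # Written second, so previously stored bytes win on overlap.
            buf[b_start - new_start:b_end - new_start] = b_data
            start, end, merged = new_start, new_end, buf
        kept.append((start, merged))
        kept.sort(key=lambda block: block[0])
        self.blocks = kept

    def in_sequence(self):
        """Return the bytes from the start of the connection up to the first hole."""
        if self.blocks and self.blocks[0][0] == self.initial_seq:
            return bytes(self.blocks[0][1])
        return b""
```

For example, after adding `b"USER "` at sequence 0 and `b"ot\r\n"` at sequence 7, only `b"USER "` is scannable; adding `b"ro"` at sequence 5 fills the hole and merges everything into a single block.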
Unless we are careful, even keeping track of non-contiguous data blocks does not suffice to prevent a TCP subterfuge attack. The key observation is that an attacker can manipulate the packets their TCP sends so that the monitor sees a particular packet, but the endpoint does not. One way of doing so is to transmit the packet with an invalid TCP checksum. (This particular attack can be dealt with by checksumming every packet, and discarding those that fail; a monitor needs to do this anyway so that it correctly tracks the endpoint's state in the presence of honest data corruption errors, which are not particularly rare [Pa97].) Another way is to launch the packet with an IP ``Time To Live'' (TTL) field sufficient to carry the packet past the monitoring point, but insufficient to carry it all the way to the endpoint. (If the site has a complex topology, it may be difficult for the monitor to detect this attack.) A third way becomes possible if the final path to the attacked endpoint happens to have a smaller Maximum Transmission Unit (MTU) than the Internet path from the attacker's host to the monitoring point. The attacker then sends a packet with a size exceeding this MTU and with the IP ``Don't Fragment'' header bit set. This packet will then transit past the monitoring point, but be discarded by the router at the point where the MTU narrows.
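As an aside on the checksum defense mentioned above, the standard Internet checksum is a 16-bit ones' complement sum, and a correctly checksummed block of data sums to zero when the checksum field is included. The Python sketch below is illustrative only; the function names are hypothetical, and for TCP the caller is assumed to supply the pseudo-header followed by the full segment, since the TCP checksum covers both.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_is_valid(covered_bytes: bytes) -> bool:
    """True if the data, with its checksum field included, verifies.

    For TCP, covered_bytes is assumed to be pseudo-header plus segment.
    """
    return internet_checksum(covered_bytes) == 0
```

A monitor applying this test would discard any packet for which `checksum_is_valid` returns false, just as the endpoint's TCP would.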
By manipulating packets in this fashion, an attacker can send innocuous text for the benefit of the monitor, such as ``USER nice'', and then retransmit (using the same sequence numbers) attack text (``USER root''), this time allowing the packets to traverse all the way to the endpoint. If the monitor simply discards retransmitted data without inspecting it, then it will mistakenly believe that the endpoint received the innocuous text, and fail to detect the attack.
A defense against this attack is that when we observe a retransmitted packet (one with data that wholly or partially overlaps previously-seen data), we compare it with any data it overlaps, and sound an alarm (or, for Bro, generate an event) if they disagree. A properly-functioning TCP will always retransmit the same data as originally sent, so any disagreement is either due to a broken TCP (unfortunately, we have observed some of these), undetected data corruption (i.e., corruption the checksum fails to catch), or an attack.
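The comparison itself is simple. The following Python sketch shows one way to check a retransmission against a stored block; the function name and its arguments are hypothetical, and sequence numbers are again treated as plain integers without wraparound.

```python
def find_retransmission_conflict(stored_seq, stored_data, seq, data):
    """Compare new data against a stored block over their overlapping range.

    Returns the first sequence number at which the bytes disagree,
    or None if the overlap (possibly empty) is consistent.
    """
    lo = max(stored_seq, seq)
    hi = min(stored_seq + len(stored_data), seq + len(data))
    for s in range(lo, hi):
        if stored_data[s - stored_seq] != data[s - seq]:
            return s
    return None


# Innocuous text seen first, then attack text sent with the same sequence numbers.
conflict = find_retransmission_conflict(100, b"USER nice\r\n", 100, b"USER root\r\n")
```

In this example `conflict` is 105, the sequence number of the first differing byte; a monitor would raise its alarm (or, for Bro, generate an event) whenever the result is not `None`.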
We have argued that the monitor must retain a record of previously transmitted data, both in-sequence and out-of-sequence. The question now arises as to how long the monitor must keep this data around. If it keeps it for the lifetime of the connection, then it may require prodigious amounts of memory any time it happens upon a particularly large connection; these are not infrequent [Pa94]. We instead would like to discard data blocks as soon as possible, to reclaim the associated memory. Clearly, we cannot safely discard blocks above a sequencing hole, as we then lose the opportunity to scan the text that crosses from the sequence hole into the block. But we would like to determine when it is safe to discard in-sequence data.
Here we can make use of our assumption that the attacker controls only one of the connection endpoints. Suppose the stream of interest flows from host A to host B. If the attacker controls B, then they are unable to manipulate the data packets in a subterfuge attack, so we can safely discard the data once it is in-sequence and we have had an opportunity to analyze it. On the other hand, if they control A, then, from our assumption, any traffic we see from B reflects the correct functioning of its TCP (this assumes that we use anti-spoofing filters so that the attacker cannot forge bogus traffic purportedly coming from B). In particular, we can trust that if we see an acknowledgement from B for sequence number n, then indeed B has received all data in sequence up to n. At this point, B's TCP will deliver, or has already delivered, this data to the application running on B. Consequently, B's TCP cannot accept any retransmitted data below sequence n, as it has already indicated it has no more interest in such data. Therefore, when the monitor sees an acknowledgement for n, it can safely release any memory associated with data up to sequence n.
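The acknowledgement-driven release rule can be sketched in a few lines of Python. This is a simplified illustration rather than Bro's code: the class and method names are hypothetical, only in-sequence data is modeled (blocks above a hole are omitted), wraparound is ignored, and we assume the analyzer has already scanned the data before it is released.

```python
class AckTrimmedBuffer:
    """Holds in-sequence data for one direction until the receiver acks it."""

    def __init__(self, initial_seq=0):
        self.base_seq = initial_seq  # sequence number of the first retained byte
        self.data = bytearray()      # in-sequence bytes from base_seq onward

    def append_in_sequence(self, data):
        """Retain newly arrived in-sequence data for retransmission checks."""
        self.data += data

    def on_ack(self, ack):
        """Release all retained bytes below ack; return how many were freed."""
        # Clamp so that stale acks free nothing and acks beyond what the
        # monitor holds free only the bytes actually retained.
        releasable = min(max(ack - self.base_seq, 0), len(self.data))
        del self.data[:releasable]
        self.base_seq += releasable
        return releasable
```

With a buffer starting at sequence 1000 and holding `b"USER nice\r\n"`, an acknowledgement for 1005 frees the first five bytes, and a later duplicate acknowledgement for a lower sequence number frees nothing.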