Toward Undetected Operating System Fingerprinting
Lloyd G. Greenwald and Tavaris J. Thomas
LGS
{lgreenwald, tjthomas}@lgsinnovations.com
Abstract
Tools for active remote operating system fingerprinting generate many packets and are easily detected by host and network defensive devices such as IDS/NIDS. Since each additional packet increases the probability of detection, it is advantageous to minimize the number of probe packets. We make use of an information-theoretic measure of test quality to evaluate fingerprinting probes and use this evaluation to derive effective probe combinations that minimize probe packets. While the default configuration of Nmap’s second generation operating system detection transmits 16 different probe packets, we demonstrate successful fingerprinting with one to three packets. Furthermore, these packets are valid TCP SYN packets to open ports, which are less likely to be detected as fingerprinting probes than malformed packets or packets that are not part of a valid TCP three-way handshake.
An attacker can use operating system fingerprinting to discover possible security vulnerabilities and evaluate the attack potential of a target machine. Open source tools are publicly available that permit an attacker to gain this intelligence remotely. However, the use of these tools may be easily detected because the default configurations generate too many probe packets or generate packets that are unusual, malformed, or otherwise easily identified as probe packets.
To understand how to build operating system fingerprinting tools that are more difficult to detect we make use of a measure to evaluate fingerprinting tests based on information gain developed in [11]. Fingerprinting tests with high information gain eliminate a lot of uncertainty about the target system while fingerprinting tests with low information gain leave a lot of uncertainty about the target system and are only worthwhile if higher quality tests are too costly. Test cost may be expressed in terms of the number of probes needed for the test and the likelihood that a probe will be detected by IDS/NIDS. Once we understand the quality of individual fingerprinting tests we can evaluate the quality of a probe that enables multiple fingerprinting tests. We can then select a minimum set of probes to perform operating system fingerprinting with low probability of detection.
We provide both analytical and empirical support for building operating system fingerprinting tools that use very few probes yet provide effective operating system classifications. The main contribution of this paper is to demonstrate the use of the theoretical results in [11] to evaluate fingerprinting probe packets. We additionally provide empirical results to substantiate these analytical insights. We demonstrate several sets of probes that provide highly accurate operating system fingerprinting with very few probes. Accuracy is measured in terms of the probability of correctly guessing the target operating system based on the results of a probing experiment. Furthermore, we argue that these probes are unlikely to be detected or modified by defensive devices. We provide accurate solutions using as few as a single probe packet.
In Section 2 we provide background material on operating system fingerprinting and theoretical results that apply information gain to evaluate the 13 TCP probes used in Nmap version 4.21ALPHA4 [8]. Given the information gain evaluation of Section 2, we develop in Section 3 a set of 23 experiments to determine how few probes we can apply while still providing accurate classification. We empirically evaluate the accuracy of each of these experiments on several target systems. In Section 4 we argue that subsets of accurate probes are unlikely to be detected or modified by defensive devices. Finally, we discuss alternative evaluations and related work in Section 5. An appendix summarizes the analytical techniques developed in [11].
In order to evaluate a fingerprinting test, we compare how accurately we could guess the classification of a target system before and after performing the test. The difference is called the information gain. The test with the highest information gain provides the most discriminative power in fingerprinting. Information gain is built on the principles of information theory [20] and is an important tool in building decision tree classifiers [15][17][19]. Information gain is used to select the next test at each step in growing a decision tree. Decision tree classifiers have been used in many fields.
Prior to fingerprinting a target system, we can guess the operating system based on the a priori distribution of operating system classifications, over all possible classifications. After performing a fingerprinting test we can guess the operating system based on the a posteriori distribution of operating system classifications. Let X be a random variable that describes the classification of the operating system of a target system. The entropy in X is the amount of uncertainty there is in classifying an unknown system. Let Test_i be a random variable that describes the result of applying test i to the probe responses of a target system. Knowing the value of Test_i might tell us something about the value of X. This can be captured in the conditional entropy of X given Test_i. A measure of the amount of information we gain about X if we know the value of Test_i is called the mutual information, or information gain, of X and Test_i. This can be expressed as the difference between the entropy in the classification before taking the test and the conditional entropy in the classification, conditioned on the value of the test. The fingerprinting test with the highest information gain removes the most uncertainty about the OS classification of a target system.
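The definition above can be illustrated with a minimal sketch that computes information gain as H(X) - H(X | Test_i) from labeled samples. The function names and the four-system toy data are hypothetical and chosen only for illustration; they are not taken from [11] or from Nmap.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(samples):
    """samples: list of (os_label, test_value) pairs.

    Returns H(X) - H(X | Test), the entropy of the OS label before the
    test minus the expected entropy remaining after seeing the test value.
    """
    labels = [os_label for os_label, _ in samples]
    prior = entropy(labels)
    # Group the OS labels by the test value they produced.
    by_value = {}
    for os_label, value in samples:
        by_value.setdefault(value, []).append(os_label)
    conditional = sum(
        len(group) / len(samples) * entropy(group)
        for group in by_value.values()
    )
    return prior - conditional


# Hypothetical toy data: four equally likely systems, one test value each.
samples = [("os_a", 64), ("os_b", 64), ("os_c", 128), ("os_d", 255)]
gain = information_gain(samples)
```

In this toy case the prior entropy is 2 bits. The test separates os_c and os_d completely but leaves os_a and os_b indistinguishable, so 0.5 bits of uncertainty remain on average and the gain is 1.5 bits.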
In [11] we detail a method that uses information gain to evaluate fingerprinting tests. This method is summarized in an appendix below. That paper tackles several hurdles in order to apply information gain in this context. The first hurdle is that information gain is generally computed from collections of training samples of test results from known systems. However, a fingerprinting tool stores information about known systems in a digested signature database rather than as raw training samples. This removes and obscures distribution information. Since a signature database is once-removed from the training samples used to create the database, we must derive calculations to take advantage of the knowledge represented in the signature database and make assumptions about the knowledge that has been lost. Our calculation also resolves issues concerning the use of data that is represented as disjunctive lists and ranges, and the handling of missing test values.
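To make the signature-database setting concrete, the following sketch computes information gain when each database entry lists one or more admissible values for a test (a disjunctive list). It assumes, for illustration only, that entries are equally likely and that each admissible value of an entry is equally likely. It is a simplified stand-in, not the calculation of [11]: ranges would first have to be expanded into explicit values, and missing test values are not handled. All names and data are hypothetical.

```python
from math import log2


def gain_from_signatures(signatures):
    """signatures: dict mapping entry name -> list of admissible test values.

    Assumption: every entry has prior probability 1/n, and every distinct
    admissible value of an entry has probability 1/(number of its values).
    Returns H(X) - H(X | Test) in bits.
    """
    n = len(signatures)
    prior = log2(n)
    # joint[value][entry] = P(entry, value) under the uniform assumptions.
    joint = {}
    for entry, values in signatures.items():
        distinct = set(values)
        for value in distinct:
            joint.setdefault(value, {})[entry] = 1.0 / (n * len(distinct))
    # H(X | Test) = -sum over (entry, value) of P(entry, value) * log2 P(entry | value)
    conditional = 0.0
    for per_entry in joint.values():
        p_value = sum(per_entry.values())
        for p in per_entry.values():
            conditional -= p * log2(p / p_value)
    return prior - conditional


# Hypothetical database: os_b admits two values for this test.
toy_db = {"os_a": [64], "os_b": [64, 128], "os_c": [128], "os_d": [255]}
toy_gain = gain_from_signatures(toy_db)
```

A test whose values are unique per entry recovers the full prior entropy, a test with the same value for every entry yields zero gain, and overlapping disjunctive lists such as the one for os_b fall in between.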
By default, Nmap version 4.21ALPHA4 sends a total of 16 probes (excluding re-transmissions) to a target system and applies tests to the probe responses. The test values are combined into a fingerprint, also known as a signature. The fingerprint of a target system is compared against reference fingerprints in a signature database in order to find matches to help classify the operating system of the target system. Nmap’s 16 default probes include six TCP SYN packets to an open port on the target machine (Pkt1-6), three TCP packets with various flags to an open port (T2-T4), three TCP packets with various flags to a closed port (T5-T7), one TCP packet to an open port with the Explicit Congestion Notification (ECN) control flags set, two ICMP ECHO packets (IE), and one UDP packet sent to a closed port to elicit an ICMP port unreachable packet. In this paper we focus on the 13 TCP probes. We do not study UDP and ICMP probes because (1) they are more easily blocked by defensive devices, and (2) our information gain evaluation reveals that they are of marginal value. More detail about the evaluation of ICMP and UDP probes is provided in [10] and [11].
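As a quick consistency check on the probe inventory just described, the breakdown can be written as a small table and summed. The tuple layout and the name DEFAULT_PROBES are hypothetical; the counts come from the paragraph above.

```python
# (description, protocol, number of probes), as listed in the text.
DEFAULT_PROBES = [
    ("Pkt1-6: TCP SYN to an open port", "tcp", 6),
    ("T2-T4: TCP with various flags to an open port", "tcp", 3),
    ("T5-T7: TCP with various flags to a closed port", "tcp", 3),
    ("ECN: TCP with ECN control flags to an open port", "tcp", 1),
    ("IE: ICMP ECHO", "icmp", 2),
    ("UDP to a closed port", "udp", 1),
]

total_probes = sum(count for _, _, count in DEFAULT_PROBES)
tcp_probes = sum(count for _, proto, count in DEFAULT_PROBES if proto == "tcp")
```

The totals come to 16 probes overall and 13 TCP probes, matching the figures used throughout the paper.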
| R | Responsiveness |
| DF | IP don’t fragment bit |
| T | IP initial time-to-live (TTL) |
| TG | Guessed IP TTL |
| W | TCP initial window size |
| S | TCP sequence number |
| A | TCP acknowledgement number |
| F | TCP flags |
| O | TCP options |
| RD | TCP checksum |
| TOS | IP type of service |
| Q | TCP miscellaneous quirks |
| SP | TCP initial sequence number (ISN) predictability index |
| GCD | TCP ISN greatest common divisor |
| ISR | TCP ISN counter rate |
| TI | IP header ID sequence generation |
| TS | TCP timestamp option generation |
Table 1: Nmap Tests
Table 1 summarizes the tests applied to the responses of the 13 TCP probes of Nmap version 4.21ALPHA4. Pkts 1-6 serve a dual purpose. They are (1) used to determine TCP/IP properties that can only be derived by sequences of timed packets and (2) used as additional sources of TCP initial window size (W) and TCP options (O) data. These probes vary only in TCP options and TCP window fields. Pkt1 is also called T1 and its response is subject to the same tests as responses from probes T2-T7. The sequence tests include testing the TCP initial sequence number (ISN) generation algorithm (tests SP, GCD, and ISR). These tests require responses from at least four of the six Pkt1-6 probes. Other sequence tests include IP header ID (IPID) sequence generation (TI), requiring responses from three of the six Pkt1-6 probes, and TCP timestamp option generation algorithm (TS), requiring responses from at least two of the six Pkt1-6 probes.
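The response thresholds above determine which sequence tests remain available when fewer Pkt1-6 probes are sent. The sketch below encodes those thresholds; the names MIN_RESPONSES and available_sequence_tests are hypothetical.

```python
# Minimum number of Pkt1-6 responses each sequence test needs, per the text.
MIN_RESPONSES = {"SP": 4, "GCD": 4, "ISR": 4, "TI": 3, "TS": 2}


def available_sequence_tests(num_responses):
    """Return the sequence tests that can be computed from num_responses
    responses to the six Pkt1-6 probes, in alphabetical order."""
    if not 0 <= num_responses <= 6:
        raise ValueError("Pkt1-6 yields between 0 and 6 responses")
    return sorted(
        test for test, needed in MIN_RESPONSES.items()
        if num_responses >= needed
    )
```

With a single response no sequence test is possible, two responses allow only TS, three add TI, and the ISN tests (SP, GCD, ISR) need at least four. A tool limited to one to three probes therefore gives up the ISN tests.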
Probes T2-T7 vary in TCP flags, initial window size, and don’t fragment bit setting. The responses to each of the T1-T7 probes are tested for responsiveness (R), IP don’t fragment bit (DF), IP initial time-to-live (T), guessed IP initial time-to-live (TG), TCP initial window size (W), TCP sequence number (S), TCP acknowledgement number (A), TCP flags (F), TCP options field (O), TCP checksum (RD), IP type of service (TOS), and miscellaneous quirks (Q). Note that the IP initial time-to-live value test (T) requires both one of the T1-T7 probes and the ICMP response from the UDP probe to reconstruct the initial time-to-live value. This additional probe can be avoided by guessing the IP initial time-to-live value (TG). The response to the ECN probe is subject to the same tests as responses from probes T2-T7, as well as a congestion control (CC) test. A description of these probes and tests is provided in [8].
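The text does not spell out how the initial time-to-live is guessed. One heuristic, described in Nmap's documentation and assumed here, is to round the observed TTL up to the next of the common initial values 32, 64, 128, and 255. The sketch below illustrates that heuristic; the function name is hypothetical.

```python
# Common initial TTL values (assumption: the target uses one of these).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_initial_ttl(observed_ttl):
    """Round an observed TTL up to the nearest common initial TTL."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be in the range 1..255")
    return next(ttl for ttl in COMMON_INITIAL_TTLS if observed_ttl <= ttl)
```

For example, a response arriving with TTL 52 would be attributed to an initial TTL of 64. The guess fails when the target lies more hops away than the gap to the next lower common value, which is why the measured test (T) relies on the extra UDP probe.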
The different TCP options and initial window sizes sent in the 13 TCP probes can cause a target system to change the window size value in its response packet. Similarly, since TCP options fields are optional, many TCP/IP implementations differ in how they handle them. As shown below, TCP options and initial window size tests are important for accurate fingerprinting.
We apply our information gain calculation to the tests of Nmap version 4.21ALPHA4 [8]. Table 2 depicts these results, grouped according to Nmap’s 13 TCP probes. Each row corresponds to exactly one probe (except for the IP initial time-to-live (T) test which makes use of the ICMP response to a UDP probe to calculate initial time-to-live). Each column in Table 2 corresponds to a test on the response to that probe. Table 3 depicts the tests that are computed over more than one probe. The entries in these tables correspond to the information gain of the corresponding test computed based on the Nmap version 4.21ALPHA4 signature database. Note that the same type of test may have a different information gain value depending on the probe packet sent to the target. Values that are very similar for the same test may be attributed to noise in the signature database.
The Nmap version 4.21ALPHA4 signature database has 417 entries with total entropy prior to testing of 8.70. Values in Tables 2 and 3 are coded based on the percentage of total uncertainty that is removed by each test. Values in bold font remove at least 50% of the total uncertainty, while values in italicized font remove at least 25%. All other values remove less than 25% of the total uncertainty. The results in these tables assume a target system is equally likely to be any entry in the database and that all possible values of a test for a given entry are also equally likely. Other assumptions or a priori information about classification or test value distributions (e.g. normal distributions over ranges) can be accommodated by adapting the calculations in [11].
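Under the stated assumption that all 417 entries are equally likely, the prior entropy is log2(417), which is about 8.70 bits. The sketch below reproduces that figure and the coding thresholds; the function name tier and the label strings are hypothetical.

```python
from math import log2

DB_ENTRIES = 417
# Entropy of a uniform distribution over the database entries, in bits.
prior_entropy = log2(DB_ENTRIES)


def tier(gain, total=prior_entropy):
    """Classify an information gain value by the share of total
    uncertainty it removes, following the coding used in the tables."""
    fraction = gain / total
    if fraction >= 0.50:
        return "bold"      # removes at least 50% of the uncertainty
    if fraction >= 0.25:
        return "italic"    # removes at least 25% of the uncertainty
    return "plain"         # removes less than 25%
```

In absolute terms the thresholds sit at roughly 4.35 bits and 2.18 bits, so a gain of 5.39 falls in the top tier and a gain of 2.57 in the middle tier.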
Fingerprinting tests with high information gain eliminate a lot of uncertainty about the target system and may be used to build effective fingerprinting tools. Tests with low information gain leave a lot of uncertainty about the target system and are only worthwhile if higher quality tests are too costly. Even so, they are unlikely to be useful independently.
Test cost may be expressed in terms of the number of probes needed for the test and the likelihood that a probe will be detected by IDS/NIDS. Each row in Table 2 corresponds to a collection of tests that cost one probe total, while the tests in Table 3 are tests that require between two and six probes. Our goal is to select the rows from Table 2 and, optionally, tests from Table 3 that provide accurate fingerprinting with low probability of detection. Information gain provides one analytical tool for making this optimization choice. In Section 3 we verify these analytical results with experiments on several target systems using a combination of probes.
From Table 2 we can see that the W and O tests to open ports provide the most information gain. These tests can be achieved with any of the Pkt1-6 probes, the ECN probe, or the T3 probe. The T2 and T4 probes provide less information and the probes to closed ports (T5-T7) provide very little information about W and O. Probes to closed ports often elicit TCP RST responses that can provide some information. Of the remaining tests that can be accomplished with one probe, only the time-to-live tests (T, TG) remove more than 25% of the classification uncertainty. The quality of these tests does not vary much over the applicable probes. To gain the benefits of the most discriminative tests we can choose the ECN, T1 or T3 probes. We can substitute any of the Pkt2-6 probes for the T1 probe, and apply tests R, DF, T, TG, S, A, F, RD, and Q without additional cost.
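The comparison of single probes can be sketched as a simple ranking over the W and O values in Table 2. Only probes whose W and O entries are both listed are included. The dictionary name and the ranking rule are hypothetical. Ranking by the single best test per probe is a simplification: gains from different tests on the same response are not independent and cannot simply be added.

```python
# (W gain, O gain) per probe, transcribed from Table 2.
TABLE2_W_O = {
    "Pkt1/T1": (4.71, 5.27),
    "Pkt 2": (4.76, 5.39),
    "Pkt 3": (4.74, 5.07),
    "Pkt 4": (4.75, 5.36),
    "Pkt 5": (4.76, 5.29),
    "Pkt 6": (4.76, 4.40),
    "ECN": (4.61, 4.89),
}

# Each probe costs one packet, so rank by the best single test it enables.
ranked = sorted(TABLE2_W_O, key=lambda probe: max(TABLE2_W_O[probe]), reverse=True)
```

By this rule Pkt 2 ranks first on the strength of its O test, while Pkt 6 ranks last among the listed probes. Every listed probe still removes more than half of the total uncertainty with either W or O alone.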
| | R | DF | T | TG | W | S | A | F | O | RD | Q |
| Pkt 2 | | | | | 4.76 | | | | 5.39 | | |
| Pkt 3 | | | | | 4.74 | | | | 5.07 | | |
| Pkt 4 | | | | | 4.75 | | | | 5.36 | | |
| Pkt 5 | | | | | 4.76 | | | | 5.29 | | |
| Pkt 6 | | | | | 4.76 | | | | 4.40 | | |
| ECN | 0.09 | 1.03 | 2.57 | 2.57 | 4.61 | | | | 4.89 | | 0.23 |
| Pkt1/T1 | 0.68 | 1.01 | 2.55 | 2.55 | 4.71 | 0.19 | 0.29 | 0.29 | 5.27 | 0.62 | 0.62 |
|
T2 |
0.89 |
1.05 |
1.81 |
1.80 |
1.04 |
1.13 |
0.95 |