Check out the new USENIX Web site. next up previous
Next: Data Mining Approach Up: Methodology for Building Data Previous: Data Set

Detection Algorithms

We statically extracted byte sequence features from each binary example for the learning algorithms. Features in a data mining framework are properties extracted from each example in the data set, such as byte sequences, that a classifier uses to generate detection models. These features are then used by the algorithms to generate detection models. We used hexdump [8], an open source tool that transforms binary files into hexadecimal files. After we generated the hexdumps we produced features in the form displayed in Figure 1 where each line represents a short sequence of machine code instructions.

Figure 1: Example Set of Byte Sequence Features
646e 776f 2e73 0a0d 0024 0000 0000 0000\\
454e 3c05 0...
...e 0238 0244 02f5 0000\\
0001 0004 0000 0802 0032 1304 0000 030a\\

Matthew G. Schultz