Check out the new USENIX Web site. next up previous
Next: Data Mining Approach Up: Methodology for Building Data Previous: Data Set

Detection Algorithms

We statically extracted byte sequence features from each binary example for the learning algorithms. Features in a data mining framework are properties extracted from each example in the data set, such as byte sequences, that a classifier uses to generate detection models. These features are then used by the algorithms to generate detection models. We used hexdump [8], an open source tool that transforms binary files into hexadecimal files. After we generated the hexdumps we produced features in the form displayed in Figure 1 where each line represents a short sequence of machine code instructions.


  
Figure 1: Example Set of Byte Sequence Features
\begin{figure}
\centering
646e 776f 2e73 0a0d 0024 0000 0000 0000\\
454e 3c05 0...
...e 0238 0244 02f5 0000\\
0001 0004 0000 0802 0032 1304 0000 030a\\
\end{figure}



Matthew G. Schultz
2001-05-01