
Signature-Based Approach

To compare our results with traditional methods, we implemented a signature-based method. First, we calculated the byte-sequences that were found only in the malicious executable class. For each malicious executable example, the byte-sequences found in that example were then concatenated to form a single, unique signature. Thus each malicious executable signature contained only byte-sequences found in the malicious executable class. We concatenated the byte-sequences, rather than relying on any one of them alone, because a byte-sequence that was found in only one class during training could still be found in the other class during testing [3], leading to false positives when deployed.
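The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in our experiments: the window length `SEQ_LEN`, the function names, and the rule that a signature matches only when all of its byte-sequences occur in a file are hypothetical choices, since the text does not specify them.

```python
from typing import Dict, List, Set

SEQ_LEN = 4  # assumed window length; the text does not specify one


def byte_sequences(data: bytes, n: int = SEQ_LEN) -> List[bytes]:
    """Return the distinct n-byte windows of data, in order of first appearance."""
    seen: Set[bytes] = set()
    ordered: List[bytes] = []
    for i in range(len(data) - n + 1):
        seq = data[i:i + n]
        if seq not in seen:
            seen.add(seq)
            ordered.append(seq)
    return ordered


def build_signatures(
    malicious: Dict[str, bytes],
    benign: Dict[str, bytes],
    n: int = SEQ_LEN,
) -> Dict[str, List[bytes]]:
    """Build one signature per malicious training example.

    A signature is the list of byte-sequences from that example that never
    occur in any benign training example; b"".join(signature) gives the
    concatenated form.
    """
    benign_seqs: Set[bytes] = set()
    for data in benign.values():
        benign_seqs.update(byte_sequences(data, n))

    signatures: Dict[str, List[bytes]] = {}
    for name, data in malicious.items():
        parts = [s for s in byte_sequences(data, n) if s not in benign_seqs]
        if parts:  # examples with no malicious-only sequence get no signature
            signatures[name] = parts
    return signatures


def matches(signature: List[bytes], data: bytes) -> bool:
    """Assumed matching rule: every byte-sequence of the signature occurs in data."""
    return all(part in data for part in signature)
```

Requiring every component to be present is what makes the combined signature stricter than any single byte-sequence: one sequence turning up by chance in a benign file is not enough to trigger a match.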

The virus scanner that we used to label the data set contained signatures for every malicious example in our data set, so it would already recognize all of them. To compare the two algorithms' accuracy in detecting new malicious executables, it was therefore necessary to implement a similar signature-based method of our own. In our tests, the signature-based algorithm was allowed to generate signatures only for the same set of training data that the data mining method used, so that the two methods could be compared fairly. The comparison was made by testing both methods on a set of binaries not contained in the training set.
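The comparison protocol can be illustrated with a small evaluation helper. This is a minimal sketch under assumptions: the `Detector` type, the `evaluate` function, and the choice of detection rate and false-positive rate as the reported numbers are hypothetical and stand in for whatever accuracy measure is actually used. The point it shows is that both methods are scored on the same held-out binaries, none of which were seen in training.

```python
from typing import Callable, Iterable, Tuple

# A detector takes the raw bytes of a binary and returns True if it flags it.
Detector = Callable[[bytes], bool]


def evaluate(
    detector: Detector,
    test_set: Iterable[Tuple[bytes, bool]],
) -> Tuple[float, float]:
    """Score a detector on held-out (binary, is_malicious) pairs.

    Returns (detection_rate, false_positive_rate).
    """
    true_pos = false_pos = n_malicious = n_benign = 0
    for data, is_malicious in test_set:
        flagged = detector(data)
        if is_malicious:
            n_malicious += 1
            if flagged:
                true_pos += 1
        else:
            n_benign += 1
            if flagged:
                false_pos += 1
    detection_rate = true_pos / n_malicious if n_malicious else 0.0
    false_positive_rate = false_pos / n_benign if n_benign else 0.0
    return detection_rate, false_positive_rate


if __name__ == "__main__":
    # Toy held-out set; the contents are invented purely for illustration.
    held_out = [(b"evil-a", True), (b"evil-b", True), (b"good-a", False)]
    signature_based: Detector = lambda d: d == b"evil-a"
    other_method: Detector = lambda d: d.startswith(b"evil")
    print(evaluate(signature_based, held_out))
    print(evaluate(other_method, held_out))
```

Passing the same `held_out` collection to both calls is what keeps the comparison fair: neither detector is scored on data the other did not also face.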


Matthew G. Schultz
2001-05-01