To quantify the performance of our method, we computed statistics on the
performance of the data mining-based method versus the signature-based method.
We are interested in four quantities in the experiments: the counts of
*true positives*, *true negatives*, *false positives*, and *false negatives*. A true positive, TP, is a malicious example that is
correctly tagged as malicious, and a true negative, TN, is a benign example
that is correctly classified as benign. A false positive, FP, is a benign
program that has been mislabeled by an algorithm as malicious, while
a false negative, FN, is a malicious executable that has been misclassified as
a benign program.
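
These four counts can be tallied directly from paired true labels and predictions. The following sketch (not part of the original method; it assumes a hypothetical encoding of 1 for malicious and 0 for benign) illustrates the bookkeeping:

```python
def confusion_counts(y_true, y_pred):
    """Tally TP, TN, FP, FN for binary labels (1 = malicious, 0 = benign)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # malicious, caught
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # benign, passed
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # benign, flagged
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # malicious, missed
    return tp, tn, fp, fn
```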

The *overall accuracy* of an algorithm is the number of programs it
classified correctly divided by the total number of binaries tested. The
*detection rate* is the number of malicious binaries correctly classified
divided by the total number of malicious programs tested.
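
In terms of the four counts above, these measures reduce to simple ratios; a sketch, continuing the hypothetical helper from before:

```python
def overall_accuracy(tp, tn, fp, fn):
    """Fraction of all tested binaries that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def detection_rate(tp, fn):
    """Fraction of malicious binaries that were correctly flagged."""
    return tp / (tp + fn)
```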

We estimated our performance in detecting new executables using 5-fold cross-validation [4], the standard method for estimating a classifier's performance on unseen data. For each set of binary profiles we partitioned the data into five equal-size partitions, trained a model on four of them, and evaluated that model on the remaining partition. We repeated this process five times, holding out a different partition for testing each time, and averaged the five results to obtain a measure of how the algorithm performs in detecting new malicious executables.
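
A minimal sketch of this procedure, assuming NumPy arrays and hypothetical `train_fn`/`predict_fn` hooks in place of whichever learning method is being evaluated:

```python
import numpy as np
from sklearn.model_selection import KFold

def five_fold_accuracy(X, y, train_fn, predict_fn, seed=0):
    """Train on four partitions, test on the fifth, rotate the held-out
    partition, and average the per-fold accuracies."""
    fold_accuracies = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                     random_state=seed).split(X):
        model = train_fn(X[train_idx], y[train_idx])
        predictions = predict_fn(model, X[test_idx])
        fold_accuracies.append(np.mean(predictions == y[test_idx]))
    return float(np.mean(fold_accuracies))
```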