To quantify the performance of our method, we computed statistics comparing the data mining-based method with the signature-based method. We are interested in four quantities in the experiments: the counts of true positives, true negatives, false positives, and false negatives. A true positive, TP, is a malicious example that is correctly classified as malicious, and a true negative, TN, is a benign example that is correctly classified as benign. A false positive, FP, is a benign program that has been mislabeled by an algorithm as malicious, while a false negative, FN, is a malicious executable that has been misclassified as benign.
The overall accuracy of the algorithm is calculated as the number of programs the system classified correctly divided by the total number of binaries tested, that is, (TP + TN) / (TP + TN + FP + FN). The detection rate is the number of malicious binaries correctly classified divided by the total number of malicious programs tested, that is, TP / (TP + FN).
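As an illustration, the two measures defined above can be computed directly from the four counts. The following is a minimal sketch; the function names `accuracy` and `detection_rate` are our own choices for this example and are not taken from the original experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all tested binaries that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no binaries were tested")
    return (tp + tn) / total


def detection_rate(tp, fn):
    """Fraction of malicious binaries that were classified as malicious."""
    malicious = tp + fn
    if malicious == 0:
        raise ValueError("no malicious binaries were tested")
    return tp / malicious
```

Note that the detection rate ignores benign programs entirely, so it should be read alongside the false positive count rather than on its own.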
We estimated our results for detecting new executables by using 5-fold cross validation. Cross validation is the standard method to estimate the performance of predictions over unseen data. For each set of binary profiles we partitioned the data into five equal-size partitions. We used four of the partitions for training a model and then evaluated that model on the remaining partition. We repeated this process five times, leaving out a different partition for testing each time, so that every example was used for testing exactly once. We averaged the results of these five tests to obtain a measure of how the algorithm performs in detecting new malicious executables.
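The procedure described above can be sketched in a few lines of Python. This is an illustrative outline only: the helper names `make_partitions` and `cross_validate`, the seeded shuffle before partitioning, and the `train` and `score` callables are all assumptions introduced for this example, not details of the original implementation.

```python
import random


def make_partitions(n_examples, k=5, seed=0):
    """Split the indices 0..n_examples-1 into k near-equal partitions."""
    indices = list(range(n_examples))
    # Assumption: examples are shuffled before being partitioned.
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(examples, labels, train, score, k=5, seed=0):
    """Average a score over k train/test splits.

    train(examples, labels) returns a model (hypothetical callable);
    score(model, examples, labels) returns a number (hypothetical callable).
    """
    folds = make_partitions(len(examples), k, seed)
    results = []
    for held_out in range(k):
        test_idx = folds[held_out]
        # Train on the other k - 1 partitions.
        train_idx = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
        model = train([examples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        results.append(score(model,
                             [examples[i] for i in test_idx],
                             [labels[i] for i in test_idx]))
    return sum(results) / k
```

Any of the measures of interest, such as overall accuracy or detection rate, could be supplied as the `score` callable, so the same loop yields an averaged estimate for each.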