One of the most important areas of future work for this application is the development of more efficient algorithms. The current probabilistic method requires a machine with a significant amount of memory to generate and employ the classifiers. This memory requirement makes the computation of the models expensive.
Making the algorithms use less space will require theoretical bounds on how features can be pruned from the data without losing accuracy. The details of how the pruning may work are beyond the scope of this paper.
After developing more efficient algorithms, the next most important task is generating a more complete data set. The current malicious data set, with 3,301 examples, is smaller than the known number of malicious programs, which exceeds 50,000. Work needs to be done with industry or security sources to develop a standard data set consisting of infected programs, macro and Visual Basic viruses, and many different sets of benign data.
One more obvious piece of future work would be to incorporate the system into Windows and Macintosh mail servers and clients. This would require work with the individual vendors because their systems are not open source. Because our system is freely available, these vendors could work with us to incorporate it, or they could do so themselves.
Another potential direction for this filter is to turn it into a stand-alone virus scanner. Once the system has been fully completed and thoroughly tested in the real world, it would be possible to port the algorithms to different operating systems, such as Windows or Macintosh.
This scanner could be run in a similar manner to traditional virus scanners. A user could run the system at bootup, or whenever required, and analyze all the files on a personal computer. This requires, though, that the system be efficient enough to run on older computers with slower processors and less memory.
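As a rough illustration of the stand-alone scanning mode described above, the following minimal sketch walks a directory tree and applies a classifier to each file. It is only a sketch under stated assumptions: the names scan_tree, iter_files, and toy_classifier are hypothetical, and toy_classifier is a placeholder byte-pattern check that stands in for a trained model rather than the probabilistic classifier discussed in this paper.

```python
import os
from typing import Callable, Iterator, List


def iter_files(root: str) -> Iterator[str]:
    """Yield the path of every regular file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def scan_tree(root: str, is_malicious: Callable[[bytes], bool]) -> List[str]:
    """Return the paths under root that the supplied classifier flags.

    is_malicious is a hypothetical callback standing in for a trained
    classifier; it receives the raw bytes of one file.
    """
    flagged = []
    for path in iter_files(root):
        try:
            # Reads the whole file at once for simplicity; a scanner meant
            # for low-memory machines would likely read in chunks instead.
            with open(path, "rb") as fh:
                data = fh.read()
        except OSError:
            # Unreadable files are skipped rather than aborting the scan.
            continue
        if is_malicious(data):
            flagged.append(path)
    return flagged


def toy_classifier(data: bytes) -> bool:
    """Placeholder only: flags files containing a fixed byte string."""
    return b"EVIL" in data
```

A call such as scan_tree("/home/user", toy_classifier) would return the list of flagged paths. Passing the classifier as a callback keeps the file traversal independent of the model, so a more memory-efficient classifier could be substituted without changing the scanning loop.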