Tracking the propagation of email attachments helps identify the origin of malicious executables and estimate how prevalent a malicious attachment is.
To log attachments, we needed a unique identifier for each one. We used the MD5 algorithm to compute a 128-bit digest for each binary attachment, which serves as an effectively unique identifier. The input to MD5 was the hexadecimal representation of the binary. These identifiers were then kept in a log along with other information, such as whether the attachment was classified as malicious or benign and with what certainty the system made that prediction.
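The identifier computation and log record described above can be sketched as follows. This is a minimal illustration, not the system's actual code: the names `attachment_id`, `LogEntry`, and `log_attachment` are hypothetical, and the choice of lowercase hexadecimal encoding and a confidence value in the range 0 to 1 are assumptions.

```python
import binascii
import hashlib
from dataclasses import dataclass
from typing import List


def attachment_id(data: bytes) -> str:
    """Return the MD5 digest of the hexadecimal representation of `data`."""
    # Assumption: lowercase hex without separators is the hashed representation.
    hex_repr = binascii.hexlify(data)
    return hashlib.md5(hex_repr).hexdigest()


@dataclass
class LogEntry:
    """One log record (hypothetical layout)."""
    attachment_id: str
    malicious: bool      # True if classified malicious, False if benign
    confidence: float    # assumed to lie in [0, 1]


def log_attachment(log: List[LogEntry], data: bytes,
                   malicious: bool, confidence: float) -> LogEntry:
    """Append a record for one attachment to the in-memory log."""
    entry = LogEntry(attachment_id(data), malicious, confidence)
    log.append(entry)
    return entry
```

Identical binaries always map to the same identifier, which is what allows copies seen on different hosts to be matched later. MD5 collisions can be constructed deliberately, so the identifier should be read as effectively unique rather than guaranteed unique.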
The logs of malicious attachments are then sent back to the central server according to each host's policy. Some hosts may choose never to send these logs and can turn the feature off, while other hosts could configure the system to send, for example, only logs of borderline cases.
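A per-host reporting policy of this kind could be expressed as a simple filter over the local log. The sketch below is illustrative only: the policy names `"never"`, `"borderline"`, and `"all"`, the `borderline_below` threshold, and the reading of "borderline" as a malicious prediction made with low confidence are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogEntry:
    """One log record (hypothetical layout)."""
    attachment_id: str
    malicious: bool
    confidence: float  # assumed to lie in [0, 1]


def select_for_upload(entries: List[LogEntry], policy: str = "all",
                      borderline_below: float = 0.75) -> List[LogEntry]:
    """Return the entries a host would send to the central server."""
    if policy == "never":
        # The host has turned the reporting feature off.
        return []
    # Only logs of attachments classified as malicious are reported.
    malicious = [e for e in entries if e.malicious]
    if policy == "all":
        return malicious
    if policy == "borderline":
        # Assumption: "borderline" means a low-confidence malicious prediction.
        return [e for e in malicious if e.confidence < borderline_below]
    raise ValueError(f"unknown policy: {policy!r}")
```

Keeping the policy decision on the host means that nothing leaves the machine unless the local configuration allows it.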
After receiving the logs, the system measures the propagation of the malicious binaries across hosts. From these logs it can estimate how many copies of each malicious binary were circulating on the Internet. These reports will be forwarded back to the community and used for further research.
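On the server side, the propagation measurement amounts to grouping received reports by attachment identifier. The following sketch assumes each report arrives as a `(host, attachment_id)` pair; the function name `measure_propagation` and the report format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def measure_propagation(
    reports: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, int]]:
    """Count reported copies and distinct reporting hosts per identifier."""
    copies = defaultdict(int)   # identifier -> number of reported copies
    hosts = defaultdict(set)    # identifier -> set of hosts that reported it
    for host, ident in reports:
        copies[ident] += 1
        hosts[ident].add(host)
    return {
        ident: {"copies": copies[ident], "hosts": len(hosts[ident])}
        for ident in copies
    }


if __name__ == "__main__":
    sample = [
        ("host-a", "id-1"),
        ("host-a", "id-1"),
        ("host-b", "id-1"),
        ("host-b", "id-2"),
    ]
    for ident, stats in measure_propagation(sample).items():
        print(ident, stats)
```

Because hosts may disable reporting or send only a subset of their logs, counts obtained this way are best treated as a lower bound on the true number of copies in circulation.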
The current method for detailing the propagation of malicious executables is for an administrator to report an attack to an agency such as WildList. The WildList records the propagation of viruses in the wild and lists the most prevalent viruses. This reporting is not done automatically; it depends on a report issued by an attacked host. Our method would reliably and automatically detail a malicious executable's spread over the Internet.