Steps to Reducing Unwanted Traffic on the Internet Workshop Abstract
Pp. 85–91 of the Proceedings
Improving Spam Detection Based on Structural Similarity
Luiz H. Gomes, Fernando D. O. Castro,
Virgílio A. F. Almeida, Jussara M. Almeida, and Rodrigo B. Almeida, Universidade Federal de Minas Gerais;
Luis M. A. Bettencourt, Los Alamos National Laboratory
We propose a new spam detection algorithm that uses the structural relationships between email senders and recipients as the basis for detection. We construct a unifying representation of senders and recipients in the vector space of their contacts, which leads to a natural definition of similarity between them. This similarity is then used to group email senders and recipients into clusters. Historical information about the messages sent and received by each cluster is obtained by forwarding messages to an auxiliary spam detection algorithm, and this information is used to reclassify messages. In the proposed framework, our algorithm aims to correct misclassifications made by the auxiliary algorithm. We perform a simulation based on actual data collected from an SMTP server at a large university. We show that our approach is able to reduce the false positives produced by the auxiliary classification algorithm by up to about 60%.
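As an illustration only, the pipeline described in the abstract might be sketched as follows. This is not the authors' implementation: the cosine measure, the greedy clustering rule, the thresholds (0.5, 0.25, 0.75), the toy data, and all function names are assumptions made for this sketch, and only senders are clustered here to keep it short.

```python
from collections import defaultdict
from math import sqrt

def contact_vectors(messages):
    """Map each sender to the set of recipients it has contacted."""
    vectors = defaultdict(set)
    for sender, recipient, _label in messages:
        vectors[sender].add(recipient)
    return dict(vectors)

def cosine(a, b):
    """Cosine similarity between two binary contact vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def cluster_senders(vectors, threshold=0.5):
    """Greedy clustering (assumed rule): join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, members)
    for sender in sorted(vectors):
        for seed, members in clusters:
            if cosine(vectors[sender], seed) >= threshold:
                members.append(sender)
                break
        else:
            clusters.append((vectors[sender], [sender]))
    return [members for _seed, members in clusters]

def spam_fraction(messages, clusters):
    """Per cluster, the fraction of messages the auxiliary classifier labeled spam."""
    cluster_of = {s: i for i, members in enumerate(clusters) for s in members}
    spam, total = defaultdict(int), defaultdict(int)
    for sender, _recipient, label in messages:
        i = cluster_of[sender]
        total[i] += 1
        spam[i] += 1 if label == "spam" else 0
    return cluster_of, {i: spam[i] / total[i] for i in total}

def reclassify(sender, aux_label, cluster_of, fractions, low=0.25, high=0.75):
    """Override the auxiliary label only when the cluster history is decisive."""
    p = fractions[cluster_of[sender]]
    if p <= low:
        return "ham"
    if p >= high:
        return "spam"
    return aux_label

# Hypothetical history: (sender, recipient, label from the auxiliary classifier).
messages = [
    ("alice", "bob", "ham"), ("alice", "carol", "ham"), ("alice", "bob", "ham"),
    ("dave", "bob", "ham"), ("dave", "carol", "spam"),  # likely false positive
    ("s1", "u1", "spam"), ("s1", "u2", "spam"), ("s1", "u3", "spam"),
    ("s2", "u1", "spam"), ("s2", "u2", "spam"), ("s2", "u3", "ham"),
]

clusters = cluster_senders(contact_vectors(messages))
cluster_of, fractions = spam_fraction(messages, clusters)
corrected = reclassify("dave", "spam", cluster_of, fractions)
```

In this toy history, dave shares all of his contacts with alice, so the two fall into one cluster whose messages were mostly labeled legitimate, and the "spam" label on dave's message is overridden. A cluster with a mixed history would leave the auxiliary label unchanged.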
Until July 2006, you will need your USENIX membership identification in order to access the full papers. The Proceedings are published as a collective work, © 2005 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.