

Malware Distribution Infrastructure

In this section, we explore various properties of the hosting infrastructure for web malware. In particular, we explore the size of the malware distribution networks and examine the distribution of binaries hosted across sites. We argue that such analysis is important because it sheds light on the sophistication of the hosting infrastructures and the level of malfeasance we see today. As is the case with other recent malware studies (e.g., [5,26,21]), we hope that this analysis will be of benefit to researchers and practitioners alike.

Figure 8: CDF of the number of landing sites pointing to a particular malware distribution site.

For the remaining discussion, recall that a malware distribution network constitutes all the landing sites that point to a single distribution site. Using the methodology described in Section 3, we identified the distribution networks associated with each malware distribution site. We first evaluate their size in terms of the total number of landing sites that point to them. Figure 8 shows the distribution of sizes for the different distribution networks.

The graph reveals two main types of malware distribution networks: (1) networks that use only one landing site, and (2) networks that have multiple landing sites. As the graph shows, distribution networks can grow to have well over 21,000 landing sites pointing to them. That said, roughly 45% of the detected malware distribution sites used only a single landing site at a time. We manually inspected some of these distribution sites and found that the vast majority were either subdomains on free hosting services, or short-lived domains that were created in large numbers. It is likely, though not confirmed, that each of these sites used only a single landing site as a way to slip under the radar and avoid detection.
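The size computation behind a plot like Figure 8 can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes the crawl output is available as (landing site, distribution site) pairs, and the function names and host names are hypothetical.

```python
from collections import Counter, defaultdict


def network_sizes(edges):
    """Map each distribution site to its number of distinct landing sites."""
    landing = defaultdict(set)
    for landing_site, dist_site in edges:
        landing[dist_site].add(landing_site)
    return {dist: len(sites) for dist, sites in landing.items()}


def empirical_cdf(values):
    """Return sorted (value, fraction of observations <= value) pairs."""
    counts = Counter(values)
    total = len(values)
    running = 0
    cdf = []
    for value in sorted(counts):
        running += counts[value]
        cdf.append((value, running / total))
    return cdf


# Made-up example data (assumption): landing site -> distribution site.
edges = [
    ("blog.example.org", "dist-a.example.net"),
    ("forum.example.org", "dist-a.example.net"),
    ("shop.example.org", "dist-a.example.net"),
    ("blog.example.org", "dist-a.example.net"),  # repeated observation
    ("news.example.org", "dist-b.example.net"),
    ("wiki.example.org", "dist-c.example.net"),
]

sizes = network_sizes(edges)
cdf = empirical_cdf(list(sizes.values()))
for size, fraction in cdf:
    print(f"networks with <= {size} landing sites: {fraction:.2f}")
```

Using a set per distribution site means that a landing site observed several times is counted once, which matches the definition of network size as the number of landing sites pointing to a distribution site.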

Figure 9: The cumulative fraction of malware distribution sites over the /8 IP prefix space.

Next, we examine the network location of the malware distribution servers and the landing sites linking to them. Figure 9 shows that the malware distribution sites are concentrated in a limited number of /8 prefixes. About 70% of the malware distribution sites have IP addresses within the 58.* - 61.* and 209.* - 221.* network ranges. Interestingly, Anderson et al. [5] observed comparable IP space concentrations for the scam hosting infrastructure. The landing sites, however, exhibit relatively more IP space diversity; roughly 50% of the landing sites fell in the above ranges.
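The /8 analysis above amounts to bucketing IPv4 addresses by their first octet. The sketch below illustrates this under stated assumptions: the addresses are invented for the example, and the helper names are hypothetical rather than taken from the study.

```python
import ipaddress
from collections import Counter


def slash8_histogram(ips):
    """Count addresses per /8 prefix, i.e. per first octet of the IPv4 address."""
    return Counter(int(ipaddress.IPv4Address(ip)) >> 24 for ip in ips)


def cumulative_fraction_by_prefix(ips):
    """Cumulative fraction of addresses over the /8 prefixes 0 through 255."""
    hist = slash8_histogram(ips)
    total = sum(hist.values())
    running = 0
    fractions = []
    for octet in range(256):
        running += hist.get(octet, 0)
        fractions.append(running / total)
    return fractions


def fraction_in_ranges(ips, ranges):
    """Fraction of addresses whose first octet lies in any inclusive range."""
    hist = slash8_histogram(ips)
    total = sum(hist.values())
    inside = sum(
        count
        for octet, count in hist.items()
        if any(low <= octet <= high for low, high in ranges)
    )
    return inside / total


# Made-up addresses (assumption), chosen only to exercise the functions.
ips = [
    "58.10.0.1", "59.0.0.9", "60.1.2.3", "61.7.7.7", "209.1.1.1",
    "210.5.5.5", "221.9.9.9", "10.0.0.1", "130.2.3.4", "192.0.2.1",
]
hot_ranges = [(58, 61), (209, 221)]  # the ranges named in the text

share = fraction_in_ranges(ips, hot_ranges)
curve = cumulative_fraction_by_prefix(ips)
print(f"share inside the named ranges: {share:.0%}")
```

Plotting `curve` against the prefix index gives a curve of the same kind as Figure 9; steep steps mark the /8 prefixes where sites are concentrated.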

Figure 10: The cumulative fraction of the malware distribution sites across the different ASes.

We further investigated the Autonomous System (AS) locality of the malware distribution sites by mapping their IP addresses to the AS responsible for the longest matching prefixes for these IP addresses. We use the latest BGP snapshot from Routeviews [23] to do the IP-to-AS mapping. Our results show that all the malware distribution sites' IP addresses fall into a relatively small set of ASes: only 500 as of this writing. Figure 10 shows the cumulative fraction of these sites across the ASes hosting them (sorted in descending order by the number of sites in each AS). The graph further shows the highly nonuniform concentration of the malware distribution sites: 95% of these sites map to only 210 ASes. Finally, the results of mapping the landing sites (not shown) produced 2,517 ASes with 95% of the sites falling in these 500 ASes.
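The IP-to-AS mapping and the concentration measure can be illustrated with a small sketch. It is not the authors' implementation: the routing table is a hypothetical three-entry stand-in for a BGP snapshot (built from documentation prefixes and documentation AS numbers), and the linear scan is used only for clarity; a real table would call for a prefix trie.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix -> origin AS table (assumption), standing in for a BGP snapshot.
ROUTES = {
    ipaddress.ip_network("198.51.100.0/24"): 64500,
    ipaddress.ip_network("198.51.100.128/25"): 64501,
    ipaddress.ip_network("203.0.113.0/24"): 64502,
}


def longest_prefix_match(ip, routes):
    """Return the AS of the most specific prefix covering ip, or None."""
    addr = ipaddress.ip_address(ip)
    best_net, best_asn = None, None
    for net, asn in routes.items():
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_asn = net, asn
    return best_asn


def ases_needed(as_counts, target):
    """Number of ASes, taken in descending order of site count, needed to
    cover at least the target fraction of all sites."""
    total = sum(as_counts.values())
    running = 0
    for rank, (_, count) in enumerate(as_counts.most_common(), start=1):
        running += count
        if running >= target * total:
            return rank
    return len(as_counts)


# Made-up site addresses (assumption).
site_ips = [
    "198.51.100.10", "198.51.100.11", "198.51.100.12",
    "198.51.100.200", "203.0.113.5",
]
as_counts = Counter(longest_prefix_match(ip, ROUTES) for ip in site_ips)
print(as_counts.most_common())
print("ASes covering 90% of sites:", ases_needed(as_counts, 0.9))
```

Sorting the ASes by descending site count before accumulating mirrors the ordering used for Figure 10, so `ases_needed` reads off the point where the curve crosses the chosen fraction.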

Lastly, the distribution of malware across domains also gives rise to some interesting insights. Figure 11 shows the distribution of the number of unique malware binaries (as inferred from MD5 hashes) downloaded from each malware distribution site. As the graph shows, approximately 42% of the distribution sites delivered a single malware binary. The remaining distribution sites hosted multiple distinct binaries over the period in which we observed them, with 3% of the servers hosting more than 100 binaries. In many cases, we observed that the multiple payloads reflect deliberate obfuscation attempts to evade detection. In what follows, we take a more in-depth look by studying the different forms of relationships among the various distribution networks.

Figure 11: CDF of the number of unique binaries downloaded from each malware distribution site.
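Counting unique binaries per distribution site by MD5 hash, as described above, can be sketched as follows. The input format (pairs of site name and downloaded bytes), the site names, and the payloads are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def unique_binaries_per_site(downloads):
    """Map each distribution site to its number of distinct payloads,
    where two payloads are the same binary if their MD5 digests match."""
    digests = defaultdict(set)
    for site, payload in downloads:
        digests[site].add(hashlib.md5(payload).hexdigest())
    return {site: len(seen) for site, seen in digests.items()}


def fraction_single_binary(per_site):
    """Fraction of sites that delivered exactly one distinct binary."""
    single = sum(1 for count in per_site.values() if count == 1)
    return single / len(per_site)


# Made-up downloads (assumption): (distribution site, payload bytes).
downloads = [
    ("dist-a.example.net", b"payload-1"),
    ("dist-a.example.net", b"payload-1"),  # same bytes, same hash
    ("dist-a.example.net", b"payload-2"),
    ("dist-b.example.net", b"payload-3"),
]

per_site = unique_binaries_per_site(downloads)
print(per_site)
print(f"single-binary sites: {fraction_single_binary(per_site):.0%}")
```

Because the count is hash-based, a binary that is repacked or otherwise altered between downloads registers as a new binary even when its behavior is unchanged, which is consistent with the observation that multiple payloads often reflect obfuscation.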


Niels Provos 2008-05-13