Drive-by Downloads via Ads

Today, the majority of Web advertisements are distributed as third-party content to the advertising web site. This practice is somewhat worrisome, as a web page is only as secure as its weakest component. In particular, even if the web page itself does not contain any exploits, insecure Ad content poses a risk to advertising web sites. With the increasing use of Ad syndication (which allows an advertiser to sell advertising space to other advertising companies that can, in turn, syndicate their content to yet other parties), the chances that insecure content gets inserted somewhere along the chain quickly escalate. Far too often, this leads to web pages running advertisements that lead to untrusted content. This, in itself, represents an attractive avenue for distributing malware, as it gives the adversary a way to inject content into web sites with a large visitor base without having to compromise any web server.

Figure 5: Percentage of landing sites potentially infecting visitors via malicious advertisements, and their relative share in the search results.

To assess the extent of this behavior, we estimate the overall contribution of Ads to drive-by downloads. To do so, we construct the malware delivery trees from all detected malicious URLs following the methodology described in Section 3. For each tree, we examine every intermediary node for membership in a set of 2,000 well-known advertising networks. If any of the nodes qualifies, we count the landing site as being infectious via Ads. Moreover, to highlight the impact of the malware delivered via Ads relative to the other mechanisms, we weight the landing sites associated with Ads based on the frequency of their appearance in Google search results compared to that of all landing sites. Figure 5 shows the percentage of landing sites belonging to Ad networks. On average, 2% of the landing sites were delivering malware via advertisements. More importantly, the overall weighted share for those sites was substantial: on average, 12% of the overall search results that returned landing pages were associated with malicious content due to unsafe Ads. This result can be explained by the fact that Ads normally target popular web sites, and so have a much wider reach. Consequently, even a small fraction of malicious Ads can have a major impact (compared to the other delivery mechanisms).
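The classification and weighting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: each delivery tree is represented as a list of URL chains running from the landing site to a distribution site, the ad-network list is a two-entry placeholder, and the names `AD_NETWORK_DOMAINS`, `delivered_via_ads`, and `ad_shares` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the set of ~2,000 well-known advertising networks.
AD_NETWORK_DOMAINS = {"ads.example", "syndicate.example"}


def host_of(url):
    """Return the lowercase hostname of a URL ('' if it has none)."""
    return (urlparse(url).hostname or "").lower()


def is_ad_host(host, ad_domains=AD_NETWORK_DOMAINS):
    """True if host equals a listed ad domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in ad_domains)


def delivered_via_ads(chain, ad_domains=AD_NETWORK_DOMAINS):
    """chain: URLs ordered from landing site to distribution site.

    Only intermediary nodes (neither the first nor the last URL) are
    checked for membership in the ad-network set.
    """
    return any(is_ad_host(host_of(u), ad_domains) for u in chain[1:-1])


def ad_shares(trees, search_hits):
    """Compute the unweighted and search-weighted share of Ad-infectious sites.

    trees:       landing URL -> list of chains (assumed tree representation)
    search_hits: landing URL -> number of appearances in search results
    """
    ad_sites = {
        site
        for site, chains in trees.items()
        if any(delivered_via_ads(c) for c in chains)
    }
    total_sites = len(trees)
    total_hits = sum(search_hits.get(s, 0) for s in trees)
    ad_hits = sum(search_hits.get(s, 0) for s in ad_sites)
    unweighted = len(ad_sites) / total_sites if total_sites else 0.0
    weighted = ad_hits / total_hits if total_hits else 0.0
    return unweighted, weighted
```

With such a representation, a landing site that is rare among landing sites but frequent in search results contributes little to the unweighted share and a great deal to the weighted one, which is the effect the two percentages above capture.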

Another interesting aspect of the results shown in Figure 5 is that Ad-delivered drive-by downloads seem to appear in sudden, short-lived spikes. This is likely because Ads appearing on several advertising web sites are centrally controlled, which allows the malicious content to appear on thousands of web sites almost instantaneously. Similarly, once detected, these Ads are removed simultaneously, and so disappear as quickly as they appeared. For this reason, drive-by downloads delivered by other content injection techniques (e.g., compromise of individual web servers) have a more lasting effect than Ad-delivered malware, as each web site must be secured independently.

The general practice of Ad syndication contributes significantly to the rise of Ad-delivered malware. Our results show that, overall, 75% of the landing sites that delivered malware via Ads use multiple levels of Ad syndication. To understand how far trust would have to extend in order to limit Ad-delivered drive-by downloads, we plot the distribution of the path length from the landing site to the malware distribution sites for each delivery tree. The edges connecting the nodes in these paths reflect the number of redirects a browser has to follow before receiving the final payload. Hence, for syndicated Ads that delivered malware, the path length is indicative of the number of syndication steps before reaching the final Ad; in our case, the malware payload. Figure 6 shows the distribution of the number of redirects for syndicated Ads that delivered malware relative to the other malicious landing URLs. The results are quite telling: malware delivered via Ads exhibits longer delivery chains. In 50% of all cases, more than 6 redirection steps were required before receiving the malware payload. Clearly, it is increasingly difficult to maintain trust along such long delivery chains.
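To make the path-length measurement concrete, the following minimal sketch computes the number of redirects per chain and an empirical CDF of the kind plotted in Figure 6. It assumes that each chain is an ordered list of nodes, so that the redirect count is the number of edges; the function names are hypothetical and no data from the study is reproduced.

```python
from collections import Counter


def redirect_count(chain):
    """Number of edges (redirects) in a chain of nodes."""
    return max(len(chain) - 1, 0)


def empirical_cdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) is the fraction of values <= x."""
    n = len(values)
    if n == 0:
        return []
    counts = Counter(values)
    cdf, seen = [], 0
    for x in sorted(counts):
        seen += counts[x]
        cdf.append((x, seen / n))
    return cdf


def fraction_above(values, threshold):
    """Fraction of values strictly greater than threshold."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)
```

Calling `fraction_above(lengths, 6)` on the redirect counts of Ad-delivered chains would yield the kind of statistic quoted above, namely the share of chains that needed more than 6 redirection steps.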

Figure 6: CDF of the number of redirection steps for Ads that successfully delivered malware.

Inspecting the delivery trees that featured syndication reveals a total of 55 unique Ad networks participating in these trees. We further studied the relative role of the different networks by evaluating the frequency of appearance of each Ad network in the malware delivery trees. Interestingly, our results show that five advertising networks appear in approximately 75% of all malware delivery trees. Figure 7 shows the distribution of the relative position of each network in the malware delivery chains it participated in. The normalized position is calculated by dividing the index of the Ad network in each chain by the length of the chain. The graph shows that these advertising networks split into three different categories. In the first category, which includes network I, the advertising network appears at the beginning of the delivery chain. In the second category, which includes networks II-IV, advertising networks appear frequently in the middle of the delivery chains. In both of these categories, advertising networks do not participate directly in delivering malware. However, the relative position of a network in the delivery chain may be used as an indication of its relationship with the malware distribution sites: the deeper a network's relative position, the more closely it is related to the malware distribution site. Finally, in the third category, represented by network V, our analysis revealed that in almost 50% of all incidents, the advertising network is directly delivering malware. For example, advertising network V pushes Ads that install malware in the form of a browser toolbar.
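The normalized position can be illustrated with a short sketch. The text does not say whether the index is zero- or one-based, nor how repeated appearances of a network within one chain are handled; the code below assumes a one-based index and uses the first occurrence only, so that values fall in the interval (0, 1]. The function name and the chain labels in the usage note are hypothetical.

```python
def normalized_positions(chains, network):
    """Normalized position of `network` in every chain that contains it.

    chains:  list of chains, each an ordered list of node labels
    network: label of the Ad network of interest

    Assumptions: one-based index, first occurrence only.
    """
    positions = []
    for chain in chains:
        if network in chain:
            index = chain.index(network) + 1  # one-based position
            positions.append(index / len(chain))
    return positions
```

For instance, a network that is the third of four nodes in one chain and the second of three nodes in another receives positions 0.75 and roughly 0.67. Feeding the resulting list into an empirical CDF gives a curve of the kind shown for each of the top five networks in Figure 7.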

Figure 7: CDF of the normalized position of the top five Ad networks most frequently participating in malware delivery chains.

Finally, we further elucidate this problem via an interesting example from our data corpus. The landing page in our example refers to a Dutch radio station's web site. The radio station in question was showing a banner advertisement from a German advertising site. Using JavaScript, that advertiser redirected to a prominent advertiser in the US, which in turn redirected to yet another advertiser in the Netherlands. That advertiser redirected to another advertisement (also in the Netherlands) that contained obfuscated JavaScript, which, when un-obfuscated, pointed to yet another JavaScript hosted in Austria. The final JavaScript was encrypted and redirected the browser via multiple IFRAMEs to an exploit site hosted in Austria. This resulted in the automatic installation of multiple Trojan Downloaders. While it is unlikely that the initial advertising companies were aware of the malware installations, each redirection gave another party control over the content on the original web page, with predictable consequences.

Niels Provos 2008-05-13