
Prevalence of Drive-by Downloads

We estimate the prevalence of web malware based on data collected over a period of ten months (Jan 2007 - Oct 2007). During that period, we subjected more than 66 million URLs to in-depth processing through our verification system. We detected more than 3 million malicious URLs hosted on more than 180 thousand landing sites, and observed more than 9 thousand different distribution sites. The findings are summarized in Table 1. These results show the scope of the problem, but do not necessarily reflect the exposure of end-users to drive-by downloads. In what follows, we address that question by estimating the overall impact of the malicious web sites.

Table 1: Summary of collected data.
Data collection period             Jan - Oct 2007
Total URLs checked in-depth        66,534,330
Unique suspicious landing URLs     3,385,889
Unique malicious landing URLs      3,417,590
Unique malicious landing sites     181,699
Unique distribution sites          9,340

To study the potential impact of malicious web sites on end-users, we first examine the fraction of incoming search queries to Google's search engine that return at least one URL labeled as malicious in the results page. Figure 3 provides a running average of this fraction. The graph shows an increasing trend in the fraction of search queries that return at least one malicious result, with an average approaching 1.3% of the overall incoming search queries. This finding is troubling, as it shows that a significant fraction of search queries return results that may expose the end-user to exploitation attempts.

Figure 3: Percentage of search queries that resulted in at least one URL labeled as malicious; 7-day running avg.
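As an illustration of how a curve like the one in Figure 3 could be derived, the following minimal sketch computes a 7-day running average over daily fractions. The function names and the input layout (one pair of counts per day) are hypothetical, and the sketch assumes a trailing window that averages over fewer days at the start of the series; the source does not state whether the authors used a trailing or centered window.

```python
from collections import deque


def daily_fraction(queries_with_malicious_result, total_queries):
    """Fraction of one day's queries that returned at least one malicious URL."""
    return queries_with_malicious_result / total_queries


def running_average(values, window=7):
    """Trailing running average (assumed window type, not confirmed by the source).

    For the first window-1 entries, the average covers only the days seen so far.
    """
    recent = deque(maxlen=window)  # drops the oldest value automatically
    averaged = []
    for value in values:
        recent.append(value)
        averaged.append(sum(recent) / len(recent))
    return averaged


# Hypothetical daily counts: (queries with a malicious result, total queries).
daily_counts = [(10, 1000), (12, 1000), (11, 1000), (13, 1000)]
fractions = [daily_fraction(hits, total) for hits, total in daily_counts]
smoothed = running_average(fractions, window=7)
```

Smoothing over a week removes day-of-week effects in query volume, which is presumably why a 7-day window was chosen.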

To further understand the importance of this finding, we inspect the prevalence of malicious sites among the links that appear most often in Google search results. Of the top one million URLs appearing in the search engine results, about 6,000 belong to sites that have been verified as malicious at some point during our data collection. Upon closer inspection, we found that these sites appear at uniformly distributed ranks within the top million web sites, with the most popular landing page having a rank of 1,588. These results further highlight the significance of the web malware threat: about 0.6% of the top million URLs that appeared most frequently in Google's search results exposed users to malicious activity at some point.

An additional interesting result is the geographic locality of web-based malware. Table 2 shows the top 5 hosting countries, by IP address, for the malware distribution sites and for the landing sites. The results show that a significant number of China-based sites contribute to the drive-by problem. Overall, 67% of the malware distribution sites and 64.4% of the landing sites are hosted in China. These findings provide more evidence [13] of poor security practices by web site administrators, e.g., running outdated and unpatched versions of the web server software.

Table 2: Top 5 hosting countries.
Dist. site          % of all        Landing site        % of all
hosting country     dist. sites     hosting country     landing sites
China               67.0%           China               64.4%
United States       15.0%           United States       15.6%
Russia              4.0%            Russia              5.6%
Malaysia            2.2%            Korea               2.0%
Korea               2.0%            Germany             2.0%

Upon closer inspection of the geographic locality of the web-malware distribution networks as a whole (i.e., the correlation between the location of a distribution site and the landing sites pointing to it), we see that the malware distribution networks are highly localized within common geographical boundaries. This locality varies across different countries, and is most evident in China, with 96% of the landing sites in China pointing to malware distribution servers hosted in that country.
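The locality measure described above can be sketched as a per-country share of landing sites whose distribution server is hosted in the same country. This is only an illustration under stated assumptions: the function name and the input format (one `(landing_country, distribution_country)` pair per landing site) are hypothetical, and the source does not describe how landing sites that point to several distribution sites were counted.

```python
from collections import Counter


def same_country_share(links):
    """Per landing country, the share of links whose distribution site is in the same country.

    links: iterable of (landing_country, distribution_country) pairs
    (hypothetical input format, one pair per landing site).
    """
    total = Counter()
    local = Counter()
    for landing_country, distribution_country in links:
        total[landing_country] += 1
        if landing_country == distribution_country:
            local[landing_country] += 1
    return {country: local[country] / total[country] for country in total}


# Made-up example data, not taken from the study.
example_links = [("CN", "CN"), ("CN", "CN"), ("CN", "US"), ("US", "US")]
shares = same_country_share(example_links)
```

A share close to 1 for a country corresponds to the strong locality reported for China.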

Niels Provos 2008-05-13