USENIX, The Advanced Computing Systems Association

SRUTI '06 Abstract

Pp. 31–36 of the Proceedings

Strider Typo-Patrol: Discovery and Analysis of Systematic Typo-Squatting

 

Yi-Min Wang, Doug Beck, Jeffrey Wang*, Chad Verbowski, and Brad Daniels

Microsoft Research, Redmond         * PCRethinking.com


Abstract

Typo-squatting refers to the practice of registering domain names that are typo variations of popular websites. We propose a new approach, called Strider Typo-Patrol, to discover large-scale, systematic typo-squatters. We show that a large number of typo-squatting domains are active and a large percentage of them are parked with a handful of major domain parking services, which serve syndicated advertisements on these domains. We also describe the Strider URL Tracer, a tool that we have released to allow website owners to systematically monitor typo-squatting domains of their sites.     

1.      Introduction

Typo-squatting refers to the practice of registering domain names that are typos of their target domains, which usually host websites with significant traffic. The individuals or organizations who register typo-squatting domains (or typo domains) are referred to as typo-squatters. Some major typo-squatters are known to have registered thousands of typo domains or more [1,2,3].

Web traffic generated through typo-squatting is unwanted for many reasons. From the users’ perspective, such typo traffic often startles them with unexpected results, followed by an annoying barrage of pop-up and pop-under advertisements (ads). There is a documented incident where a typo domain of a popular website was serving vulnerability-exploiting scripts to install malware [4,5]. Some typo domains of children’s websites have been observed to redirect to or link to adult websites, endangering Internet safety by potentially exposing minors to harmful material [6,7].

From the business perspective, many typo-squatting cases involve “bad-faith” domain registrations or trademark violations [8,9,10]. Worse yet, it is not uncommon to see a typo domain displaying ads from competitors of the target-domain owner or even negative ads against the owner (e.g., ads for investment-loss law firms on typos of brokerage firms). In other cases, advertisers are unwillingly paying for their ads to be served on typo domains of their own websites, because such traffic was intended to go directly to their sites in the first place [11].

In this paper, we describe the Strider Typo-Patrol System for discovering and analyzing typo domains. Our patrol results reveal that a large percentage of typo domains are “parked” with a handful of major domain parking services. Domain parking is a special case of advertisement syndication: while the latter attempts to serve relevant contextual ads based on the publishers’ web content, the former serves ads based on merely the domain name because parked domains typically have no content. We show that many typo-squatters are taking advantage of the domain-parking infrastructures to perform large-scale, systematic typo-squatting. However, by doing so, they also expose their typo domains to systematic discovery enabled by monitoring and analyzing ads-fetching traffic sent to the parking services.     

The paper is organized as follows. Section 2 describes how domain parking works and discusses statistics related to the amount of unwanted traffic potentially generated through typo-squatting. Section 3 presents the Strider Typo-Patrol System. Section 4 analyzes typo-patrol data and quantifies the prevalence of typo-squatting through domain parking. Section 5 describes the Strider URL Tracer designed to provide visibility and control over typo traffic. Section 6 surveys related work, Section 7 discusses remaining issues, and Section 8 concludes the paper.

2.      Understanding Domain Parking

Advertisement syndication refers to the business practice of serving ads by instructing the client-side browser software to fetch ads from an ads server and compose them with the content of the website that the user intends to visit. Syndication is typically implemented using the browser’s third-party URL mechanism: when a user visits a primary URL (hosted by the first party) either by typing the URL into the browser address bar or by clicking a link on a web page, the browser may be instructed by the content returned by the primary-URL page to automatically visit one or more secondary URLs hosted on third-party servers, without explicit knowledge or permission from the user. We refer to these secondary URLs as third-party URLs in this paper. These third-party URLs usually contain information about the primary URL so that the syndicators can serve the most relevant contextual ads based on the primary-URL page content and potentially the historical information about the visiting machine or user.

Domain parking is a special case of advertisement syndication: the primary URL is a parked domain that does not contain any real content, and syndicated domain-parking ads, usually in the form of ads listings, become the main content of the page displayed to the user. In order to attract sufficient traffic for serving ads, parked domains are usually domains with well-known generic names [12] or typo domains of popular websites. See [13] for screenshots of sample domains parked with various parking services.

Next, we use two actual examples to illustrate how typo-squatting through domain parking is typically implemented using third-party URLs. When a browser visits http://disneychannell.com, it receives a response page containing a frame that loads http://www.sedoparking.com/disneychannell.com. This URL is responsible for serving the main domain-parking ads listing. The basic idea of Strider Typo-Patrol is to scan a large number of typo domains, monitor all third-party URL traffic, and group the domains by the behind-the-scenes domain parking servers in order to facilitate investigation and prioritize actions.

Some domain parking services provide additional information in their third-party URLs that enables further analysis. For example, when a browser visits http://disneyg.com, the response page contains a frame that loads

http://apps5.oingo.com/apps/domainpark/domainpark.cgi?s=disneyg.com&dp_lp=24&hl=en&dp_lp=7&cid=DTRG4295&dp_p4pid=oingo_inclusion_xml_06&dp_format=1.3

where the “cid” field appears to contain a Client ID that uniquely identifies a typo-squatter. In Section 4, we show how this information enables us to quickly discover thousands of typo domains that are registered to a well-known, serial typo-squatter [2,14].  
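The two third-party URLs quoted above can be broken down mechanically. The following is a minimal sketch, not the authors' tool: the function name `describe_parking_url` is hypothetical, and the assumption that the parked domain sits in the `s` query field (Oingo-style) or in the URL path (Sedo-style) is inferred only from the two examples in this section.

```python
from urllib.parse import urlsplit, parse_qs


def describe_parking_url(url):
    """Return (parking host, parked domain, client ID) for a third-party URL.

    Assumption: field positions are inferred from the two example URLs
    in this section, not from any documented parking-service format.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Oingo-style URL: parked domain in "s", client ID in "cid".
    parked = query.get("s", [None])[0]
    client_id = query.get("cid", [None])[0]
    if parked is None:
        # Sedo-style URL: parked domain appears as the path.
        parked = parts.path.strip("/") or None
    return parts.hostname, parked, client_id


oingo = ("http://apps5.oingo.com/apps/domainpark/domainpark.cgi"
         "?s=disneyg.com&dp_lp=24&hl=en&dp_lp=7&cid=DTRG4295"
         "&dp_p4pid=oingo_inclusion_xml_06&dp_format=1.3")
sedo = "http://www.sedoparking.com/disneychannell.com"

print(describe_parking_url(oingo))
print(describe_parking_url(sedo))
```

Grouping scanned domains by the first and third elements of the returned tuple corresponds to grouping by parking service and by Client ID, respectively.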

Domain parking services provide convenient and effective contextual-ads infrastructures that make even marginal typo domains profitable [15]. With the annual domain registration fee as low as $7.00 [16], a rule-of-thumb figure for pay-per-click programs is that a parked typo domain only needs to attract between one unique visitor every two days and two visitors per day (depending on the pay-out levels) to generate sufficient income to cover the fee. (As a reference, http://slsahdot.org records statistics of tens of hits per day.) According to alexa.com on March 12, 2006, the servers owned by the top two domain parking services identified in our study were reaching between 3,300 and 5,200 users per million daily, and their traffic ranks were between #221 and #438. These numbers are comparable to those for popular websites such as travelocity.com (#248), orbitz.com (#315), usatoday.com (#347), and slashdot.org (#375). Although many parked domains may be generic-name domains, the fact that we were able to discover thousands of parked typo domains within a short time through simple automated searching does provide evidence that unwanted traffic due to parked typo domains could be significant.
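The break-even rule of thumb above can be restated as the average revenue each visitor would have to generate. The short calculation below is only a sketch: the $7.00 fee and the two visitor rates come from the text, while the 365-day year and the resulting per-visitor figures are our own derivation and are not reported in the paper.

```python
ANNUAL_FEE_USD = 7.00   # lowest annual registration fee cited in the text
DAYS_PER_YEAR = 365     # assumption: a non-leap year


def break_even_revenue_per_visitor(visitors_per_day):
    """Average revenue per visitor (USD) needed to cover the annual fee."""
    return ANNUAL_FEE_USD / (visitors_per_day * DAYS_PER_YEAR)


low_traffic = break_even_revenue_per_visitor(0.5)   # one visitor every two days
high_traffic = break_even_revenue_per_visitor(2.0)  # two visitors per day
print(f"{low_traffic:.4f} {high_traffic:.4f}")      # prints "0.0384 0.0096"
```

Under these assumptions, the two ends of the rule of thumb correspond to roughly 4 cents and 1 cent of average revenue per visitor.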

3.      Strider Typo-Patrol System

The Strider Typo-Patrol System provides automatic scanning and systematic analysis of typo domains. It consists of three main components: a typo-neighborhood generator, a typo-neighborhood scanner, and a domain-parking analyzer.

3.1.   Typo-Neighborhood Generation

Given a target domain, we define its typo-neighborhood as the set of URLs generated from the following five typo-generation models, which are commonly used in the wild:

(1) Missing-dot typos: The “.” following “www” is removed, e.g., wwwSouthwest.com. 

(2) Character-omission typos: Characters are omitted one at a time, e.g., Diney.com and MarthStewart.com.

(3) Character-permutation typos: Consecutive characters are swapped one pair at a time, unless they are the same characters, e.g., NYTiems.com.

(4) Character-replacement typos: Characters are replaced one at a time and the replacement is selected from the set of characters adjacent to the given character on the standard keyboard, e.g., DidneyWorld.com and USATodsy.com.

(5) Character-insertion typos: Characters are inserted one at a time and the inserted character is chosen from the set of characters adjacent to either of the given pair on the standard keyboard (and including the given pair), e.g., WashingtonPoost.com and Googlle.com.
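The five models above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' generator: the adjacency table covers only the letter keys of a QWERTY layout (keys directly beside, above, or below count as adjacent), typos are applied only to the label before the first dot, and the names `build_adjacency` and `typo_neighborhood` are hypothetical.

```python
# Assumption: letters-only QWERTY layout; the paper does not give its table.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_adjacency(rows):
    """Map each key to the keys beside it and roughly above/below it."""
    adjacent = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            near = set()
            for rr in (r - 1, r, r + 1):
                if 0 <= rr < len(rows):
                    for cc in (c - 1, c, c + 1):
                        if 0 <= cc < len(rows[rr]):
                            near.add(rows[rr][cc])
            near.discard(key)
            adjacent[key] = near
    return adjacent


ADJACENT = build_adjacency(QWERTY_ROWS)


def typo_neighborhood(domain):
    """Return the set of typo domains for a target such as 'disney.com'."""
    domain = domain.lower()
    name, dot, rest = domain.partition(".")
    suffix = dot + rest
    typos = set()

    # (1) Missing-dot: the "." after "www" is removed.
    typos.add("www" + domain)

    # (2) Character omission: drop one character at a time.
    if len(name) > 1:
        for i in range(len(name)):
            typos.add(name[:i] + name[i + 1:] + suffix)

    for i in range(len(name) - 1):
        a, b = name[i], name[i + 1]
        # (3) Character permutation: swap consecutive, distinct characters.
        if a != b:
            typos.add(name[:i] + b + a + name[i + 2:] + suffix)
        # (5) Character insertion: insert between the pair a character
        #     adjacent to either one, or one of the pair itself.
        candidates = ADJACENT.get(a, set()) | ADJACENT.get(b, set()) | {a, b}
        for ch in candidates:
            typos.add(name[:i + 1] + ch + name[i + 1:] + suffix)

    # (4) Character replacement: substitute a keyboard-adjacent character.
    for i, ch in enumerate(name):
        for repl in ADJACENT.get(ch, ()):
            typos.add(name[:i] + repl + name[i + 1:] + suffix)

    typos.discard(domain)
    return typos


print(len(typo_neighborhood("disney.com")))
```

With this adjacency table, the generated neighborhoods contain the lowercase forms of the examples quoted in the list, such as diney.com for disney.com and nytiems.com for nytimes.com.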

3.2.   Typo-Neighborhood Scanning

The Typo-Patrol scanner is an extension of our previous Strider HoneyMonkey scanner [5]. Given a typo-neighborhood list, it launches a browser to visit each domain and records all secondary URLs visited and their ordering, the content of all HTTP requests and responses, and optionally a screenshot.    

3.3.   Domain-Parking Analysis

We currently perform three types of analysis on the typo-neighborhood scan data:

(1) Given a target category and the lists of typos of target domains in the category, we analyze how heavily the category is being typo-squatted and which domain parking services are the major players. Specifically, we group the scanned typo domains by the parking services they generated third-party traffic to, and highlight those services that are behind a large number of typo domains.    

(2) Given the typo-patrol results of a trademarked target domain, we perform a similar analysis to identify those major parking services with which the trademark owner may want to file complaints. In some cases, it is more effective to go after the typo-squatters who actually purchased the typo domains than to complain to parking services which are only responsible for profiting from serving ads on those domains. We use two additional pieces of information to further divide and rank typo domains parked with a single service in order to help trademark owners prioritize their actions against typo-squatters.

The first piece of information is the Client ID field mentioned in Section 2. The second piece of information is the anchor domain that is used to aggregate traffic from multiple typo domains to simplify operations and to enable scalable typo-squatting. For example, tens of typo domains of NationalGeographic.com were “funneling” traffic through the same anchor playbov.com; typo domains LaSalleBanl.com and SovererignBank.com are sharing the same anchor baankaccount.com. We found that, in most cases, typo domains sharing the same anchor are registered to the same WhoIs registrant [17]. By grouping those typo domains that first redirect to the same anchor domain before generating traffic to the parking service, we eliminate the need to investigate each individual domain.

(3) For analyses that require searching for specific keywords (e.g., sexually-explicit keywords used in the analysis in Section 4.4), we analyze the HTTP response pages and extract all typo domains with a match.
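The grouping described in analyses (1) and (2) can be sketched as follows. This is a simplified illustration, not the authors' analyzer: the scan-record format (typo domain mapped to its ordered list of secondary URLs), the function names, and all `.example` hosts are made-up placeholders, and the anchor is taken to be the first non-parking host visited before the parking service.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical parking-service domains, matched as host suffixes.
PARKING_SERVICES = ("parking-one.example", "parking-two.example")


def bare_host(url):
    """Hostname of a URL with any leading 'www.' removed."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def parking_service(url):
    """Return the parking service a URL belongs to, or None."""
    host = bare_host(url)
    for service in PARKING_SERVICES:
        if host == service or host.endswith("." + service):
            return service
    return None


def group_scans(scans):
    """Group typo domains by parking service, then by anchor domain.

    scans maps each typo domain to the ordered secondary URLs it loaded.
    Domains that never contact a known parking service are left out.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for typo, urls in scans.items():
        anchor = None
        for url in urls:
            service = parking_service(url)
            if service is not None:
                groups[service][anchor].append(typo)
                break
            host = bare_host(url)
            if anchor is None and host and host != typo:
                anchor = host  # first redirect target before parking
    return {service: dict(by_anchor) for service, by_anchor in groups.items()}


# Illustrative scan records only; these are not observed data.
scans = {
    "typo-a.example": ["http://anchor.example/",
                       "http://ads.parking-one.example/park?d=anchor.example"],
    "typo-b.example": ["http://www.anchor.example/landing",
                       "http://ads.parking-one.example/park?d=anchor.example"],
    "typo-c.example": ["http://www.parking-two.example/typo-c.example"],
    "typo-d.example": ["http://unrelated.example/"],
}
print(group_scans(scans))
```

Ranking the resulting groups by size highlights the services and anchors behind the largest numbers of typo domains, which is the prioritization the analysis aims for.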

4.      Typo-Patrol Data Analysis

We first present two kinds of analysis to assess the prevalence of typo-squatting and to identify major domain parking services that are involved: vertical analysis uses a single type of typos for a large number of target domains; horizontal analysis uses multiple types of typos for a smaller set of target domains. Then, we present a case study in which we identified thousands of typo domains owned by a well-known typo-squatter. Finally, we investigate typo domains of children’s websites that serve questionable ads.

4.1.   Missing-dot Typos of Top 10,000 Sites

For the vertical analysis, we scanned the missing-dot typos of the 10,000 most popular domains. Our results showed that 5,094 (51%) of the 10,000 typo domains were active at the time of the scan. Figure 1 ranks the top six domain parking services by the number of typo domains that serve ads from them. We make the following observations: (1) the top two parking services clearly stand out, each covering approximately 20% of active typo domains (in addition, note that sedoparking.com uses the same ads-serving infrastructure as oingo.com according to http://www.sedoparking.com); (2) the top six parking services together account for more than half (59%) of the active domains and 30% of all the artificially generated missing-dot typo domains.
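The aggregate percentages quoted above follow directly from the per-service counts reported in Figure 1. The snippet below is a small arithmetic check using those counts; the variable names are ours.

```python
# Per-service counts of parked missing-dot typo domains, from Figure 1.
parked = {
    "Information.com/Domainsponsor.com": 1082,
    "Oingo.com": 992,
    "Sedoparking.com": 439,
    "Qsrch.com": 227,
    "Netster.com": 146,
    "Hitfarm.com": 109,
}
ACTIVE = 5094     # typo domains active at scan time
SCANNED = 10000   # missing-dot typo domains generated and scanned

total = sum(parked.values())
pct_of_active = round(100 * total / ACTIVE)
pct_of_all = round(100 * total / SCANNED)
print(total, pct_of_active, pct_of_all)  # prints "2995 59 30"
```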

4.2.   Typo-Neighborhoods of Popular Sites and High-Risk Phishing Targets

For the horizontal analysis, we selected two sets of target domains: the first set consists of 30 of the most popular sites according to alexa.com; the second set consists of 30 high-risk targets of phishing attacks, selected from [18]. For each target domain, we scanned its typo-neighborhood composed of typo domains generated from all five typo-generation models. The two sets of results are shown in Figure 2 and Figure 3, respectively.

 

Rank   Parking service                      # typos parked   % of active (5,094)   % of all (10,000)
#1     Information.com/Domainsponsor.com    1,082            21%                   11%
#2     Oingo.com                            992              20%                   9.9%
#3     Sedoparking.com                      439              8.6%                  4.4%
#4     Qsrch.com                            227              4.5%                  2.3%
#5     Netster.com                          146              2.9%                  1.5%
#6     Hitfarm.com                          109              2.1%                  1.1%
       Total                                2,995            59%                   30%

Figure 1. Top six domain parking services in the missing-dot typo-neighborhoods of top 10,000 websites

 

 

Rank   Parking service                      # typos parked   % of active (2,233)   % of all (3,136)
#1     Oingo.com                            420              19%                   13%
#2     Information.com/Domainsponsor.com    306              14%                   9.8%
#3     Sedoparking.com                      74               3.3%                  2.4%
#4     Qsrch.com                            74               3.3%                  2.4%
#5     Hitfarm.com                          69               3.1%                  2.2%

#6

Netster.com