Abstract
Typo-squatting refers to the practice of registering
domain names that are typo variations of popular websites. We propose a new
approach, called Strider Typo-Patrol, to discover large-scale, systematic
typo-squatters. We show that a large number of typo-squatting domains are
active and a large percentage of them are parked with a handful of major domain
parking services, which serve syndicated advertisements on these domains. We
also describe the Strider URL Tracer, a tool that we have released to allow
website owners to systematically monitor typo-squatting domains of their sites.
1. Introduction
Typo-squatting refers to the practice of registering domain names
that are typos of their target domains, which usually host
websites with significant traffic. The individuals or organizations who
register typo-squatting domains (or typo domains) are referred to as typo-squatters.
Some major typo-squatters are known to have registered thousands of typo
domains or more [1,2,3].
Web traffic generated through
typo-squatting is unwanted for many reasons. From the users’ perspective, such typo
traffic often startles them with unexpected results, followed by an
annoying barrage of pop-up and pop-under advertisements (ads). There is a
documented incident where a typo domain of a popular website was serving
vulnerability-exploiting scripts to install malware [4,5]. Some typo domains of
children’s
websites have been observed to redirect to or link to adult websites,
endangering Internet safety by potentially exposing minors to harmful material
[6,7].
From the business perspective, many of
the typo-squatting cases involve “bad-faith” domain registrations or trademark
violations [8,9,10]. Worse yet, it is not uncommon to see a typo domain
displaying ads from competitors of the target-domain owner or even negative ads
against the owner (e.g., investment-loss law firm’s ads on typos of brokerage
firms). In other cases, some advertisers are unwillingly paying for their ads being
served on typo domains of their own websites, because such traffic is intended
to go directly to their sites in the first place [11].
In this paper, we describe the Strider
Typo-Patrol System for discovering and analyzing typo domains. Our patrol
results reveal that a large percentage of typo domains are “parked” with a handful
of major domain parking services. Domain parking is a special case of advertisement
syndication: while the latter attempts to serve relevant contextual ads based
on the publishers’ web content, the former serves ads based on merely the
domain name because parked domains typically have no content. We show that many
typo-squatters are taking advantage of the domain-parking infrastructures to
perform large-scale, systematic typo-squatting. However, by doing so, they also
expose their typo domains to systematic discovery enabled by monitoring and
analyzing ads-fetching traffic sent to the parking services.
The paper is organized as follows.
Section 2 describes how domain parking works and discusses statistics related
to the amount of unwanted traffic potentially generated through typo-squatting.
Section 3 presents the Strider Typo-Patrol System. Section 4 analyzes
typo-patrol data and quantifies the prevalence of typo-squatting through domain
parking. Section 5 describes the Strider URL Tracer designed to provide visibility
and control over typo traffic. Section 6 surveys related work, Section 7
discusses remaining issues, and Section 8 concludes the paper.
2. Understanding Domain Parking
Advertisement syndication refers to the business practice of serving ads by instructing the client-side
browser software to fetch ads from an ads server and compose them with the
content of the website that the user intends to visit. Syndication is typically
implemented using the browser’s third-party URL mechanism: when a user visits a
primary URL (hosted by the first party) either by typing the URL into the
browser address bar or by clicking a link on a web page, the browser may be
instructed by the content returned by the primary-URL page to automatically
visit one or more secondary URLs hosted on third-party servers, without explicit
knowledge or permission from the user. We refer to these secondary URLs as third-party
URLs in this paper. These third-party URLs usually contain information
about the primary URL so that the syndicators can serve the most relevant contextual
ads based on the primary-URL page content and potentially the
historical information about the visiting machine or user.
Domain parking is a special case of advertisement syndication: the primary URL is a parked
domain that does not contain any real content, and syndicated
domain-parking ads, usually in the form of ads listings, become the main
content of the page displayed to the user. In order to attract sufficient
traffic for serving ads, parked domains are usually domains with well-known generic
names [12] or typo domains of popular websites. See [13] for screenshots of
sample domains parked with various parking services.
Next, we use two actual examples to
illustrate how typo-squatting through domain parking is typically implemented
using third-party URLs. When a browser visits http://disneychannell.com,
it receives a response page containing a frame that loads http://www.sedoparking.com/disneychannell.com. This URL is responsible for serving the main
domain-parking ads listing. The basic idea of Strider Typo-Patrol is to scan a large number of typo domains, monitor
all third-party URL traffic, and group the domains by the behind-the-scenes domain
parking servers in order to facilitate investigation and prioritize actions.
Some domain parking services provide
additional information in their third-party URLs that enables further analysis.
For example, when a browser visits http://disneyg.com, the response page contains a frame that loads
http://apps5.oingo.com/apps/domainpark/domainpark.cgi?s=disneyg.com&dp_lp=24&hl=en&dp_lp=7&cid=DTRG4295&dp_p4pid=oingo_inclusion_xml_06&dp_format=1.3
where the “cid” field appears to contain a Client ID
that uniquely identifies a typo-squatter. In Section 4, we show how this information
enables us to quickly discover thousands of typo domains that are registered to
a well-known, serial typo-squatter [2,14].
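The extraction of such fields from a third-party URL can be sketched as follows. This is only an illustration, not the tool's actual code: the function name `parking_info` is hypothetical, and the reading of the "s" parameter as the parked domain is an assumption based on the single example above. Other parking services may use different URL layouts.

```python
from urllib.parse import urlsplit, parse_qs

def parking_info(third_party_url):
    """Return (parking_host, parked_domain, client_id) for a parking-service URL.

    Assumption: the query string follows the layout of the oingo.com example,
    where "s" appears to carry the parked domain and "cid" the Client ID.
    Missing fields are returned as None.
    """
    parts = urlsplit(third_party_url)
    query = parse_qs(parts.query)          # repeated keys (e.g. dp_lp) become lists
    parked = query.get("s", [None])[0]
    client_id = query.get("cid", [None])[0]
    return parts.hostname, parked, client_id

url = ("http://apps5.oingo.com/apps/domainpark/domainpark.cgi"
       "?s=disneyg.com&dp_lp=24&hl=en&dp_lp=7&cid=DTRG4295"
       "&dp_p4pid=oingo_inclusion_xml_06&dp_format=1.3")
print(parking_info(url))  # ('apps5.oingo.com', 'disneyg.com', 'DTRG4295')
```

Typo domains that yield the same Client ID can then be grouped together for further investigation.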
Domain parking services provide
convenient and effective contextual-ads infrastructures that make even marginal
typo domains profitable [15]. With the annual domain registration fee as low as
$7.00 [16], a rule-of-thumb figure for pay-per-click programs is that a parked
typo domain only needs to attract between one unique visitor every two days and two
visitors per day (depending on the pay-out levels) to generate sufficient
income to cover the fee. (As a reference, http://slsahdot.org records statistics of tens of hits per day.) According
to alexa.com on March 12, 2006, the servers owned
by the top two domain parking services identified in our study were reaching
between 3,300 and 5,200 per million users daily and their servers had a traffic
rank between #221 and #438. These numbers are comparable to those for popular
websites such as travelocity.com (#248),
orbitz.com (#315), usatoday.com (#347), and slashdot.org
(#375). Although many parked domains may be generic-name domains, the fact that
we were able to discover thousands of parked typo domains within a short time
through simple automated searching does provide evidence that unwanted traffic
due to parked typo domains could be significant.
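The break-even rule of thumb quoted earlier in this section can be checked with simple arithmetic. The sketch below assumes that income is spread evenly over unique visitors and over a 365-day year; the function name is hypothetical, and the resulting per-visitor figures are implied by the rule of thumb rather than stated in any cited source.

```python
ANNUAL_FEE = 7.00  # annual registration fee quoted in the text, in dollars

def break_even_income_per_visitor(visitors_per_day, annual_fee=ANNUAL_FEE, days=365):
    """Average income per unique visitor needed to recover the annual fee."""
    return annual_fee / (visitors_per_day * days)

# The two ends of the quoted range: one visitor every two days, two visitors per day.
for rate in (0.5, 2.0):
    dollars = break_even_income_per_visitor(rate)
    print(f"{rate} visitors/day -> ${dollars:.4f} per visitor")
```

Under these assumptions, the rule of thumb corresponds to an average income of roughly one to four cents per unique visitor.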
3. Strider Typo-Patrol System
The Strider Typo-Patrol System provides
automatic scanning and systematic analysis of typo domains. It consists of
three main components: a typo-neighborhood generator, a typo-neighborhood
scanner, and a domain-parking analyzer.
3.1. Typo-Neighborhood Generation
Given a target domain, we define its typo-neighborhood as the set of URLs generated from the following five typo-generation models, which are commonly used in the wild:
(1) Missing-dot typos: The “.” following “www” is removed, e.g., wwwSouthwest.com.
(2) Character-omission typos: Characters are omitted one at a time, e.g., Diney.com and MarthStewart.com.
(3) Character-permutation typos: Consecutive characters are swapped one pair at a time, unless they are the same characters, e.g., NYTiems.com.
(4) Character-replacement typos: Characters are replaced one at a time and the replacement is selected from the set of characters adjacent to the given character on the standard keyboard, e.g., DidneyWorld.com and USATodsy.com.
(5) Character-insertion typos: Characters are inserted one at a time and the inserted character is chosen from the set of characters adjacent to either of the given pair on the standard keyboard (and including the given pair), e.g., WashingtonPoost.com and Googlle.com.
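The five models above can be sketched in a few lines of Python. This is a minimal illustration, not the generator used in the system: the function names are hypothetical, the typos are applied to the second-level name only, and keyboard adjacency is approximated by a simple grid over the QWERTY rows (an assumption, since the exact adjacency table is not given here).

```python
# Rough QWERTY grid; real key staggering is ignored (assumption).
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]

def build_adjacency(rows=ROWS):
    """Map each key to the keys next to it in the approximate grid."""
    adj = {}
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            near = set()
            for rr in (r - 1, r, r + 1):
                if 0 <= rr < len(rows):
                    for cc in (c - 1, c, c + 1):
                        if 0 <= cc < len(rows[rr]):
                            near.add(rows[rr][cc])
            near.discard(ch)
            adj[ch] = near
    return adj

def typo_neighborhood(name, tld="com", adj=None):
    """Return the set of typo domains for a second-level name such as 'disney'."""
    adj = adj if adj is not None else build_adjacency()
    name = name.lower()
    typos = {"www" + name}                                   # (1) missing dot
    for i in range(len(name)):                               # (2) omission
        typos.add(name[:i] + name[i + 1:])
    for i in range(len(name) - 1):                           # (3) permutation
        if name[i] != name[i + 1]:
            typos.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    for i, ch in enumerate(name):                            # (4) replacement
        for sub in adj.get(ch, ()):
            typos.add(name[:i] + sub + name[i + 1:])
    for i in range(len(name) - 1):                           # (5) insertion
        left, right = name[i], name[i + 1]
        candidates = adj.get(left, set()) | adj.get(right, set()) | {left, right}
        for ins in candidates:
            typos.add(name[:i + 1] + ins + name[i + 1:])
    typos.discard(name)   # never report the target itself
    typos.discard("")     # can arise from omission on a one-character name
    return {t + "." + tld for t in typos}

neighborhood = typo_neighborhood("disney")
print("diney.com" in neighborhood)   # True
print(len(neighborhood))
```

Returning a set removes duplicates that different models may produce for the same target, so each typo domain is scanned only once.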
3.2. Typo-Neighborhood Scanning
The Typo-Patrol scanner
is an extension of our previous Strider HoneyMonkey scanner [5]. Given a
typo-neighborhood list, it launches a browser to visit each domain and records all secondary URLs visited and their
ordering, the content of all HTTP requests and responses, and optionally a
screenshot.
3.3. Domain-Parking Analysis
We currently perform three types of
analysis on the typo-neighborhood scan data:
(1) Given a target category and the lists of typos of
target domains in the category, we analyze how heavily the category is being
typo-squatted and which domain parking services are the major players.
Specifically, we group the scanned typo domains by the parking services they
generated third-party traffic to, and highlight those services that are behind
a large number of typo domains.
(2) Given the typo-patrol results of a trademarked target
domain, we perform a similar analysis to identify those major parking services
with which the trademark owner may want to file complaints. In some cases, it
is more effective to go after the typo-squatters who actually purchased the
typo domains than to complain to parking services which are only responsible
for profiting from serving ads on those domains. We use two additional pieces
of information to further divide and rank typo domains parked with a single service
in order to help trademark owners prioritize their actions against
typo-squatters.
The first piece of information is the
Client ID field mentioned in Section 2. The second piece of information is the anchor
domain that is used to aggregate traffic from multiple typo domains to
simplify operations and to enable scalable typo-squatting. For example, tens of
typo domains of NationalGeographic.com were “funneling” traffic through the same anchor playbov.com; typo domains LaSalleBanl.com and SovererignBank.com are sharing the same anchor baankaccount.com. We found that, in most cases, typo domains sharing the
same anchor are registered to the same WhoIs registrant [17]. By grouping those
typo domains that first redirect to the same anchor domain before generating
traffic to the parking service, we eliminate the need to investigate each
individual domain.
(3) For analyses that require searching for specific
keywords (e.g., sexually-explicit keywords used in the analysis in Section
4.4), we analyze the HTTP response pages and extract all typo domains with a
match.
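The grouping used in analyses (1) and (2) can be sketched as follows. This is an illustrative sketch under several assumptions: the scan data is represented as a mapping from each typo domain to the ordered list of secondary URLs it caused the browser to visit, a "site" is approximated by the last two labels of the host name, and the set of parking-service sites is supplied by the analyst. The function names, the abbreviated oingo.com URL, and the `parking.example` host are hypothetical placeholders, not scan results.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def site(url_or_host):
    """Approximate the registered domain by the last two host labels (assumption)."""
    host = urlsplit(url_or_host).hostname if "://" in url_or_host else url_or_host
    return ".".join(host.lower().split(".")[-2:])

def group_typo_domains(scans, parking_sites):
    """Group typo domains by parking service and by anchor domain.

    scans: {typo_domain: [secondary URLs in the order they were visited]}
    Returns (by_service, by_anchor), each mapping a site to a set of typo domains.
    """
    by_service = defaultdict(set)
    by_anchor = defaultdict(set)
    for typo, visited in scans.items():
        sites = [site(u) for u in visited]
        for idx, s in enumerate(sites):
            if s in parking_sites:
                by_service[s].add(typo)
                # An anchor is the first other site visited before the parking service.
                hops = [h for h in sites[:idx] if h != site(typo)]
                if hops:
                    by_anchor[hops[0]].add(typo)
                break
    return by_service, by_anchor

# Illustrative input only; the last two traces are made up to show anchor grouping.
scans = {
    "disneychannell.com": ["http://www.sedoparking.com/disneychannell.com"],
    "disneyg.com": ["http://apps5.oingo.com/apps/domainpark/domainpark.cgi?s=disneyg.com"],
    "lasallebanl.com": ["http://baankaccount.com/", "http://ads.parking.example/list"],
    "sovererignbank.com": ["http://baankaccount.com/", "http://ads.parking.example/list"],
}
parking_sites = {"sedoparking.com", "oingo.com", "parking.example"}

by_service, by_anchor = group_typo_domains(scans, parking_sites)
for service, typos in sorted(by_service.items(), key=lambda kv: -len(kv[1])):
    print(service, len(typos))
```

Sorting the groups by size highlights the parking services and anchors that are behind the largest number of typo domains.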
4. Typo-Patrol Data Analysis
We first present two kinds of analysis to
assess the prevalence of typo-squatting and to identify major domain parking
services that are involved: vertical analysis uses a single type
of typos for a large number of target domains; horizontal analysis uses
multiple types of typos for a smaller set of target domains. Then, we present a
case study in which we identified thousands of typo domains owned by a
well-known typo-squatter. Finally, we investigate typo domains of children’s
websites that serve questionable ads.
4.1. Missing-dot Typos of Top 10,000 Sites
For the vertical analysis, we scanned the missing-dot typos of the 10,000 most popular
domains. Our results showed that 5,094 (51%) of the 10,000 typo domains were
active at the time of the scan. Figure 1 ranks the top six domain parking services by the
number of typo domains that serve ads from them. We make the following
observations: (1) the top two parking services clearly stand out, each covering
approximately 20% of active typo domains (in addition, note that sedoparking.com uses the same ads-serving
infrastructure as oingo.com according to http://www.sedoparking.com);
(2) the top six parking services together
account for more than half (59%) of the active domains and 30%
of all the artificially generated missing-dot typo domains.
4.2. Typo-Neighborhoods of Popular Sites and High-Risk Phishing Targets
For the horizontal analysis, we selected
two sets of target domains: the first set consists of 30 of the most popular
sites according to alexa.com; the second
set consists of 30 high-risk targets of phishing attacks, selected from [18].
For each target domain, we scanned its typo-neighborhood composed of typo
domains generated from all five typo-generation models. The two sets of results
are shown in Figure 2 and Figure 3, respectively.
| Rank | Parking service | # typos parked | % of active (5,094) | % of all (10,000) |
|------|-----------------|----------------|---------------------|-------------------|
| #1 | Information.com/Domainsponsor.com | 1,082 | 21% | 11% |
| #2 | Oingo.com | 992 | 20% | 9.9% |
| #3 | Sedoparking.com | 439 | 8.6% | 4.4% |
| #4 | Qsrch.com | 227 | 4.5% | 2.3% |
| #5 | Netster.com | 146 | 2.9% | 1.5% |
| #6 | Hitfarm.com | 109 | 2.1% | 1.1% |
| | Total | 2,995 | 59% | 30% |

Figure 1. Top six domain parking services in the missing-dot typo-neighborhoods of top 10,000 websites
| Rank | Parking service | # typos parked | % of active (2,233) | % of all (3,136) |
|------|-----------------|----------------|---------------------|------------------|
| #1 | Oingo.com | 420 | 19% | 13% |
| #2 | Information.com/Domainsponsor.com | 306 | 14% | 9.8% |
| #3 | Sedoparking.com | 74 | 3.3% | 2.4% |
| #4 | Qsrch.com | 74 | 3.3% | 2.4% |
| #5 | Hitfarm.com | 69 | 3.1% | 2.2% |
|
#6
|
Netster.com
|
|