
Introduction


The recent development of Internet-scale network testbeds, such as PlanetLab, enables researchers to develop and deploy large-scale, wide-area network projects subjected to real traffic conditions. Previously, such systems have either been commercial enterprises (e.g., content distribution networks, or CDNs) or community-focused distributed projects (e.g., free file-sharing networks). If we define a design space of latency versus throughput and tightly-controlled versus decentralized management, we can see that existing CDNs and file-sharing services occupy three portions of the space. The remaining portion, latency-sensitive decentralized systems, remains more elusive, without an easily-identifiable representative. In this paper, we describe CoDeeN, an academic content distribution network deployed on PlanetLab that uses a decentralized design to address a latency-sensitive problem.

To reduce access latency, content distribution networks use geographically distributed server surrogates, which cache content from the origin servers, and request redirectors, which send client requests to the surrogates. Commercial CDNs [2,23] replicate pages from content providers and direct clients to the surrogates via custom DNS servers, often coupled with URL rewriting by the content providers. The infrastructure for these systems usually consists of reverse-mode proxy caches with custom logic that interprets rewritten URLs. This approach is transparent to the end user, since content providers make the necessary changes to utilize the reverse proxies.

Our academic testbed CDN, CoDeeN, also uses caching proxy servers, but due to its non-commercial nature, engages clients instead of content providers. Clients must currently specify a CoDeeN proxy in their browser settings; this makes the system demand-driven and allows us to capture more information about client access behavior. Given the high degree of infrastructural overlap, our future work may include support for non-commercial content providers, or even allowing PlanetLab members to automatically send their HTTP traffic to CoDeeN via transparent proxying.

As shown in Figure 1, a CoDeeN instance consists of a proxy operating in both forward and reverse modes, as well as the redirection logic and monitoring infrastructure. When a client sends requests to a CoDeeN proxy, the node acts as a forward proxy and tries to satisfy the requests locally. Cache misses are passed to the redirector, which determines where each request should be sent, generally to another CoDeeN node acting as a reverse proxy for the origin server. For most requests, the redirector considers request locality, system load, reliability, and proximity when selecting another CoDeeN node. The reliability and security mechanisms can exclude nodes from being candidates, and can also reject requests entirely for various reasons described later.
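The redirection step above, in which cache misses for a given URL are deterministically concentrated on one reverse-proxy peer while unhealthy nodes are excluded, can be sketched as follows. This is a minimal illustration using rendezvous (highest-random-weight) hashing; the node names, the `healthy` predicate, and the scoring function are placeholders, not CoDeeN's actual implementation, which also weighs locality, load, and proximity.

```python
import hashlib

def score(url, node):
    # Deterministic per-(URL, node) score; any stable hash works for this sketch.
    return hashlib.sha1(f"{url}|{node}".encode()).hexdigest()

def pick_reverse_proxy(url, nodes, healthy):
    # Consider only nodes that the monitoring layer currently deems usable;
    # this models the reliability/security mechanisms excluding candidates.
    candidates = [n for n in nodes if healthy(n)]
    if not candidates:
        raise RuntimeError("no usable CoDeeN nodes")
    # Rendezvous hashing: every forward proxy ranking the same candidate set
    # picks the same peer for a given URL, so requests for that URL converge
    # on one reverse proxy and fewer requests reach the origin site.
    return max(candidates, key=lambda n: score(url, n))
```

Because the choice depends only on the URL and the candidate set, any forward proxy with the same view of node health redirects a given URL to the same reverse proxy, and removing a failed node reshuffles only the URLs that mapped to it.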

Figure 1: CoDeeN architecture - Clients configure their browsers to use a CoDeeN node, which acts as a forward-mode proxy. Cache misses are deterministically hashed and redirected to another CoDeeN proxy, which acts as a reverse-mode proxy, concentrating requests for a particular URL. In this way, fewer requests are forwarded to the origin site.

Although some previous research has simulated caching in decentralized/peer-to-peer systems [13,26], we believe that CoDeeN is the first such system to be deployed. One key insight from this endeavor has been that practical reliability is more difficult to capture than traditional fail-stop models assume. In our experience, running CoDeeN on a small number of PlanetLab nodes was simple, but overall system reliability degraded significantly as nodes were added. CoDeeN now runs on over 100 nodes, and we have found that the status of these proxy nodes is much more dynamic and unpredictable than we had originally expected. Even accounting for expected problems, such as network disconnections and bandwidth contention, did not improve the situation. In many cases, we found CoDeeN competing unsuccessfully with other PlanetLab projects for system resources, leading to undesirable behavior.

The other challenging aspect of CoDeeN's design, from a management standpoint, is the decision to let all nodes act as ``open'' proxies, accepting requests from any client in the world rather than only from clients at organizations hosting PlanetLab nodes. This decision makes the system more useful and increases the amount of traffic we receive, but the resulting possibility of abuse also increases the chances that CoDeeN becomes unavailable because its nodes are disconnected. We overestimated how long it would take for others to discover our system and underestimated the scope of activities for which people seek open proxies: within days of CoDeeN becoming stable enough to run continuously, the PlanetLab administrators began receiving complaints regarding spam, theft of service, abetting identity theft, and the like.

After fixing the discovered security-related problems, CoDeeN has been running nearly continuously since June 2003. In that time, it has received over 300 million requests from over 500,000 unique IP addresses (as of December 2003), while generating only three complaints. Node failure and overload are detected automatically, and the monitoring routines provide useful information about both CoDeeN and PlanetLab. We believe our techniques have broader application, ranging from peer-to-peer systems to general-purpose monitoring services. Obvious beneficiaries include people deploying open proxies for some form of public good, such as sharing/tolerating load spikes, avoiding censorship, or providing community caching. Since ISPs generally employ transparent proxies, our techniques would allow them to identify customers abusing other systems before receiving complaints from the victims. We believe that any distributed system, especially one that is latency-sensitive or that runs in a non-dedicated environment, can benefit from our infrastructure for monitoring and avoidance.

The rest of the paper is organized as follows. In Section 2, we discuss system reliability and CoDeeN's monitoring facilities. We describe the security problems facing CoDeeN in Section 3, followed by our remedies in Section 4. We then present some preliminary findings based on the data we have collected, and discuss related work.



Vivek Pai
2004-05-04