Next: Filter Specification Up: Architecture Design Previous: Prioritized Listen Queue

HTTP Header-based Controls

The SYN policer and prioritized listen queue have limited knowledge about the type and nature of a connection request, since they rely only on the information available in the TCP and IP headers. For web servers, where most of the traffic is HTTP over TCP, a more informed control is possible by examining the HTTP headers. For example, a majority of the load is caused by a few CGI requests, and most of the bytes transferred belong to a small set of large files. This suggests that targeting specific URLs, types of URLs, or cookie information for service differentiation can have a wide impact during overload.

Our third mechanism, HTTP header-based connection control, enables content-based connection control by examining application layer information in the HTTP header, such as the URL name or type (e.g., CGI requests) and other application-specific information available in cookies. The control is applied in the form of rate policing and priorities based on URL names and types and cookie attributes.

This mechanism involves parsing the HTTP header in the kernel and waking the sleeping web server process only after a decision to service the connection is made. If a connection is discarded, a TCP RST is sent to the client and the socket receive buffer contents are flushed.

Figure 3: The HTTP header-based connection control mechanism.



Table 1: URL action table

URL            ACTION
*noaccess*     drop
/shop.html     priority=1
/index.html    rate=15 conn./sec, burst=5 conn., priority=1
/cgi-bin/*     rate=10, burst=2
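
As an illustration of how such a table might be consulted, the following is a minimal user-space sketch in Python, not the in-kernel implementation. The names `Action`, `URL_ACTIONS`, and `lookup_action` are hypothetical, and both the shell-style wildcard matching and the first-match-wins ordering are assumptions; the table itself only lists the patterns and their actions.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Action:
    """One row of the URL action table (hypothetical representation)."""
    drop: bool = False
    priority: Optional[int] = None
    rate: Optional[float] = None   # connections per second
    burst: Optional[int] = None    # connections


# Entries mirror Table 1. Evaluation order is an assumption: first match wins.
URL_ACTIONS = [
    ("*noaccess*", Action(drop=True)),
    ("/shop.html", Action(priority=1)),
    ("/index.html", Action(priority=1, rate=15, burst=5)),
    ("/cgi-bin/*", Action(rate=10, burst=2)),
]


def lookup_action(url: str) -> Optional[Action]:
    """Return the action of the first pattern matching url, or None."""
    for pattern, action in URL_ACTIONS:
        # Assumed semantics: '*' matches any run of characters, including '/'.
        if fnmatchcase(url, pattern):
            return action
    return None
```

For example, `lookup_action("/cgi-bin/search.cgi")` would return the rate-limited entry, while a URL that matches no pattern returns `None` and would be handled without any header-based control.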

For URL parsing, our implementation relies upon the Advanced Fast Path Architecture (AFPA) [13], an in-kernel web cache on AIX. For Linux, an in-kernel web engine called KHTTPD is available [14]. In normal operation, the sleeping server process is woken up as soon as a connection is established; AFPA, in contrast, responds to cached HTTP requests directly without waking up the server process. With AFPA, a connection is not moved out of the partial listen queue even after the 3-way handshake is over. The normal data flow of TCP continues, with the data being stored in the socket receive buffer. When the HTTP header has been received (that is, when the AFPA parser finds two CR control characters in the data stream, marking the blank line that ends the header), AFPA checks for the object in its cache. On a cache miss, the socket is moved to the listen queue and the web server process is woken up to service the request.
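
The flow described above can be sketched in user-space Python as follows. This is only an illustration under stated assumptions and does not reproduce AFPA's code: the function names `extract_url` and `on_data`, the dictionary used as a cache, and the returned labels are all hypothetical, and the end of the header is detected with the standard HTTP blank line (CRLF CRLF) rather than with AFPA's own parser logic.

```python
from typing import Optional

# Standard HTTP header terminator: an empty line after the last header field.
HEADER_END = b"\r\n\r\n"


def extract_url(buffer: bytes) -> Optional[str]:
    """Return the request URL once the full header has arrived, else None."""
    end = buffer.find(HEADER_END)
    if end == -1:
        return None  # header incomplete: keep buffering received data
    request_line = buffer[:end].split(b"\r\n", 1)[0]
    parts = request_line.split()
    if len(parts) < 2:
        raise ValueError("malformed request line")
    return parts[1].decode("latin-1")


def on_data(buffer: bytes, cache: dict) -> str:
    """Decide what to do with the data buffered so far on a connection."""
    url = extract_url(buffer)
    if url is None:
        return "wait"              # connection stays in the partial listen queue
    if url in cache:
        return "serve_from_cache"  # answered without waking the server process
    return "wake_server"           # cache miss: move socket to the listen queue
```

A request whose header is still arriving yields `"wait"`, so the server process is involved only on a cache miss after the complete header has been seen.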

The HTTP header-based connection control mechanism comes into play at this juncture, as illustrated in Figure 3, before the socket is moved out of the partial listen queue. The URL action table (Table 1) specifies three types of actions for each URL or set of URLs. A drop action means that a TCP RST is sent, the connection is discarded from the partial listen queue, and the socket receive buffer is flushed. If a priority value is set, it determines the position of the corresponding socket in the ordered listen queue. Finally, rate control specifies a token bucket profile as a <rate, burst> pair; connections that are out of profile are dropped, as in the SYN policer.
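
The three actions can be sketched together as follows. This is a simplified Python model under stated assumptions, not the kernel code: `TokenBucket`, `decide`, and `DEFAULT_PRIORITY` are hypothetical names, time is passed in explicitly as seconds, a bucket is assumed to start full, and the priority used when no value is set is an assumption.

```python
from typing import Optional, Tuple


class TokenBucket:
    """Token bucket with a <rate, burst> profile; one token per connection."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate      # tokens (connections) added per second
        self.burst = burst    # maximum number of stored tokens
        self.tokens = burst   # assumption: the bucket starts full
        self.last = now

    def admit(self, now: float) -> bool:
        """Return True if a connection arriving at time now is in profile."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


DEFAULT_PRIORITY = 0  # hypothetical value used when no priority is set


def decide(drop: bool, priority: Optional[int],
           bucket: Optional[TokenBucket], now: float) -> Tuple[str, Optional[int]]:
    """Map a matched table entry to ("rst", None) or ("enqueue", priority)."""
    if drop:
        return ("rst", None)   # send TCP RST, flush the receive buffer
    if bucket is not None and not bucket.admit(now):
        return ("rst", None)   # out of profile: dropped like a policed SYN
    return ("enqueue", priority if priority is not None else DEFAULT_PRIORITY)
```

With the `/index.html` profile from Table 1 (rate 15, burst 5), six requests arriving at the same instant would see the first five enqueued with priority 1 and the sixth reset, since no time has passed in which to earn a new token.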


Renu Tewari
2001-05-01