
Case Study Summary


By addressing the interaction areas identified by DeBox, we achieve a factor of four improvement in our SpecWeb99 score, supporting four times as many simultaneous connections while also handling a data set that is almost three times as large as the physical memory of our machine. The SpecWeb99 results of our modifications can be seen in Figure 8, where we show the scores for all of the intermediate modifications we made. Our final result of 820 compares favorably to published SpecWeb99 scores, though no directly comparable systems have been benchmarked. We outperform all uniprocessor systems that have similar memory configurations but use other server software: the highest score for a system with less than 2GB of memory is 575.

Figure 8: SpecWeb99 summary - (1) Original, (2) VM patch, (3) Using sendfile(), (4) FD-passing helpers, (5) Fork helper, (6) Eliminate mmap, (7) New CGI interface, (8) New sendfile()

Most of our changes are portable architectural modifications to the Flash Web Server, including (1) passing file descriptors between the helpers and the main process to avoid most disk operations in the main process, (2) introducing a new fork() helper to handle forking CGI requests, (3) eliminating the mapped file cache, and (4) allowing CGI processes to write directly to the clients instead of writing to the main process. Figure 9 shows the original and new architectures of the static content path for the server.

Figure 9: Architectural changes - The architecture is greatly simplified by using file descriptor passing and eliminating mapped file caching. Modified components are indicated with dark boxes.
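File descriptor passing, the first of the architectural changes listed above, can be illustrated with a minimal sketch. It is not the Flash implementation (Flash is written in C); it is a Python approximation that assumes a Unix domain socket between helper and main process and uses the standard SCM_RIGHTS ancillary message. The function names `helper_open_and_send`, `main_receive_fd`, and `demo`, and the one-byte `b"F"` tag, are hypothetical. For brevity the demo runs both ends in a single process, whereas a real helper would be a separate process.

```python
import array
import os
import socket
import tempfile


def helper_open_and_send(channel: socket.socket, path: str) -> None:
    """Helper side: perform the potentially blocking open(), then pass the descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # SCM_RIGHTS asks the kernel to duplicate the descriptor into the receiver.
        rights = array.array("i", [fd]).tobytes()
        channel.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)])
    finally:
        # The receiver holds its own duplicate, so the helper's copy can be closed.
        os.close(fd)


def main_receive_fd(channel: socket.socket) -> int:
    """Main-process side: receive exactly one descriptor from a helper."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = channel.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    if msg != b"F" or len(fds) != 1:
        raise RuntimeError("unexpected message from helper")
    return fds[0]


def demo() -> bytes:
    """Round trip: the helper opens a file, the main side reads it via the passed fd."""
    main_end, helper_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from the helper")
        path = tmp.name
    try:
        helper_open_and_send(helper_end, path)
        fd = main_receive_fd(main_end)
        try:
            return os.read(fd, 1024)
        finally:
            os.close(fd)
    finally:
        main_end.close()
        helper_end.close()
        os.unlink(path)
```

In this arrangement the main process never calls open() on a file that may require disk access; it only receives descriptors that a helper has already opened.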

The changes we make to the operating system focus on sendfile(), including (1) adding a new flag and return value to indicate when blocking on disk would occur, (2) caching kernel address space mappings to avoid unnecessary physical map operations, and (3) sending headers and file data in a single mbuf chain to avoid multiple packets for small responses. Additionally, we apply a virtual memory system patch that ultimately proves superfluous, since we remove the memory-mapped file cache. We have provided our modifications to the FreeBSD developer group, and all three optimizations have been incorporated into FreeBSD.
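The control flow enabled by change (1) can be sketched as follows. This is an illustration under stated assumptions, not the server's code: it assumes the flag and return value correspond to what FreeBSD exposes as `SF_NODISKIO` and `EBUSY`, and it uses Python's `os.sendfile`, which accepts a `flags` argument only on platforms that support it. The names `default_sendfile` and `try_send` are hypothetical, and the `sendfile` parameter exists only so the logic can be exercised without a FreeBSD kernel. Header coalescing (change 3) and mapping caching (change 2) happen inside the kernel and are not shown.

```python
import errno
import os
from typing import Callable, Optional

# Defined by Python only where the platform's sendfile() supports the flag.
SF_NODISKIO: Optional[int] = getattr(os, "SF_NODISKIO", None)


def default_sendfile(out_fd: int, in_fd: int, offset: int, count: int) -> int:
    """Call sendfile(), requesting 'do not block on disk' where that is available."""
    if SF_NODISKIO is not None:
        return os.sendfile(out_fd, in_fd, offset, count, flags=SF_NODISKIO)
    # Assumption: without the flag we fall back to an ordinary, possibly blocking call.
    return os.sendfile(out_fd, in_fd, offset, count)


def try_send(
    out_fd: int,
    in_fd: int,
    offset: int,
    count: int,
    sendfile: Callable[[int, int, int, int], int] = default_sendfile,
) -> Optional[int]:
    """Return the number of bytes sent, or None if the data is not in memory.

    A None result tells the caller to hand the request to a helper that can
    block on disk, and to retry the send once the helper reports completion.
    """
    try:
        return sendfile(out_fd, in_fd, offset, count)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return None
        raise
```

The point of the design is that the main process learns from the return value that a send would block on disk, so it can defer to a helper instead of stalling every other connection.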



Yaoping Ruan
2004-05-04