Profiling by call site

We take advantage of DeBox's flexibility to separate kernel time consumption by call site rather than by system call name. We are particularly interested in the cost of handling dynamic content, since 30% of SpecWeb99 requests are dynamic and such requests can be served through a variety of interfaces. Flash uses a persistent CGI interface, similar to FastCGI (28), that reuses CGI processes when possible and communicates with them over pipes. Although the main process, the helpers, and all of the CGI processes use the read() and write() system calls, we measure the overhead of only those calls involved in communicating with the CGI processes.
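The bookkeeping behind per-call-site accounting can be illustrated at user level. The C sketch below is a hypothetical illustration, not DeBox's in-kernel implementation: a hypothetical TIMED_READ macro keys each read() invocation by its source location (__FILE__ and __LINE__) and accumulates a call count and elapsed time per site, so two read() calls on different lines are reported separately even though they share a system call name. The names timed_read, site_lookup, report, and the fixed table size MAX_SITES are all assumptions made for this example, and it measures wall-clock time in the call rather than kernel CPU time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical user-level sketch: charge time spent in read() to the
 * call site (file:line) that issued it, not to the name "read". */

#define MAX_SITES 64

struct site_stat {
    const char *file;   /* source file of the call site */
    int line;           /* source line of the call site */
    long calls;         /* number of invocations from this site */
    double seconds;     /* cumulative wall-clock time inside read() */
};

static struct site_stat sites[MAX_SITES];
static int nsites;

/* Find the slot for a call site, creating it on first use.
 * Returns NULL if the table is full. */
static struct site_stat *site_lookup(const char *file, int line)
{
    assert(file != NULL);
    for (int i = 0; i < nsites; i++)
        if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
            return &sites[i];
    if (nsites == MAX_SITES)
        return NULL;
    sites[nsites].file = file;
    sites[nsites].line = line;
    return &sites[nsites++];
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Time one read() (including any blocking) and charge it to its site. */
static ssize_t timed_read(const char *file, int line, int fd, void *buf, size_t n)
{
    double start = now_seconds();
    ssize_t r = read(fd, buf, n);
    struct site_stat *s = site_lookup(file, line);
    if (s != NULL) {
        s->calls++;
        s->seconds += now_seconds() - start;
    }
    return r;
}

/* Each textual use of TIMED_READ is a distinct call site. */
#define TIMED_READ(fd, buf, n) timed_read(__FILE__, __LINE__, (fd), (buf), (n))

/* Print one line per call site. */
static void report(void)
{
    for (int i = 0; i < nsites; i++)
        printf("%s:%d  calls=%ld  time=%.6fs\n", sites[i].file,
               sites[i].line, sites[i].calls, sites[i].seconds);
}
```

For instance, a server could use TIMED_READ both where it reads client requests and where it reads CGI output, then call report() to see which of the two read() sites dominates.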

Our measurements show that a single call site accounts for most of this time: the point where the main process reads from the CGI processes, which consumes 20% of all kernel time (176 of 891 seconds). Writing requests to the CGI processes costs far less, only 24.3 seconds of system call time. This level of detail demonstrates the power of making performance a first-class result, since existing kernel profilers could not have separated the time spent in read() calls by call site. We therefore modify our CGI interface slightly: the main process writes only the HTTP header to the client and passes the socket to the CGI application, which then writes the data directly. This change allows us to reach 710 connections (2.35GB dataset).
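The text does not say how Flash hands the client socket to the CGI application. One standard Unix mechanism, assumed here, is to send the descriptor as SCM_RIGHTS ancillary data with sendmsg() over a Unix-domain socket. A plain pipe cannot carry descriptors, so this sketch assumes the main process and each CGI process also share a Unix-domain socket. The helper names send_fd and recv_fd are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass descriptor fd_to_pass over the Unix-domain
 * socket chan. Returns 0 on success, -1 on error. */
static int send_fd(int chan, int fd_to_pass)
{
    char dummy = 'F';                     /* at least one data byte must be sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                               /* ensures correct cmsghdr alignment */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    memset(&ctl, 0, sizeof ctl);

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    assert(c != NULL);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;            /* payload is an array of descriptors */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_pass, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
static int recv_fd(int chan)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;                            /* a new descriptor for the same open file */
}
```

Under this assumption, after writing the HTTP header the main process would call send_fd() with the client socket, and the CGI process would write the response body to the descriptor returned by recv_fd(). The response data then never passes through the main process's read() call site that dominated the profile.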

Yaoping Ruan
2004-05-04