Check out the new USENIX Web site. next up previous
Next: Capturing rare anomaly paths Up: Case Study Previous: Initial experiments

Successive refinement of detail


With the server exhibiting idle CPU time but an inability to meet SpecWeb99's quality-of-service requirements, an obvious candidate is blocking system calls. However, Flash's main process is designed to avoid blocking. We tried tracing the problem using existing tools, but found they suffered from the problems discussed in section 2. These experiences motivated the creation of DeBox.


Table: Summarized DeBox output showing blocking counts - The layout is organized by resource label and system call name. For example, of the 127 times this test blocked with the ``inode'' label, 28 were from the open() system call
biord/166 inode/127 getblk/1 sfpbsy/1
open/162 readlink/84 close/1 sendfile/1
read/3 open/28    
unlink/1 read/9    
  stat/6    



The DeBox structures provide various levels of detail, allowing applications to specify what to measure. A typical use would first collect the basic DeBoxInfo to observe anomalies, then enable more details to identify the affected system calls, invocations, and finally the whole call trace.

We first use DeBox to get the blocking information, which is stored in the PerSleepInfo field. The PerSleepInfo data shows seven different system calls blocking in the kernel, and examination of the resource labels (wmesg) shows four reasons for blocking. These results are shown in Table 4, where each column header shows the resource label causing the blocking, followed by the total number of times blocked at that label. The elements in the column are the system calls that block on that resource, and the number of invocations involved. As evidenced by the calls involved, the ``biord'' (block I/O read) and ``inode'' (vnode lock) labels are both involved in opening and retrieving files, which is not surprising since our data set exceeds the physical memory of the machine.

The finest-grained kernel information is provided in the CallTrace structure, and we enable this level of detail once the PerSleepInfo identifies possible candidates. The main process should only be accessing cached data, so the fact that it blocks on disk-related calls is puzzling. For portability, the main process in Flash uses the helpers to demand-fetch disk data and metadata into the OS caches, and repeats the operation immediately after the helpers have completed loading, assuming that the recently loaded information will prevent it from blocking. Observing the full CallTrace of some of these blocking invocations shows the blocking is not caused by disk access, but contention on filesystem locks. Combining the blocking information from helper processes reveals that when the main process blocks, the helpers are operating on similarly-named files as the main process. We solve this problem by having the helpers return open file descriptors using sendmsg(), eliminating duplication of work in the main process. With this change, we are able to handle 370 simultaneous connections from SpecWeb99, with a dataset size of 1.28GB.



next up previous
Next: Capturing rare anomaly paths Up: Case Study Previous: Initial experiments
Yaoping Ruan
2004-05-04