Check out the new USENIX Web site. next up previous
Next: Heap Profiling Up: CPU Time Profiling Previous: Statistical Sampling vs. Code

Thread-Aware CPU Time Sampling

The Java virtual machine is a multi-threaded execution environment. One difficulty in building CPU time profilers for such systems is how to properly attribute CPU time to each thread, so that the time spent in a method is accounted only when the method actually runs on the CPU, not when it is unscheduled and waiting to run. The basic CPU time sampling algorithm is as follows:

while (true) {
 - sleep for a short interval;
 - suspend all threads;
 - record the stack traces of all threads 
   that have run in the last interval;
 - attribute a cost unit to these stack
   traces;
 - resume all threads;
}

The profiler needs to suspend the thread while collecting its stack trace, otherwise a running thread may change the stack frames as the stack trace is being collected.

The main difficulty in the above scheme is how to determine whether a thread has run in the last sampling interval. We should not attribute cost units to threads that are waiting for an I/O operation, or waiting to be scheduled in the last sampling interval. Ideally, this problem would be solved if the scheduler could inform the profiler the exact time interval in which a thread is running, or if the profiler could find out the amount of CPU time a thread has consumed at each sampling point.

Unfortunately, modern operating systems such as Windows NT and Solaris neither expose the kernel scheduler nor provide ways to obtain accurate per-thread CPU time. For example, the GetThreadTimes call on Windows NT returns per-thread CPU time in 10 millisecond increments, too inaccurate for profiling needs.

Our solution is to determine whether a thread has run in a sampling interval by checking whether its register set has changed. If a thread has run in the last sampling interval, it is almost certain that the contents of the register set have changed.

The information gathered for the purpose of profiling need not be 100% reliable. It is extremely unlikely, however, that a running thread maintains an unchanged register set, which includes such registers as the stack pointer, the program counter, and all general-purpose registers. One pathological example of a running program with a constant register set is the following C code segment, where the program enters into an infinite loop that consists of one instruction:

    loop: goto loop;

In practice, we find that it suffices to compute and record a checksum of a subset of the registers, thus further reducing the overhead of the profiler.

The cost of suspending all threads and collecting their stack traces is roughly proportional of the number of threads running in the virtual machine. A minor enhancement to the sampling algorithm discussed earlier is that we need not suspend and collect stack traces for threads that are blocked on monitors managed by the virtual machine. This significantly reduces the profiling overhead for many multi-threaded programs in which most threads are blocked most of the time. Our experience shows that, for typical programs, the total overhead of our sampling-based CPU time profiler with a sampling interval of 1 millisecond is less than 20%.


next up previous
Next: Heap Profiling Up: CPU Time Profiling Previous: Statistical Sampling vs. Code
Sheng Liang
1998-12-19