
Calling Convention

Four problems with the x86 calling convention make it difficult to port the RISC JVM to x86. First, x86 passes arguments, and some return values, on the stack instead of in registers. Second, x86 dedicates two registers to stack management, a frame pointer and a stack pointer, when a single register would suffice. Third, we need to be able to unwind the stack to implement the Java exception model, and changes to the x86 calling convention were required to simplify and speed up this unwinding; in addition, Java requires precise detection of stack overflow, which is difficult in the standard calling convention because almost any instruction can cause a stack overflow. Finally, the x86 calling convention enforces only 4-byte alignment of stack frames, which can be a performance problem because 8-byte stack operations might be unaligned.

Figure 2: Optimized Calling Convention

    register ...
    ...
    esp    stack pointer    preserved

To solve all of these problems, we developed a new calling convention, shown in Figure 2. Its register assignment gives us 3 scratch (caller-save) registers and 4 preserved (callee-save) registers, plus a stack pointer.

We modified the calling convention to use a stack pointer that stays fixed over the life of a method, as opposed to the standard x86 convention, which encourages the use of push and pop instructions that modify the stack pointer. Local stack variables can therefore be accessed at constant offsets from the stack pointer. The optimized stack scheme of our implementation is shown in Figure 3. The prolog/epilog and a sample call site of the optimized calling convention can be found in Figures 4 and 5, respectively.
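As a rough illustration of the fixed stack pointer, the following C sketch models a downward-growing stack in which the pointer moves once on entry and once on exit, and locals are addressed at constant offsets in between. The names ToyStack, enter, leave, store_local, and load_local are hypothetical and do not come from the implementation; the 24-byte frame size is taken from the prolog figure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed frame size, matching `subl $24, %esp` in the prolog figure. */
enum { FRAME_SIZE = 24 };

/* Toy model of a downward-growing stack (all names are illustrative). */
typedef struct {
    uint8_t  mem[256];
    uint32_t sp;        /* byte index into mem; plays the role of %esp */
} ToyStack;

/* Prolog: adjust sp once. It then stays fixed for the whole method. */
static void enter(ToyStack *s) {
    assert(s->sp >= (uint32_t)FRAME_SIZE);
    s->sp -= FRAME_SIZE;
}

/* Epilog: release the frame with a single adjustment. */
static void leave(ToyStack *s) {
    s->sp += FRAME_SIZE;
}

/* Locals live at constant offsets from sp, like `movl %eax, 4(%esp)`. */
static void store_local(ToyStack *s, uint32_t offset, int32_t value) {
    assert(offset + 4u <= (uint32_t)FRAME_SIZE);
    memcpy(&s->mem[s->sp + offset], &value, sizeof value);
}

static int32_t load_local(const ToyStack *s, uint32_t offset) {
    int32_t value;
    assert(offset + 4u <= (uint32_t)FRAME_SIZE);
    memcpy(&value, &s->mem[s->sp + offset], sizeof value);
    return value;
}
```

Because nothing between enter and leave changes sp, a local stored at offset 4 is still found at offset 4 later in the method, which is what lets a compiler use constant displacements.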

Because a callee-saved register slot is allocated at the bottom of the stack frame, the prolog of a method can immediately check whether a stack overflow has occurred by storing a callee-saved register (or any value, if no registers need to be saved) to the bottom of the stack frame. Thus, the only instructions that can cause a stack overflow are the first store in the method prolog and call instructions (which push their return address). At both of these locations, stack overflow exceptions are simple to deal with.
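The prolog check can be sketched in C as a toy model. It assumes the stack grows downward and that addresses below some limit are not writable (for example, a guard region), so the first store to the bottom of the new frame is the access that fails. ToyThread, prolog, and the limit field are hypothetical names for illustration; the text does not specify how the implementation detects the faulting store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-thread stack state (illustrative names, not from the paper). */
typedef struct {
    uint32_t sp;     /* current stack pointer; the stack grows downward */
    uint32_t limit;  /* lowest writable address (assumed guard below it) */
} ToyThread;

/*
 * Models the method prolog: reserve the frame, then store to its bottom.
 * Returns false where the real store would fault, i.e. on stack overflow.
 * Simplification: on failure the model leaves sp unchanged.
 */
static bool prolog(ToyThread *t, uint32_t frame_size) {
    assert(t->sp >= t->limit);      /* sp starts inside the usable stack */
    if (frame_size > t->sp) {
        return false;               /* avoid unsigned wraparound */
    }
    uint32_t new_sp = t->sp - frame_size;
    if (new_sp < t->limit) {
        return false;               /* first store would hit the guard */
    }
    t->sp = new_sp;                 /* frame is fully usable from here on */
    return true;
}
```

Once prolog succeeds, every later access within the frame lies at or above the address just written, so no other instruction in the method body needs an overflow check in this model.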

While changing the calling convention, we also took the opportunity to align stack frames to 8-byte boundaries for faster stack operations on the double type.
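One common way to obtain 8-byte alignment is to round sizes up to a multiple of 8, as in the C sketch below. This is a generic illustration only: align_up and is_aligned are hypothetical helpers, and this section does not say how the implementation accounts for the 4-byte return address pushed by call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round size up to the next multiple of align (align: a power of two). */
static uint32_t align_up(uint32_t size, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* True if addr is a multiple of align (align: a power of two). */
static bool is_aligned(uint32_t addr, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

For example, a frame that needs 20 bytes of locals would be rounded up to 24 bytes, so an 8-byte double stored at an offset that is a multiple of 8 from an aligned base does not straddle an 8-byte boundary.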

Figure 3: Optimized stack frame layout

    input arguments ...
    ...
    callee-save space (4 bytes)

Figure 4: Method prolog/epilog of the optimized calling convention.

    subl $24, %esp
    movl %ebx, (%esp)
    ...

Figure 5: Example call site in the optimized calling convention.

    movl $1, %eax    % 1st arg
    movl $2, %ed...
    ...esp)          % 3rd arg
    call method
                     % return value in %eax

