Next: Reducing TLB and Hash Up: Reducing the Cost of Previous: Fast Reload Code

Improving Hash Tables Away

The 603 databook recommends using hardware hashing assists to emulate the 604 behavior on the 603. Following this recommendation, the early Linux/PPC TLB miss handler code searched the hash table for a matching PTE. If no match was found, software would emulate a hash table miss interrupt and the code would execute as if it were on a 604 that had done a search in hardware. Our conjecture was that this approach simply added another level of indirection and would cause cache misses as the software stumbled about the hash table.
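To make the extra level of indirection concrete, the following is a minimal sketch of a software search of a PowerPC-style hashed page table. It is not the Linux/PPC handler itself. It keeps two architectural facts (eight PTEs per group, and a secondary hash that is the one's complement of the primary) and simplifies everything else. The table size, the `struct hpte` layout, and the names `htab_lookup` and `htab_insert` are assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define PTEG_SIZE  8      /* PTEs per group, as on 32-bit PowerPC */
#define NUM_PTEGS  1024   /* hypothetical table size, must be a power of two */

/* Simplified hash-table PTE; the real hardware format packs these fields. */
struct hpte {
    uint32_t valid;  /* nonzero if the slot is in use */
    uint32_t vsid;   /* virtual segment ID */
    uint32_t vpn;    /* page index within the segment */
    uint32_t rpn;    /* real (physical) page number */
};

static struct hpte htab[NUM_PTEGS][PTEG_SIZE];

/* Primary hash: XOR of VSID and page index, masked to the table size. */
static uint32_t primary_hash(uint32_t vsid, uint32_t vpn)
{
    return (vsid ^ vpn) & (NUM_PTEGS - 1);
}

/* Linear search of one PTE group; each probe may touch a new cache line. */
static const struct hpte *search_pteg(uint32_t group, uint32_t vsid, uint32_t vpn)
{
    for (size_t i = 0; i < PTEG_SIZE; i++) {
        const struct hpte *p = &htab[group][i];
        if (p->valid && p->vsid == vsid && p->vpn == vpn)
            return p;
    }
    return NULL;
}

/*
 * Search the primary group, then the secondary group.
 * A NULL result is the point where the handler described above would
 * emulate a hash table miss and fall back to the Linux PTE tree.
 */
const struct hpte *htab_lookup(uint32_t vsid, uint32_t vpn)
{
    uint32_t h = primary_hash(vsid, vpn);
    const struct hpte *p = search_pteg(h, vsid, vpn);
    if (p)
        return p;
    return search_pteg(~h & (NUM_PTEGS - 1), vsid, vpn);
}

/* Insert into the first free slot of either group; returns 0 if both are full. */
int htab_insert(uint32_t vsid, uint32_t vpn, uint32_t rpn)
{
    uint32_t h = primary_hash(vsid, vpn);
    uint32_t groups[2] = { h, ~h & (NUM_PTEGS - 1) };

    for (size_t g = 0; g < 2; g++) {
        for (size_t i = 0; i < PTEG_SIZE; i++) {
            struct hpte *p = &htab[groups[g]][i];
            if (!p->valid) {
                *p = (struct hpte){ 1, vsid, vpn, rpn };
                return 1;
            }
        }
    }
    return 0;
}
```

In this sketch a miss costs up to sixteen PTE comparisons across two groups before the fallback even begins, which is the overhead the conjecture above is concerned with.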

The optimization we tried was to eliminate all use of the hash table and have the TLB miss handler go directly to the Linux PTE tree. With this strategy, a 180MHz 603 keeps pace with a 185MHz 604, even though the 604 has an L1 cache and TLB twice as large. In fact, on some LmBench points, the 180MHz 603 kept pace with a 200MHz 604 in a machine with significantly faster main memory and a better board design. Unfortunately, the 604 does not permit software to reload the TLB directly, so we could not make the same optimization on the 604. The end result of these changes was a 5% reduction in kernel compile time.
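As an illustration of going directly to the PTE tree, here is a minimal sketch of a two-level page table walk of the kind a software TLB miss handler could perform. It assumes 4 KB pages and a 10/10/12-bit split of a 32-bit virtual address. The names `struct pgd`, `walk_page_tree`, and `PTE_PRESENT`, and the PTE bit layout, are hypothetical and are not taken from the Linux/PPC source.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT      12      /* assumed 4 KB pages */
#define PTRS_PER_TABLE  1024    /* 10 index bits per level */
#define PTE_PRESENT     0x1u    /* hypothetical "present" flag in the low bits */

typedef uint32_t pte_t;

/* Top-level directory: one pointer per 4 MB region, NULL if unmapped. */
struct pgd {
    pte_t *tables[PTRS_PER_TABLE];
};

/*
 * Translate vaddr by walking the two-level tree.
 * Returns 1 and stores the physical address in *paddr on success,
 * or 0 if the address is unmapped (a real handler would take a page fault).
 */
int walk_page_tree(const struct pgd *pgd, uint32_t vaddr, uint32_t *paddr)
{
    /* Level 1: the top 10 bits select the PTE page. */
    const pte_t *table = pgd->tables[vaddr >> 22];
    if (!table)
        return 0;

    /* Level 2: the next 10 bits select the PTE within that page. */
    pte_t pte = table[(vaddr >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1)];
    if (!(pte & PTE_PRESENT))
        return 0;

    /* Combine the physical page frame with the offset within the page. */
    *paddr = (pte & ~0xFFFu) | (vaddr & 0xFFFu);
    return 1;
}
```

The walk costs two dependent loads with no searching, which is one plausible reason a direct reload can touch fewer cache lines than probing hash-table groups first.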


Table 1: LmBench summary for direct (bypassing hash table) TLB reloads

processor               pstart   ctxsw   pipe lat.   pipe bw    file reread
603 180MHz (htab)       1.8s     4 µs    17 µs       69 MB/s    33 MB/s
603 180MHz (no htab)    1.7s     3 µs    19 µs       73 MB/s    36 MB/s
604 185MHz              1.6s     4 µs    21 µs       88 MB/s    39 MB/s
604 200MHz              1.6s     4 µs    20 µs       92 MB/s    41 MB/s

Software TLB reloads, which are available on many platforms such as the Alpha [9], MIPS [2] and UltraSPARC, allow the operating system designer to consider many different page-table data structures (such as clustered page tables [11]). If the hardware does not constrain the choices, many optimizations can be made depending on the type of system and the typical load it is under.


Cort Dougan
1999-01-04