
Compression Algorithms

In [WLM91] we explained how a compressor with knowledge of a programming language's implementation could exploit that knowledge to achieve high compression ratios for the data that programs use. In particular, we explained that pointer data contain very little information on average, and that pointers can often be compressed down to a single bit.

Here we describe algorithms that make much weaker assumptions, primarily exploiting data regularities imposed by hardware architectures and by common programming and language-implementation strategies. These algorithms are fast and fairly symmetrical: compression is not much slower than decompression. This makes them especially suitable for compressed virtual memory, where pages are compressed about as often as they are decompressed.¹
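As a rough illustration only, and not the authors' algorithm as specified in this section, the following C sketch shows the kind of word-level regularity such a compressor can exploit. It scans a page as 32-bit words and classifies each one against a tiny direct-mapped dictionary of recently seen words: all-zero words, exact matches, partial matches that share their high-order bits (as nearby pointers and small integers tend to), and misses. The dictionary size (16 entries), the 10-bit low field, the 2-bit tag, and the cost model in the hypothetical `estimate_bits` function are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters, chosen only for illustration. */
#define DICT_SIZE  16  /* tiny direct-mapped dictionary of recent words */
#define INDEX_BITS 4   /* log2(DICT_SIZE): bits needed to name a slot */
#define LOW_BITS   10  /* low-order bits allowed to differ in a partial match */
#define TAG_BITS   2   /* per-word tag: zero / exact / partial / miss */

enum word_class { WORD_ZERO, WORD_EXACT, WORD_PARTIAL, WORD_MISS };

struct class_counts { size_t zero, exact, partial, miss; };

/* Pick a dictionary slot from the high-order bits, so words that
   differ only in their low bits map to the same slot. */
static unsigned slot_for(uint32_t w) {
    return (unsigned)((w >> LOW_BITS) % DICT_SIZE);
}

/* Classify one word and update the dictionary with it. */
static enum word_class classify_word(uint32_t dict[DICT_SIZE], uint32_t w) {
    if (w == 0)
        return WORD_ZERO;             /* zero words leave the dictionary alone */
    unsigned s = slot_for(w);
    uint32_t d = dict[s];
    dict[s] = w;                      /* remember the most recent word */
    if (d == w)
        return WORD_EXACT;
    if ((d >> LOW_BITS) == (w >> LOW_BITS))
        return WORD_PARTIAL;          /* same high bits, different low bits */
    return WORD_MISS;
}

/* Count how many words of a page fall into each class. */
static struct class_counts classify_page(const uint32_t *page, size_t nwords) {
    assert(page != NULL || nwords == 0);
    uint32_t dict[DICT_SIZE] = {0};
    struct class_counts c = {0, 0, 0, 0};
    for (size_t i = 0; i < nwords; i++) {
        switch (classify_word(dict, page[i])) {
        case WORD_ZERO:    c.zero++;    break;
        case WORD_EXACT:   c.exact++;   break;
        case WORD_PARTIAL: c.partial++; break;
        case WORD_MISS:    c.miss++;    break;
        }
    }
    return c;
}

/* Hypothetical cost model: a tag per word, plus a slot index for
   matches, the differing low bits for partial matches, and the whole
   word for misses. */
static size_t estimate_bits(struct class_counts c) {
    return c.zero    * TAG_BITS
         + c.exact   * (TAG_BITS + INDEX_BITS)
         + c.partial * (TAG_BITS + INDEX_BITS + LOW_BITS)
         + c.miss    * (TAG_BITS + 32);
}
```

For example, the eight words `{0, 0, 0x10001000, 0x10001004, 0x10001004, 42, 0xDEADBEEF, 0}` classify as 3 zero, 1 exact, 2 partial and 2 miss, for an estimated 112 bits instead of 256. Because a decompressor would replay the same dictionary updates word by word, it would do roughly the same amount of work per word as the compressor, which is one reason schemes of this kind can be nearly symmetrical.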

As we explain below, the results for these algorithms are quite encouraging. A straightforward implementation in C is competitive with the best assembly-coded Ziv-Lempel compressor we could find, and superior to the LZRW1 algorithm (written in C by Ross Williams) [Wil91a] used in previous studies of compressed virtual memory and compressed file caching.

As we will explain, we believe our results are significant not only because our algorithms are competitive with, and often superior to, advanced Ziv-Lempel algorithms, but also because they are different. Despite their immaturity, they work well, and they complement other techniques. They also suggest areas for research into significantly more effective algorithms for in-memory data.

(Our algorithms are also interesting in that they could be implemented with very little hardware, including only a tiny amount of space for dictionaries, providing extraordinarily fast and cheap compression.)


Scott F. Kaplan
1999-04-27