Non-alternatives to compare-by-hash

Next: Existence proof: Rsync vs. Up: Alternatives to compare-by-hash Previous: Alternatives to compare-by-hash

Non-alternatives to compare-by-hash

Several minor variations on compare-by-hash were proposed to me during the writing of this paper. I mention them here because they come up frequently enough in conversation with other operating systems researchers that it is worth explaining why they don't actually solve any of the problems of compare-by-hash.

In double-hashing, we first hash the block the normal way, then hash it again backwards, or starting at an offset. Then we use both hashes as the address. Another proposal is to first encrypt the block, then hash it. In both cases, we are merely creating a slightly different hash function, which still suffers from collisions. As long as the number of bits in the output is smaller than the number of bits in the input, collisions are inevitable. An easy way to think about this is to count the number of possible different inputs and compare that to the number of possible different outputs. If there are fewer outputs than inputs, then no function can map every input to a different output. Designing collision-resistant hashes is surprisingly difficult, and probably should be left to the experts.

Next: Existence proof: Rsync vs. Up: Alternatives to compare-by-hash Previous: Alternatives to compare-by-hash

2003-06-16