When is compare-by-hash appropriate?

Taking all this into account, when is it reasonable to use compare-by-hash? First, users of software should know when they are getting best effort and when they are getting correctness. An rsync user can know that there is a tiny but real possibility of an incorrect target file; the man page says so. Users of a file system, or of a program taking a page fault, expect to get back exactly the data they wrote, every time. Second, it matters whether other users share the "address space" produced by compare-by-hash. If only trusted users write data to the system, they need not worry about maliciously generated collisions, and they can avoid known collisions. By these standards, rsync is an appropriate use of compare-by-hash, whereas LBFS, Venti, Pastiche, and Stanford's virtual machine migration are not.
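
The difference between best effort and correctness can be made concrete with a minimal sketch in Python. Everything named here (BlockStore, weak_hash, find_collision) is hypothetical and is not taken from rsync or from any of the systems above. The hash is deliberately truncated to one byte, an assumption made only so that a collision turns up quickly; a real system would use a full-length digest, for which collisions are far rarer but not impossible.

```python
import hashlib


def weak_hash(data: bytes) -> bytes:
    # Assumption for illustration only: keep one byte of SHA-1 so that
    # two distinct blocks with the same hash are easy to find.
    return hashlib.sha1(data).digest()[:1]


class BlockStore:
    """Hypothetical store that addresses blocks by their hash."""

    def __init__(self, hash_fn, verify=False):
        self.hash_fn = hash_fn
        self.verify = verify  # True: compare bytes when a hash matches
        self.blocks = {}

    def put(self, data: bytes) -> bytes:
        key = self.hash_fn(data)
        existing = self.blocks.get(key)
        if existing is None:
            self.blocks[key] = data
        elif self.verify and existing != data:
            # Same hash, different contents: refuse instead of guessing.
            raise ValueError("hash collision detected")
        # Without verification, a matching hash is taken to mean
        # identical data, and the new block is silently dropped.
        return key

    def get(self, key: bytes) -> bytes:
        return self.blocks[key]


def find_collision(hash_fn):
    # Brute-force search for two distinct blocks with the same hash.
    seen = {}
    i = 0
    while True:
        block = str(i).encode()
        h = hash_fn(block)
        if h in seen:
            return seen[h], block
        seen[h] = block
        i += 1


if __name__ == "__main__":
    a, b = find_collision(weak_hash)

    best_effort = BlockStore(weak_hash)
    best_effort.put(a)
    key = best_effort.put(b)
    print("read back what was written:", best_effort.get(key) == b)

    checked = BlockStore(weak_hash, verify=True)
    checked.put(a)
    try:
        checked.put(b)
    except ValueError as err:
        print("verified store:", err)
```

In the best-effort store the second writer reads back the first writer's block with no error reported, which is the behavior a file system user does not expect. The verified store surfaces the collision, at the cost of a byte-for-byte comparison on every hash match.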

2003-06-16