As an example of a system that improves on compare-by-hash while retaining correctness, compare rsync and BitKeeper, a commercial source configuration management tool. They both solve the problem of keeping several source code trees in sync. (We will ignore the unrelated features of BitKeeper, such as versioning, in this comparison.) Rsync is stateless; it has no a priori knowledge of the relationship between two source code trees. It uses compare-by-hash to determine which blocks are different between the two trees and sends only the blocks with different hashes. BitKeeper keeps state about each file under source control and knows what changes have been made since the last time each tree was synchronized. When synchronizing, it sends only the differences since the last synchronization occurred, in compressed form. In comparison to rsync, BitKeeper provides similar and sometimes better bandwidth usage when simply synchronizing two trees without resorting to compare-by-hash. Improvements BitKeeper provides over rsync include elimination of reverse updates (synchronizing in the wrong direction and losing your changes), automerging algorithms optimized for source code (so trees can be updated in parallel and then synchronized), and intelligent handling of metadata operations such as renaming of files (which rsync sees as deletion and creation of files).
With a little more programming effort, we can get the bandwidth reduction promised by compare-by-hash without sacrificing correctness and at the same time adding functionality. Compare-by-hash still has applications in areas where statelessness and low bandwidth are more important than correctness of data referenced, and users are aware of the risk they are taking, as in rsync.