Knockoff: Cheap Versions in the Cloud
Xianzheng Dou, Peter M. Chen, and Jason Flinn
Cloud-based storage provides reliability and ease of management. Unfortunately, it can also incur significant costs for both storing and communicating data. These costs increase when systems retain past versions of files for data recovery, auditing, and forensic troubleshooting. While techniques such as chunk-based deduplication and delta compression have proven very effective in reducing bytes stored and sent over the network, further optimizations to these techniques are yielding diminishing returns. We argue that it is time to consider additional strategies for reducing storage costs. In our current work, we demonstrate that one such strategy, deterministic recomputation of data, can substantially reduce the cost of cloud storage. Our distributed file system, Knockoff, selectively substitutes nondeterministic inputs for file data: when a file can be regenerated by re-running the computation that produced it, Knockoff may store or send a log of that computation's nondeterministic inputs in place of the file's bytes. Our results show that, without versioning, this reduces the cost of sending files to the cloud by 21–24%; the relative benefit is substantially greater when past versions are retained.
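The idea of substituting nondeterministic inputs for file data can be illustrated with a minimal Python sketch. It is not Knockoff's implementation: the functions `produce_file`, `choose_representation`, and `materialize`, the record layout, and the size-only selection rule are all assumptions made for illustration. A real system would presumably also weigh the cost of recomputation, which this sketch ignores.

```python
import hashlib
import json
import random


def produce_file(inputs: dict) -> bytes:
    """Hypothetical deterministic computation: same inputs, same bytes."""
    rng = random.Random(inputs["seed"])  # the seed stands in for nondeterminism
    lines = [
        f"{inputs['timestamp']} sample={rng.randint(0, 9999)}"
        for _ in range(inputs["rows"])
    ]
    return "\n".join(lines).encode("utf-8")


def choose_representation(inputs: dict, data: bytes) -> dict:
    """Send whichever is smaller: the file bytes or the log of inputs.

    Assumption: size is the only cost considered here.
    """
    log = json.dumps(inputs, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    if len(log) < len(data):
        return {"kind": "log", "payload": log, "sha256": digest}
    return {"kind": "data", "payload": data, "sha256": digest}


def materialize(record: dict) -> bytes:
    """Recover the file bytes, replaying the computation if a log was sent."""
    if record["kind"] == "data":
        data = record["payload"]
    else:
        data = produce_file(json.loads(record["payload"].decode("utf-8")))
    # Check that replay reproduced exactly the bytes that were originally written.
    if hashlib.sha256(data).hexdigest() != record["sha256"]:
        raise ValueError("replay did not reproduce the original bytes")
    return data


if __name__ == "__main__":
    inputs = {"seed": 7, "timestamp": 1700000000, "rows": 200}
    data = produce_file(inputs)
    record = choose_representation(inputs, data)
    print(record["kind"], len(record["payload"]), "bytes sent instead of", len(data))
    assert materialize(record) == data
```

The selection step is what makes the substitution selective in this sketch: a file whose log is no smaller than its contents is sent as ordinary data, so the log is used only when it saves bytes.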