Erasure Encoding—Practice and Principles
It's common knowledge that the volume of global data has exploded. Simultaneously, the challenge of storing, protecting, and accessing this data securely "at scale" has produced hyperscale hardware and software architectures that continue to displace traditional enterprise datacenter systems. These new architectures will prove essential in responding to the unrelenting global "data tsunami".
One important hyperscale data storage methodology is Object Storage. Object Storage often uses Erasure Coding as a means to reduce data loss probabilities while simultaneously economizing data storage capital costs. Erasure Coding's powerful principles are also found in numerous other data retention methodologies, including Information Dispersal Algorithm (IDA) deployments and Secret Sharing, a method of providing shared-data security.
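As a rough illustration of the idea, the simplest possible erasure code stores k data fragments plus one XOR parity fragment, so any single lost fragment can be rebuilt from the survivors. The sketch below is a toy under stated assumptions: the function names `encode` and `recover` are hypothetical, the fragments are assumed to be equal-length, and production Object Storage systems typically use codes that tolerate more than one loss.

```python
# Toy single-parity erasure code (k data fragments + 1 XOR parity fragment).
# Function names and fragment layout are illustrative assumptions only.

def encode(fragments):
    """Return the data fragments followed by one XOR parity fragment."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, byte in enumerate(bytearray(frag)):
            parity[i] ^= byte
    return [bytearray(f) for f in fragments] + [parity]

def recover(stored, lost_index):
    """Rebuild the fragment at lost_index by XOR-ing every surviving fragment."""
    survivors = [f for i, f in enumerate(stored) if i != lost_index]
    rebuilt = bytearray(len(survivors[0]))
    for frag in survivors:
        for i, byte in enumerate(frag):
            rebuilt[i] ^= byte
    return rebuilt

data = [b"obje", b"ct s", b"tore"]   # k = 3 equal-length fragments
stored = encode(data)                # 4 fragments on 4 devices
rebuilt = recover(stored, 1)         # pretend fragment 1 was lost
```

With k = 3, this layout survives the loss of any one fragment at 4/3 of the raw data size, whereas two-way mirroring survives one loss at twice the raw size. That difference is the capital-cost argument in miniature.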
Unfortunately, understanding Erasure Coding's deployment strategies and powerful foundations can quickly prove challenging, if not impossible, because Erasure Coding's simple principles are typically steeped in academic obfuscation. This has historically presented impenetrable obstacles to many engineers. Luckily, that obfuscation is totally unnecessary.
The first part of this tutorial will provide a brief Object Storage and Erasure Coding introduction as a backdrop for a deep exploration of effective Erasure Coding deployment strategies, including performance and bandwidth tradeoff considerations. It will also introduce IDA and Secret Sharing and briefly discuss their relation to Erasure Coding.
After an intermission, the second part of the tutorial will provide a programming lab in which attendees run Python 2.7 programs distributed on the FAST '16 Tutorial Sessions USB thumb drive. This lab should help cement Erasure Code principles and deployment considerations as well as provide demonstrations of their utility. As an example, the programs will illustrate Erasure Code operations using lookup tables as well as on-the-fly calculations, which are useful in configurations where it is necessary to trade processing cycles for addressable memory.
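The table versus on-the-fly tradeoff can be sketched with Galois field multiplication, the core arithmetic behind many Erasure Codes. The following is a minimal sketch and not one of the tutorial's programs: the field GF(2^8), the polynomial 0x11d, the generator 2, and all function names are assumptions chosen for illustration.

```python
# GF(2^8) multiplication two ways. The polynomial 0x11d, the generator 2, and
# every name here are illustrative assumptions, not the tutorial's own code.
PRIM_POLY = 0x11d

def gf_mul_on_the_fly(a, b):
    """Shift-and-XOR multiply: no tables in memory, more cycles per call."""
    product = 0
    while b:
        if b & 1:
            product ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:             # overflowed 8 bits: reduce by the polynomial
            a ^= PRIM_POLY
        b >>= 1
    return product

def build_tables():
    """Precompute antilog (exp) and log tables for the generator 2."""
    exp = [0] * 510               # doubled so LOG[a] + LOG[b] needs no modulo
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x = gf_mul_on_the_fly(x, 2)
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return exp, log

EXP, LOG = build_tables()

def gf_mul_table(a, b):
    """Table multiply: two lookups and an add, at the cost of stored tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

Both functions return the same product for every pair of byte values. The table version spends a few hundred entries of memory to save cycles, while the on-the-fly version does the reverse, which suits memory-constrained configurations.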
This portion of the tutorial will conclude with an intense but extremely accessible discussion of Erasure Coding principles, of interest to attendees who want a deeper understanding of how Erasure Codes achieve their results. This material will be devoid of the impenetrable mathematical jargon typically prevalent in Erasure Code literature. The discussion progressively examines various Galois Finite Fields in detail, ending with a brief discussion of GF(2^16).
Finally, the tutorial will include discussion from the forthcoming book titled Exabyte Data Preservation, Postponing the Inevitable, co-authored by the speakers and Dr. Ethan Miller of University of California, Santa Cruz.
- Brief Object Storage Introduction
- Erasure Coding and Object Storage
- Erasure Coding Deployment Strategy and Tradeoff Considerations
- Information Dispersal Algorithm and Secret Sharing
- Understanding Galois Finite Fields
- Galois Finite Field Computations (made extremely accessible)
- Python 2.7 Galois Finite Field Computation Demonstration Programs
- Python 2.7 programming lab