CodePlugin: Plugging Deduplication into Erasure Coding for Cloud Storage
Mengbai Xiao, George Mason University; Mohammed A. Hassan, NetApp, Inc.; Weijun Xiao, Virginia Commonwealth University; Qi Wei and Songqing Chen, George Mason University
Cloud storage systems play a key role in many cloud services. To tolerate multiple simultaneous disk failures while reducing storage overhead, today's cloud storage systems often employ erasure coding schemes. To simplify implementation, existing systems, such as Microsoft Azure and EMC Atmos, support only file append operations. However, this append-only design leads to a nontrivial and growing portion of redundant data on cloud storage systems.
To reduce the data redundancy caused by user file updates, and with it the corresponding encoding and storage cost, we investigate how to efficiently integrate inline deduplication into the general context of the Reed-Solomon (RS) code. For this purpose, we present our initial design of CodePlugin. CodePlugin introduces preprocessing steps before the normal encoding. In these preprocessing steps, duplicate data is identified and shuffled so that the redundant blocks do not have to be encoded. CodePlugin is applicable to any existing coding scheme, and our preliminary experimental results show that it can effectively improve the encoding throughput (by ~20%) and reduce the storage cost (by ~17.4%).
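The preprocessing idea can be illustrated with a minimal Python sketch. This is not the CodePlugin algorithm itself, whose details the abstract does not give. It assumes fixed-size blocks, SHA-256 fingerprints, and a simple compaction that stands in for the shuffling step; `BLOCK_SIZE`, `split_blocks`, `deduplicate`, and `restore` are hypothetical names chosen for illustration. The erasure encoder is left out: only the unique blocks returned here would be handed to it.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size in bytes, kept tiny for illustration


def split_blocks(data, size=BLOCK_SIZE):
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] = blocks[-1].ljust(size, b"\0")
    return blocks


def deduplicate(blocks):
    """Return (unique_blocks, index_map).

    index_map[i] is the position in unique_blocks that holds the
    content of blocks[i], so duplicates are stored and encoded once.
    """
    seen = {}        # fingerprint -> position in unique_blocks
    unique = []
    index_map = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint not in seen:
            seen[fingerprint] = len(unique)
            unique.append(block)
        index_map.append(seen[fingerprint])
    return unique, index_map


def restore(unique, index_map):
    """Rebuild the original block sequence from the deduplicated form."""
    return [unique[i] for i in index_map]


blocks = split_blocks(b"AAAABBBBAAAACCCCBBBB")
unique, index_map = deduplicate(blocks)
# Only the blocks in `unique` would be passed on to the encoder.
```

In this toy input, five blocks collapse to three unique ones, and the index map is enough to rebuild the original order on read. A real system would also have to handle fingerprint metadata, block alignment within coding stripes, and padding removal, none of which this sketch attempts.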