A Case for Packing and Indexing in Cloud File Systems

Authors:

Saurabh Kadekodi, Carnegie Mellon University; Bin Fan and Adit Madan, Alluxio, Inc.; Garth A. Gibson, Carnegie Mellon University, Vector Institute; Gregory R. Ganger, Carnegie Mellon University

Abstract:

Small (kilobyte-sized) objects are the bane of highly scalable cloud object stores. Larger (at least megabyte-sized) objects not only improve performance but also cost orders of magnitude less, because commodity cloud object stores currently price by operation. For example, under Amazon S3's current pricing scheme, uploading 1 GiB of data by issuing 4 KiB PUT requests (at 0.0005 cents each) is approximately $57\times$ more expensive than storing that same 1 GiB for a month. To address this problem, we propose client-side packing of small immutable files into gigabyte-sized \textit{blobs} with embedded indices that identify each file's location. Experiments with a packing implementation in Alluxio (an open-source distributed file system) illustrate the potential benefits, such as simultaneously increasing file creation throughput by up to $60\times$ and decreasing cost to $1/25000$ of the original.
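
To make the packing idea concrete, the following is a minimal sketch in Python and not the Alluxio implementation. The blob layout (file bytes, then a JSON index, then an 8-byte index length) and the names `pack`, `read_index`, `read_file`, and `put_cost_dollars` are assumptions chosen for illustration. Only the per-request price of 0.0005 cents is taken from the abstract.

```python
import json
import struct

# Hypothetical blob layout (an assumption, not the paper's on-disk format):
#   [file bytes ...][JSON index][8-byte big-endian index length]
# The index maps each file name to its (offset, length) inside the blob.


def pack(files):
    """Pack a dict of {name: bytes} into a single blob with an embedded index."""
    body = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(body), len(data))  # record where this file starts
        body += data
    index_bytes = json.dumps(index).encode("utf-8")
    # The fixed-size trailer lets a reader locate the index from the blob's end.
    return bytes(body) + index_bytes + struct.pack(">Q", len(index_bytes))


def read_index(blob):
    """Recover the embedded index from the tail of the blob."""
    (index_len,) = struct.unpack(">Q", blob[-8:])
    index_end = len(blob) - 8
    return json.loads(blob[index_end - index_len:index_end].decode("utf-8"))


def read_file(blob, name):
    """Return one packed file's bytes using the embedded index."""
    offset, length = read_index(blob)[name]
    return blob[offset:offset + length]


def put_cost_dollars(num_puts, cents_per_put=0.0005):
    """Request cost in dollars at the per-PUT price quoted in the abstract."""
    return num_puts * cents_per_put / 100.0


if __name__ == "__main__":
    blob = pack({"a.txt": b"hello", "b.txt": b"world!!"})
    print(read_file(blob, "b.txt"))

    one_gib = 2 ** 30
    unpacked_puts = one_gib // 4096  # one PUT per 4 KiB object
    print(unpacked_puts, put_cost_dollars(unpacked_puts))
    print(put_cost_dollars(1))  # the same data uploaded as a single packed blob
```

In this sketch, uploading 1 GiB as 4 KiB objects takes 262,144 PUT requests, whereas a single packed blob takes one. A real client would also need to handle multipart uploads, range reads of the index, and index caching, none of which are modeled here.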