USENIX HotStorage '10

WORKSHOP PROGRAM ABSTRACTS

Tuesday, June 22, 2010
9:05 a.m.–10:40 a.m.

Removing the Costs of Indirection in Flash-based SSDs with Nameless Writes
We present nameless writes, a new interface that obviates the need for indirection in modern solid-state storage devices (SSDs). Nameless writes allow the device to pick the location of a write and only then inform the client above of the decision. Doing so keeps control of block allocation decisions in the device, thus enabling it to perform important tasks such as wear-leveling, while removing the need for large and costly indirection tables. We discuss the proposed interface as well as the requisite device and file-system support.
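
The interface can be pictured with a small sketch. The following toy Python fragment is purely illustrative and uses hypothetical names (NamelessDevice, write_nameless); it is not the paper's actual interface. The device chooses the physical block and returns its address, which the client then records in its own metadata instead of maintaining an indirection table.

    class NamelessDevice:
        """Toy device: it, not the client, decides where a write lands."""

        def __init__(self, num_blocks):
            self.free = list(range(num_blocks))   # device-managed free pool
            self.blocks = {}

        def write_nameless(self, data):
            phys = self.free.pop(0)               # device picks the location...
            self.blocks[phys] = data
            return phys                           # ...and only then names it to the client

        def read(self, phys):
            return self.blocks[phys]

    # A client (e.g., a file system) stores the returned physical addresses in
    # its own metadata (an inode, say) rather than in a logical-to-physical map.
    dev = NamelessDevice(num_blocks=1024)
    pointers = [dev.write_nameless(f"chunk {i}".encode()) for i in range(3)]
    print([dev.read(p) for p in pointers])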

Depletable Storage Systems
Depletable storage systems use media, such as NAND flash, whose lifetime is limited and decreases with use. Such systems treat write cycles, in addition to space, as a constrained resource. Depletable storage systems must be equipped to monitor writes, attribute depletion to the appropriate applications, and even control the rate of depletion. We outline the new functionality enabled by depletion-aware mechanisms and discuss the challenges in building them.
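
As a rough illustration of the accounting such systems need (a sketch under assumed names, not the authors' design), writes can be charged against a device-wide budget of remaining write capacity, attributed per application:

    class DepletionMeter:
        """Toy accounting of a finite write budget, attributed per application."""

        def __init__(self, write_budget_bytes):
            self.remaining = write_budget_bytes
            self.per_app = {}

        def charge(self, app, nbytes):
            if nbytes > self.remaining:
                raise RuntimeError("write budget depleted")
            self.remaining -= nbytes
            self.per_app[app] = self.per_app.get(app, 0) + nbytes

    meter = DepletionMeter(write_budget_bytes=100 * 2**40)   # e.g., 100 TiB of total writes
    meter.charge("database", 8192)
    meter.charge("log-daemon", 4096)
    print(meter.per_app, meter.remaining, "bytes left")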

How I Learned to Stop Worrying and Love Flash Endurance
Flash memory in Solid-State Disks (SSDs) has gained tremendous popularity in recent years. The performance and power benefits of SSDs are especially attractive for use in data centers, whose workloads are I/O intensive. However, the apparently limited write endurance of flash memory has posed an impediment to the wide deployment of SSDs in data centers. Prior architecture- and system-level studies of flash memory have used simplistic endurance estimates derived from datasheets to highlight these concerns. In this paper, we model the physical processes that affect endurance, which include both stresses to the memory cells and a recovery process. Using this model, we show that the recovery process, which prior studies did not consider, significantly boosts flash endurance. Using a set of real enterprise workloads, we show that this recovery process allows for orders of magnitude more writes and erases than datasheets indicate. Our results indicate that SSDs that use standard wear-leveling techniques are much more resilient under realistic operating conditions than previously assumed, and they serve to explain some trends observed in recent flash measurement studies.
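
A back-of-the-envelope sketch (with made-up parameters, not the paper's physical model) shows why recovery matters: if each program/erase cycle adds a small amount of damage and a fraction of the accumulated damage heals before the next cycle, a cell can survive far more cycles than a no-recovery estimate suggests.

    def cycles_survived(damage_per_cycle, recovery_fraction,
                        damage_limit=1.0, max_cycles=1_000_000):
        """Count P/E cycles until accumulated damage reaches the limit."""
        damage = 0.0
        for cycle in range(1, max_cycles + 1):
            damage = (damage + damage_per_cycle) * (1.0 - recovery_fraction)
            if damage >= damage_limit:
                return cycle
        return max_cycles                 # survived the whole experiment

    print(cycles_survived(1e-4, 0.0))     # no recovery: fails after 10,000 cycles
    print(cycles_survived(1e-4, 1e-4))    # with recovery: outlasts the experiment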

11:00 a.m.–12:05 p.m.

Block-level RAID Is Dead
The common storage stack as found in most operating systems has remained unchanged for several decades. In this stack, the RAID layer operates under the file system layer, at the block abstraction level. We argue that this arrangement of layers has fatal flaws. In this paper, we highlight its main problems, and present a new storage stack arrangement that solves these problems.

Mean Time to Meaningless: MTTDL, Markov Models, and Storage System Reliability
Mean Time To Data Loss (MTTDL) has been the standard reliability metric in storage systems for more than 20 years. MTTDL is computed with a simple formula that can be used to compare the reliability of small disk arrays and to perform comparative trending analyses. The MTTDL metric is often misused, with egregious examples relying on MTTDL to generate reliability estimates that span centuries or millennia. Moving forward, the storage community needs to replace MTTDL with a metric that can be used to accurately compare the reliability of systems in a way that reflects the impact of data loss in the real world.
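
For context, the textbook MTTDL approximation for a single-parity (RAID-5) group is MTTDL = MTTF^2 / (N * (N - 1) * MTTR). The short calculation below, with typical datasheet-style numbers chosen purely for illustration, shows how easily it yields horizons of tens of thousands of years, far beyond the service life of any real system.

    def mttdl_raid5(n_disks, mttf_hours, mttr_hours):
        """Classic single-parity MTTDL approximation (independent failures)."""
        return mttf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours)

    hours_per_year = 24 * 365
    mttdl = mttdl_raid5(n_disks=8, mttf_hours=1_000_000, mttr_hours=24)
    print(f"{mttdl / hours_per_year:,.0f} years")   # roughly 85,000 years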

1:45 p.m.–3:20 p.m.

KVZone and the Search for a Write-Optimized Key-Value Store
Key-value stores are becoming a popular choice for persistent data storage for a wide variety of applications, and multiple implementations are currently available. Deciding which one to use for a specific application requires comparing performance, a daunting task due to the lack of benchmarking tools for this purpose. We present KVZone, a tool specifically designed to evaluate key-value store performance. We used KVZone to search for a key-value store suitable for implementing a low-latency content-addressable store that supports write-intensive workloads. We present a comparative evaluation of three popular key-value stores: Berkeley DB, Tokyo Cabinet, and SQLite, and find that none is capable of approaching the I/O rate of our persistent device (a high-throughput SSD). Finally, we present Alphard, a key-value store optimized for such workloads and devices.
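
To make the measurement concrete, here is a minimal sketch of the kind of micro-benchmark such a tool runs; it is not KVZone itself, and it uses Python's built-in dbm module only as a stand-in backend: time a stream of random-key PUTs and report operations per second.

    import dbm, os, tempfile, time

    def bench_puts(path, n_ops=10_000, value_size=512):
        """Time n_ops random-key inserts against a dbm file; return puts/sec."""
        value = os.urandom(value_size)
        db = dbm.open(path, "c")
        start = time.perf_counter()
        for _ in range(n_ops):
            db[os.urandom(16)] = value     # random 16-byte key, fixed-size value
        db.close()
        return n_ops / (time.perf_counter() - start)

    with tempfile.TemporaryDirectory() as d:
        print(f"{bench_puts(os.path.join(d, 'bench.db')):,.0f} puts/sec")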

Rethinking Deduplication Scalability
Deduplication, a form of compression aiming to eliminate duplicates in data, has become an important feature of most commercial and research backup systems. Since the advent of deduplication, most research efforts have focused on maximizing deduplication efficiency—i.e., the offered compression ratio—and have achieved near-optimal usage of raw storage. However, the capacity goals of next-generation petabyte-scale systems require a highly scalable design, able to overcome the current scalability limitations of deduplication. We advocate a shift toward scalability-centric design principles for deduplication systems, and present some of the mechanisms used in our prototype, which aims at high scalability, good deduplication efficiency, and high throughput.
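
The core step being scaled is straightforward, as the illustrative sketch below shows (not the prototype's design): split incoming data into chunks, fingerprint each chunk, and store a chunk only if its fingerprint is not already in the index. It is that fingerprint index, consulted on every chunk, that becomes the scalability bottleneck at petabyte capacities.

    import hashlib

    def dedup_store(data, chunk_size, index, chunk_store):
        """Store only previously unseen chunks; return the per-chunk references."""
        refs = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            fp = hashlib.sha1(chunk).hexdigest()
            if fp not in index:            # new content: store it once
                index.add(fp)
                chunk_store[fp] = chunk
            refs.append(fp)                # duplicates cost only a reference
        return refs

    index, chunks = set(), {}
    refs = dedup_store(b"abcd" * 4096, chunk_size=4096, index=index, chunk_store=chunks)
    print(len(refs), "references,", len(chunks), "unique chunk stored")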

TrapperKeeper: The Case for Using Virtualization to Add Type Awareness to File Systems
TrapperKeeper is a system that enables the development of type-aware file system functionality. In contrast to existing plug-in-based architectures that require a software developer to write and maintain code for each file type, TrapperKeeper requires no type-specific code. Instead, TrapperKeeper executes existing software applications that already parse the desired file type in virtual machines. It then uses accessibility APIs to control the application and extract desired information from the application's graphical user interface.

3:45 p.m.–4:50 p.m.

Fast and Cautious Evolution of Cloud Storage
When changing a storage system, the stakes are high. Any modification can undermine stability, causing temporary downtime, a permanent loss of data, and still worse—a loss of user confidence. This results in a cautious conservatism among storage developers. On one hand, the risks do justify taking great care with storage system changes. On the other hand, this slow and cautious deployment attitude is a poor match for cloud services tied closely to web-based frontends that follow an "always beta" mantra. Unlike traditional enterprise servers, cloud-based systems are still exploring what facilities should be provided by the storage layer, requiring that storage services be able to evolve as quickly as the applications that consume them. In this paper, we argue that by building support for evolution into the basic structure of a storage system, new features (and fixes) can be deployed in a fast and cautious manner. We summarize our experiences in developing such a system and detail its requirements and design. We also share some initial experience in deploying it on a rapidly evolving, but production, cloud hosting service that we have been building at UBC.

Adaptive Memory System over Ethernet
In cloud computing, computer infrastructures need to scale up adaptively, but a machine's local memory cannot be expanded beyond the amount physically installed in it. We present a method for scaling a computer's memory beyond its local capacity by high-speed page swapping to an adaptively attached solid-state disk (SSD). Our PCI Express (PCIe) technology, "ExpEther" (Express Ether), interconnects a computer and a PCIe-based SSD over standard Ethernet. Data transfer between the computer's local memory and the SSD is performed not with slow TCP/IP but with PCIe-standard direct memory access (DMA). The system achieves 33K read and 36K write IOPS for 4-KB page accesses, twice the performance of iSCSI with TCP offloading. With the proposed method, a computer with only 2 GB of local physical memory can sustain its performance even when a 10-GB in-memory database is loaded.
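
As a quick sanity check on the reported figures (assuming the 4-KB page size stated above), the IOPS numbers translate into the following sustained page-swap bandwidth over the Ethernet fabric:

    page_size = 4 * 1024                       # bytes per swapped page
    read_iops, write_iops = 33_000, 36_000

    read_mib_s = read_iops * page_size / 2**20
    write_mib_s = write_iops * page_size / 2**20
    print(f"read ~{read_mib_s:.0f} MiB/s, write ~{write_mib_s:.0f} MiB/s")
    # roughly 129 MiB/s of read traffic and 141 MiB/s of write traffic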
