Technical Sessions

To access a presentation's content, please click on its title below.

Proceedings Front Matter: 
Cover Page | Title Page and List of Organizers | Message from the Program Co-Chairs

The full FAST '13 conference proceedings (in PDF, EPUB, and MOBI) and the table of contents (in PDF), which contained the papers SD Codes: Erasure Codes Designed for How Storage Systems Really Fail and Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions, have been taken down due to a dispute over the two papers' contents. Please see this memo from February 17, 2015, for more information. All other FAST '13 papers remain available individually below.

Attendee Files 
FAST '13 Errata Slip (PDF)
FAST '13 Errata Slip (EPUB)
FAST '13 Errata Slip (MOBI)

 

Wednesday, February 13, 2013

8:45 a.m.–9:00 a.m. Wednesday

Opening Remarks and Best Paper Awards

Imperial Ballroom

Program Co-Chairs: Keith A. Smith, NetApp, and Yuanyuan Zhou, University of California, San Diego

FAST '13 Opening Remarks & Awards

Available Media

9:00 a.m.–10:30 a.m. Wednesday

File Systems

Imperial Ballroom

Session Chair: Ric Wheeler, Red Hat

ffsck: The Fast File System Checker

Ao Ma, EMC Corporation and University of Wisconsin—Madison; Charlotte Dragga, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison

Crash failures, hardware errors, and file system bugs can corrupt file systems and cause data loss, despite the presence of journals and similar preventive techniques. While consistency checkers such as fsck can detect this corruption and restore a damaged image to a usable state, they are generally created as an afterthought, to be run only at rare intervals. Thus, checkers operate slowly, causing significant downtime for large-scale storage systems when they are needed.

We address this dilemma by treating the checker as a key component of the overall file system (and not merely a peripheral add-on). To this end, we present a modified ext3 file system, rext3, to directly support the fast file system checker, ffsck. The rext3 file system co-locates and self-identifies its metadata blocks, removing the need for costly seeks and tree traversals during checking. These modifications to the file system allow ffsck to scan and repair the file system at rates approaching the full sequential bandwidth of the underlying device. In addition, we demonstrate that rext3 performs competitively with ext3 in most cases and exceeds it in handling random reads and large writes.
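
To make the intuition concrete, the following toy model (not the authors' code; the block layout, MAGIC tag, and in-memory "disk" are invented for illustration) shows why self-identifying, co-located metadata lets a checker discover all metadata in one sequential sweep rather than a pointer-chasing tree traversal.

```python
# Illustrative model of why co-located, self-identifying metadata speeds up checking.
# The on-"disk" layout, MAGIC tag, and check logic are invented for this sketch;
# they are not taken from the ffsck/rext3 implementation.

MAGIC = b"META"          # hypothetical self-identification tag
BLOCK = 4096

def make_disk(n_blocks, meta_every=16):
    """Build a fake disk: most blocks are data, some are tagged metadata."""
    disk = []
    for i in range(n_blocks):
        if i % meta_every == 0:
            disk.append(MAGIC + i.to_bytes(8, "little").ljust(BLOCK - 4, b"\0"))
        else:
            disk.append(b"\0" * BLOCK)
    return disk

def sequential_metadata_scan(disk):
    """ffsck-style pass: stream every block once, keep only self-identified metadata.
    Cost is one sequential sweep, independent of the metadata tree shape."""
    return [i for i, blk in enumerate(disk) if blk.startswith(MAGIC)]

if __name__ == "__main__":
    disk = make_disk(1024)
    meta = sequential_metadata_scan(disk)
    print(f"found {len(meta)} metadata blocks in one sequential pass")
```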

Available Media

Building Workload-Independent Storage with VT-Trees

Pradeep Shetty, Richard Spillane, Ravikant Malpani, Binesh Andrews, Justin Seyster, and Erez Zadok, Stony Brook University

As the Internet and the amount of data grow, the variability of data sizes grows too—from small MP3 tags to large VM images. With applications using increasingly complex queries and larger datasets, data access patterns have become more complex and randomized. Current storage systems focus on optimizing for one band of workloads at the expense of other workloads due to limitations in existing storage system data structures. We designed a novel workload-independent data structure called the VT-tree, which extends the LSM-tree to efficiently handle sequential and file-system workloads. We designed a system based solely on VT-trees which offers concurrent access to data via file system and database APIs, provides transactional guarantees, and consequently provides efficient and scalable access to both large and small data items regardless of the access pattern. Our evaluation shows that our user-level system has 2–6.6× better performance for random-write workloads and only a small average overhead for other workloads.
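
For readers unfamiliar with the underlying structure, here is a minimal sketch of the LSM-tree write and read path that the VT-tree extends (an in-memory buffer flushed into immutable sorted runs); the VT-tree's own optimizations for sequential and file-system workloads are not shown, and all names are invented.

```python
# Minimal LSM-tree write/read path: the baseline structure the VT-tree extends.
# The VT-tree's stitching-style optimizations described in the paper are NOT shown.
import bisect

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}           # mutable in-memory buffer
        self.runs = []               # immutable sorted runs, newest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        # Sort the buffer and append it as an immutable on-"disk" run.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:        # newest run wins
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

if __name__ == "__main__":
    t = TinyLSM()
    for i in range(10):
        t.put(f"k{i}", i)
    print(t.get("k3"), t.get("k9"))
```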

Available Media

A Study of Linux File System Evolution

Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Shan Lu, University of Wisconsin—Madison
    Awarded Best Paper! 

We conduct a comprehensive study of file-system code evolution. By analyzing eight years of Linux file-system changes across 5079 patches, we derive numerous new (and sometimes surprising) insights into the file-system development process; our results should be useful for both the development of file systems themselves as well as the improvement of bug-finding tools.

Available Media


10:30 a.m.–11:00 a.m. Wednesday

Break

 Market Street Foyer
 
11:00 a.m.–12:20 p.m. Wednesday

Caching

Imperial Ballroom

Session Chair: Eno Thereska, Microsoft Research

Write Policies for Host-side Flash Caches

Ricardo Koller, Florida International University and VMware; Leonardo Marmol and Raju Rangaswami, Florida International University; Swaminathan Sundararaman and Nisha Talagala, FusionIO; Ming Zhao, Florida International University

Host-side flash-based caching offers a promising new direction for optimizing access to networked storage. Current work has argued for using host-side flash primarily as a read cache and employing a write-through policy which provides the strictest consistency and durability guarantees. However, write-through requires synchronous updates over the network for every write. For write-mostly or write-intensive workloads, it significantly under-utilizes the high-performance flash cache layer. The write-back policy, on the other hand, better utilizes the cache for workloads with significant write I/O requirements. However, conventional write-back performs out-of-order eviction of data and unacceptably sacrifices data consistency at the network storage.

We develop and evaluate two consistent write-back caching policies, ordered and journaled, that are designed to perform increasingly better than write-through. These policies enable new trade-off points across the performance, data consistency, and data staleness dimensions. Using benchmark workloads such as PostMark, TPC-C, Filebench, and YCSB, we evaluate the new write policies we propose alongside conventional write-through and write-back. We find that ordered write-back performs better than write-through. Additionally, we find that journaled write-back can trade off staleness for performance, approaching, and in some cases exceeding, conventional write-back performance. Finally, a variant of journaled write-back that utilizes consistency hints from the application can provide straightforward application-level storage consistency, a stricter form of consistency than the transactional consistency provided by write-through.
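
A toy model of the design space discussed above, with invented names and no attempt to match the paper's implementation: write-through pushes every write to the backing store synchronously, while a write-back variant acknowledges from flash and later destages dirty blocks in write order to keep the backing store consistent. The journaled variant is omitted.

```python
# Toy flash-cache write policies. Names and structure are invented for illustration.
from collections import OrderedDict

class NetworkStore:
    def __init__(self):
        self.blocks = {}
    def write(self, addr, data):
        self.blocks[addr] = data      # stands in for a synchronous network write

class FlashCache:
    def __init__(self, store, policy="write-through"):
        self.store, self.policy = store, policy
        self.cache = {}                       # addr -> data
        self.dirty = OrderedDict()            # preserves write order for destaging

    def write(self, addr, data):
        self.cache[addr] = data
        if self.policy == "write-through":
            self.store.write(addr, data)      # strict consistency, one network RTT per write
        else:
            self.dirty[addr] = data           # write-back: acknowledge from flash

    def flush(self):
        # Ordered write-back: destage dirty blocks in the order they were written,
        # so the backing store always reflects a consistent prefix of the history.
        for addr, data in list(self.dirty.items()):
            self.store.write(addr, data)
            del self.dirty[addr]

if __name__ == "__main__":
    s = NetworkStore()
    c = FlashCache(s, policy="ordered-write-back")
    for i in range(5):
        c.write(i, f"v{i}")
    c.flush()
    print(s.blocks)
```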

Available Media

Warming Up Storage-Level Caches with Bonfire

Yiying Zhang, University of Wisconsin—Madison; Gokul Soundararajan, Mark W. Storer, Lakshmi N. Bairavasundaram, and Sethuraman Subbiah, NetApp; Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison

Large caches in storage servers have become essential for meeting the service levels required by applications. Today, these caches often need to be warmed with data, due to scenarios including dynamic creation of cache space and server restarts that clear cache contents. When large storage caches are warmed at the rate of application I/O, warmup can take hours or even days, affecting both application performance and server load over a long period of time.

We have created Bonfire, a mechanism for accelerating cache warmup. Bonfire monitors storage server workloads, logs important warmup data, and efficiently preloads storage-level caches with warmup data. Bonfire is based on our detailed analysis of block-level data-center traces that provides insights into heuristics for warmup as well as the potential for efficient mechanisms. We show through both simulation and trace replay that Bonfire reduces both warmup time and backend server load significantly, compared to a cache that is warmed up on demand.
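
Conceptually, the warmup path has three steps: monitor the live workload, log a warmup set, and bulk-load that set into the cold cache. The sketch below mimics that flow with a simple access-frequency heuristic; the heuristic, granularity, and names are assumptions for illustration, not Bonfire's actual policies.

```python
# Conceptual cache-warmup flow: monitor -> log warmup set -> bulk preload.
# The frequency heuristic and all names are invented; Bonfire's real policies differ.
from collections import Counter

class WarmupMonitor:
    def __init__(self):
        self.accesses = Counter()
    def record(self, block_id):
        self.accesses[block_id] += 1          # observe the live workload
    def warmup_set(self, top_n):
        # Log the hottest blocks as candidates for preloading after a restart.
        return [b for b, _ in self.accesses.most_common(top_n)]

def preload(cache, backend, warmup_blocks):
    """Bulk-read the warmup set from the backend (ideally large sequential I/O)
    instead of waiting for on-demand misses."""
    for b in warmup_blocks:
        cache[b] = backend[b]

if __name__ == "__main__":
    backend = {i: f"data{i}" for i in range(100)}
    mon = WarmupMonitor()
    for b in [1, 2, 2, 3, 3, 3, 7, 7, 7, 7]:
        mon.record(b)
    cold_cache = {}
    preload(cold_cache, backend, mon.warmup_set(top_n=3))
    print(sorted(cold_cache))                  # [2, 3, 7]
```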

Available Media

Unioning of the Buffer Cache and Journaling Layers with Non-volatile Memory

Eunji Lee and Hyokyung Bahn, Ewha University; Sam H. Noh, Hongik University
    Awarded Best Paper! 

Journaling techniques are widely used in modern file systems as they provide high reliability and fast recovery from system failures. However, journaling reduces the performance benefit of buffer caching, as it accounts for the bulk of storage writes in real system environments. In this paper, we present a novel buffer cache architecture that subsumes the functionality of caching and journaling by making use of non-volatile memory such as PCM or STT-MRAM. Specifically, our buffer cache supports what we call the in-place commit scheme. This scheme avoids logging, but still provides the same journaling effect by simply altering the state of the cached block to frozen. As a frozen block still performs the function of caching, we show that in-place commit does not degrade cache performance. We implement our scheme on Linux 2.6.38 and measure the throughput and execution time of the scheme with various file I/O benchmarks. The results show that our scheme improves I/O performance by 76% on average and up to 240% compared to the existing Linux buffer cache with ext4, without any loss of reliability.
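
A minimal sketch of the in-place commit state machine described above, assuming invented names and a simplified versioning policy (it is not the authors' Linux implementation): commit flips dirty cached blocks to frozen rather than copying them into a journal, and the frozen copies are later destaged to storage.

```python
# Toy model of an NVM buffer cache with in-place commit:
# commit freezes cached blocks instead of copying them into a journal.
# Names, the versioning policy, and the writeback policy are invented for this sketch.

CLEAN, DIRTY = "clean", "dirty"

class NVMBufferCache:
    def __init__(self):
        self.frozen = {}     # addr -> committed copy held in NVM (journal-equivalent)
        self.working = {}    # addr -> (state, data) for the mutable cached copy

    def write(self, addr, data):
        # Writes always go to the working copy; a frozen copy, if any, is untouched,
        # which is what preserves the journaling effect without logging.
        self.working[addr] = (DIRTY, data)

    def commit(self):
        # In-place commit: flip dirty blocks to frozen instead of writing a log.
        for addr, (state, data) in list(self.working.items()):
            if state == DIRTY:
                self.frozen[addr] = data
                self.working[addr] = (CLEAN, data)

    def writeback(self, storage):
        # Frozen copies are the durable commit record; destage them to storage.
        for addr, data in list(self.frozen.items()):
            storage[addr] = data
            del self.frozen[addr]

if __name__ == "__main__":
    cache, disk = NVMBufferCache(), {}
    cache.write(10, "v1"); cache.commit()    # "v1" committed without any log write
    cache.write(10, "v2")                    # uncommitted update; frozen "v1" preserved
    cache.writeback(disk)
    print(disk)                              # {10: 'v1'}
```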

Available Media


12:20 p.m.–2:00 p.m. Wednesday

Conference Luncheon

Regency Ballroom

Presentation of the FAST '13 Test of Time Award

2:00 p.m.–3:20 p.m. Wednesday

Protecting Your Data

Imperial Ballroom

Session Chair: Cheng Huang, Microsoft Research

Memory Efficient Sanitization of a Deduplicated Storage System

Fabiano C. Botelho, Philip Shilane, Nitin Garg, and Windsor Hsu, EMC Backup Recovery Systems Division

Sanitization is the process of securely erasing sensitive data from a storage system, effectively restoring the system to a state as if the sensitive data had never been stored. Depending on the threat model, sanitization could require erasing all unreferenced blocks. This is particularly challenging in deduplicated storage systems because each piece of data on the physical media could be referred to by multiple namespace objects. For large storage systems, where available memory is a small fraction of storage capacity, standard techniques for tracking data references will not fit in memory, and we discuss multiple sanitization techniques that trade off I/O and memory requirements. We have three key contributions. First, we provide an understanding of the threat model and what is required to sanitize a deduplicated storage system as compared to a device. Second, we have designed a memory-efficient algorithm using perfect hashing that requires only 2.54 to 2.87 bits per reference (a 98% savings) while minimizing the amount of I/O. Third, we present a complete sanitization design for EMC Data Domain.
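
At its core, sanitization of a deduplicated store is a mark-and-sweep problem: enumerate every chunk reachable from live namespace objects, then securely erase everything else. The skeleton below shows that structure with an ordinary Python set standing in for the paper's memory-efficient perfect-hash bit vector; the 2.54 to 2.87 bits-per-reference structure itself is not reproduced, and all names are invented.

```python
# Mark-and-sweep skeleton for sanitizing a deduplicated store.
# A plain set stands in for the paper's perfect-hash bit vector; all names are invented.

def mark_live(namespace_objects):
    """Mark phase: collect fingerprints of every chunk referenced by any live file."""
    live = set()
    for recipe in namespace_objects.values():   # recipe = ordered list of chunk fingerprints
        live.update(recipe)
    return live

def sweep(chunk_store, live):
    """Sweep phase: securely erase chunks that no live object references."""
    for fp in list(chunk_store):
        if fp not in live:
            # A real system would overwrite the physical blocks, not just drop the key.
            del chunk_store[fp]

if __name__ == "__main__":
    chunk_store = {"a": b"...", "b": b"...", "c": b"..."}    # fingerprint -> chunk data
    namespace = {"/file1": ["a", "b"], "/file2": ["b"]}      # "c" is unreferenced
    sweep(chunk_store, mark_live(namespace))
    print(sorted(chunk_store))                               # ['a', 'b']
```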

Available Media

SD Codes: Erasure Codes Designed for How Storage Systems Really Fail

James S. Plank, University of Tennessee; Mario Blaum and James L. Hafner, IBM Almaden Research Center

This paper has been removed because of a dispute over its contents. Please see this memo from February 17, 2015, for more information.

Available Media

HARDFS: Hardening HDFS with Selective and Lightweight Versioning

Thanh Do, Tyler Harter, and Yingchao Liu, University of Wisconsin—Madison; Haryadi S. Gunawi, University of Chicago; Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison

We harden the Hadoop Distributed File System (HDFS) against fail-silent (non-fail-stop) behaviors that result from memory corruption and software bugs using a new approach: selective and lightweight versioning (SLEEVE). With this approach, actions performed by important subsystems of HDFS (e.g., namespace management) are checked by a second implementation of the subsystem that uses lightweight, approximate data structures. We show that HARDFS detects and recovers from a wide range of fail-silent behaviors caused by random bit flips, targeted corruptions, and real software bugs. In particular, HARDFS handles 90% of the fail-silent faults that result from random memory corruption and correctly detects and recovers from 100% of 78 targeted corruptions and 5 real-world bugs. Moreover, it recovers orders of magnitude faster than a full reboot by using micro-recovery. The extra protection in HARDFS incurs minimal performance and space overheads.
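
The SLEEVE pattern is to re-execute each important operation against a second, much cheaper implementation and treat any disagreement as a fail-silent fault. The sketch below illustrates the pattern for a toy namespace, using an exact shadow set for clarity where the paper uses lightweight, approximate structures; all names are invented.

```python
# Shadow-checking pattern: a lightweight second implementation verifies the main one.
# Here the shadow is an exact set for clarity; SLEEVE uses approximate structures.

class Namespace:
    """The 'main' subsystem whose in-memory state might be silently corrupted."""
    def __init__(self):
        self.entries = {}
    def create(self, path):
        self.entries[path] = {"size": 0}
    def exists(self, path):
        return path in self.entries

class ShadowNamespace:
    """Cheap verifier that tracks just enough state to catch disagreements."""
    def __init__(self):
        self.paths = set()
    def create(self, path):
        self.paths.add(path)
    def exists(self, path):
        return path in self.paths

def checked_exists(main, shadow, path):
    a, b = main.exists(path), shadow.exists(path)
    if a != b:
        # A disagreement indicates a fail-silent fault; trigger micro-recovery here.
        raise RuntimeError(f"fail-silent fault detected for {path}")
    return a

if __name__ == "__main__":
    main, shadow = Namespace(), ShadowNamespace()
    for ns in (main, shadow):
        ns.create("/data/blk_0001")
    del main.entries["/data/blk_0001"]          # simulate silent memory corruption
    try:
        checked_exists(main, shadow, "/data/blk_0001")
    except RuntimeError as e:
        print(e)
```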

Available Media


3:20 p.m.–3:50 p.m. Wednesday

Break

 Market Street Foyer
 
3:50 p.m.–5:20 p.m. Wednesday

Big Systems, Big Challenges

Imperial Ballroom

Session Chair: Daniel Peek, Facebook

Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines

Devesh Tiwari, North Carolina State University; Simona Boboila, Northeastern University; Sudharshan Vazhkudai and Youngjae Kim, Oak Ridge National Laboratory; Xiaosong Ma, North Carolina State University; Peter Desnoyers, Northeastern University; Yan Solihin, North Carolina State University

Modern scientific discovery is increasingly driven by large-scale supercomputing simulations, followed by data analysis tasks. These data analyses are either performed offline, on smaller-scale clusters, or on the supercomputer itself. Unfortunately, these techniques suffer from performance and energy inefficiencies due to increased data movement between the compute and storage subsystems. Therefore, we propose Active Flash, an in-situ scientific data analysis approach, wherein data analysis is conducted on the solid-state device (SSD), where the data already resides. Our performance and energy models show that Active Flash has the potential to address many of the aforementioned concerns without degrading HPC simulation performance. In addition, we demonstrate an Active Flash prototype built on a commercial SSD controller, which further reaffirms the viability of our proposal.

Available Media

MixApart: Decoupled Analytics for Shared Storage Systems

Madalin Mihailescu, University of Toronto and NetApp; Gokul Soundararajan, NetApp; Cristiana Amza, University of Toronto

Distributed file systems built for data analytics and enterprise storage systems have very different functionality requirements. For this reason, enabling analytics on enterprise data commonly introduces a separate analytics storage silo. This generates additional costs and inefficiencies in data management, e.g., whenever data needs to be archived, copied, or migrated across silos.

MixApart uses an integrated data caching and scheduling solution to allow MapReduce computations to analyze data stored on enterprise storage systems. The front-end caching layer enables the local storage performance required by data analytics. The shared storage back-end simplifies data management.

We evaluate MixApart using a 100-core Amazon EC2 cluster with micro-benchmarks and production workload traces. Our evaluation shows that MixApart provides (i) up to 28% faster performance than the traditional ingest-then-compute workflows used in enterprise IT analytics, and (ii) comparable performance to an ideal Hadoop setup without data ingest, at similar cluster sizes.

Available Media

Horus: Fine-Grained Encryption-Based Security for Large-Scale Storage

Yan Li, Nakul Sanjay Dhotre, and Yasuhiro Ohara, University of California, Santa Cruz; Thomas M. Kroeger, Sandia National Laboratories; Ethan L. Miller and Darrell D. E. Long, University of California, Santa Cruz

With the growing use of large-scale distributed systems, the likelihood that at least one node is compromised is increasing. Large-scale systems that process sensitive data such as geographic data with defense implications, drug modeling, nuclear explosion modeling, and private genomic data would benefit greatly from strong security for their storage. Nevertheless, many high performance computing (HPC), cloud, or secure content delivery network (SCDN) systems that handle such data still store them unencrypted or use simple encryption schemes, relying heavily on physical isolation to ensure confidentiality, providing little protection against compromised computers or malicious insiders. Moreover, current encryption solutions cannot efficiently provide fine-grained encryption for large datasets.

Available Media



5:20 p.m.–7:20 p.m. Wednesday

Poster Session and Reception I

Regency Ballroom

Session Chair: Nitin Agrawal, NEC Labs

The list of accepted posters is available here.

 

Thursday, February 14, 2013

9:00 a.m.–10:15 a.m. Thursday

Keynote Address

Imperial Ballroom

Disruptive Innovation: Data Domain Experience

Kai Li, Princeton University

Data Domain is an example of a storage company with disruptive innovation during the past decade. The company started in 2001 with the mission to create and deploy deduplication storage products to replace tapes in data centers. The company went public in 2007 and was acquired by EMC Corporation in 2009. The revenue of the Data Domain product line has successfully disrupted the tape automation market.

How was Data Domain created? What makes Data Domain successful? In this talk, I will share my experience with the conference attendees to answer these two questions and also present my views on the creation of disruptive innovations, the relationship between academic research and innovation, and why most large companies are incapable of disruptive innovations.

Available Media


10:15 a.m.–10:45 a.m. Thursday

Break

 Market Street Foyer
 
10:45 a.m.–12:05 p.m. Thursday

Deduplication

Imperial Ballroom

Session Chair: Fred Douglis, EMC Backup Recovery Systems Division

Concurrent Deletion in a Distributed Content-Addressable Storage System with Global Deduplication

Przemyslaw Strzelczak, Elzbieta Adamczyk, Urszula Herman-Izycka, Jakub Sakowicz, Lukasz Slusarczyk, Jaroslaw Wrona, and Cezary Dubnicki, 9LivesData, LLC

Scalable, highly reliable distributed systems supporting data deduplication have recently become popular for storing backup and archival data. One of the important requirements for backup storage is the ability to delete data selectively. Unlike in traditional storage systems, data deletion in distributed systems with deduplication is a major challenge because deduplication leads to multiple owners of data chunks. Moreover, system configuration changes often due to node additions, deletions and failures. Expected high performance, high availability and low impact of deletion on regular user operations additionally complicate identification and reclamation of unnecessary blocks.

This paper describes a deletion algorithm for a scalable, content-addressable storage system with global deduplication. The deletion is concurrent: user reads and writes can proceed in parallel with deletion, with only minor restrictions established to make reclamation feasible. Moreover, our approach allows for deduplication of user writes during deletion. We extend traditional distributed reference counting to deliver a failure-tolerant deletion that accommodates not only deduplication, but also the dynamic nature of a scalable system and its physical resource constraints. The proposed algorithm has been verified with an implementation in a commercial deduplicating storage system. The impact of deletion on user operations is configurable. Using a default setting that grants deletion at most 30% of system resources, running the deletion reduces end performance by no more than 30%. This impact can be reduced to less than 5% when deletion is given only minimal resources.
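
The starting point the paper builds on is reference counting over deduplicated chunks: a chunk may be reclaimed only when no backup refers to it. The sketch below shows that baseline in a single-node, stop-the-world form; the concurrency, failure tolerance, and per-epoch bookkeeping that make deletion safe in a distributed, always-on system are the paper's contribution and are not shown. Names are invented.

```python
# Baseline reference counting for deletion in a deduplicated store.
# Single-node and stop-the-world; the paper's contribution is doing this
# concurrently with user I/O in a distributed system, which is not shown here.
from collections import Counter

class DedupStore:
    def __init__(self):
        self.chunks = {}            # fingerprint -> data
        self.refcount = Counter()   # fingerprint -> number of referencing backups
        self.backups = {}           # backup name -> list of fingerprints

    def write_backup(self, name, chunk_list):
        self.backups[name] = [fp for fp, _ in chunk_list]
        for fp, data in chunk_list:
            if fp not in self.chunks:
                self.chunks[fp] = data          # deduplicated: stored once
            self.refcount[fp] += 1

    def delete_backup(self, name):
        for fp in self.backups.pop(name):
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:
                del self.chunks[fp]             # reclaim space

if __name__ == "__main__":
    s = DedupStore()
    s.write_backup("mon", [("a", b"1"), ("b", b"2")])
    s.write_backup("tue", [("b", b"2"), ("c", b"3")])
    s.delete_backup("mon")
    print(sorted(s.chunks))                     # ['b', 'c']: 'b' survives, 'a' reclaimed
```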

Available Media

File Recipe Compression in Data Deduplication Systems

Dirk Meister, André Brinkmann, and Tim Süß, Johannes Gutenberg University Mainz

Data deduplication systems discover and exploit redundancies between different data blocks. The most common approach divides data into chunks and identifies redundancies via fingerprints. The file content can be rebuilt by combining the chunk fingerprints, which are stored sequentially in a file recipe. The corresponding file recipe data can occupy a significant fraction of the total disk space, especially if the deduplication ratio is very high. We propose a combination of efficient and scalable compression schemes to shrink the file recipes' size. A trace-based simulation shows that these methods can compress file recipes by up to 93%.
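
Since a file recipe is just the ordered list of chunk fingerprints needed to rebuild a file, one simple way to shrink it is to replace each long fingerprint with a short code assigned from a dictionary. The sketch below shows that dictionary-coding idea in its most basic form; the paper combines several more sophisticated schemes, and the names here are invented.

```python
# Basic dictionary coding of a file recipe: long fingerprints -> short integer codes.
# The paper combines several schemes; this shows only the simplest idea, with invented names.
import hashlib

def fingerprint(chunk):
    return hashlib.sha1(chunk).hexdigest()       # 40-character hex fingerprint

def compress_recipe(recipe, code_table):
    """Replace each fingerprint with a small integer code, assigning new codes on demand."""
    coded = []
    for fp in recipe:
        if fp not in code_table:
            code_table[fp] = len(code_table)
        coded.append(code_table[fp])
    return coded

if __name__ == "__main__":
    chunks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
    recipe = [fingerprint(c) for c in chunks]     # duplicate chunks repeat the same fingerprint
    table = {}
    coded = compress_recipe(recipe, table)
    print(coded)                                  # e.g. [0, 1, 0, 0]
    raw = sum(len(fp) for fp in recipe)
    print(f"{raw} bytes of fingerprints -> {len(coded)} small codes")
```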

Available Media

Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication

Mark Lillibridge and Kave Eshghi, HP Labs; Deepavali Bhagwat, HP Storage

Slow restoration due to chunk fragmentation is a serious problem facing inline chunk-based data deduplication systems: restore speeds for the most recent backup can drop orders of magnitude over the lifetime of a system. We study three techniques—increasing cache size, container capping, and using a forward assembly area—for alleviating this problem. Container capping is an ingest-time operation that reduces chunk fragmentation at the cost of forfeiting some deduplication, while using a forward assembly area is a new restore-time caching and prefetching technique that exploits the perfect knowledge of future chunk accesses available when restoring a backup to reduce the amount of RAM required for a given level of caching at restore time.

We show that using a larger cache per stream—we see continuing benefits even up to 8 GB—can produce up to a 5–16X improvement, that giving up as little as 8% deduplication with capping can yield a 2–6X improvement, and that using a forward assembly area is strictly superior to LRU, able to yield a 2–4X improvement while holding the RAM budget constant.
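
The forward assembly area exploits the fact that the recipe tells the restorer exactly which chunks it will need next. The sketch below restores one window of the recipe at a time, reading each referenced container only once and copying every chunk the window needs from it; the container layout, window size, and names are invented simplifications.

```python
# Forward-assembly-area restore sketch: for each window of the recipe, read every
# referenced container once and place all needed chunks directly into the window.
# Container layout and names are invented simplifications.

def restore_with_faa(recipe, chunk_to_container, containers, window=4):
    """recipe: ordered chunk ids; containers: container id -> {chunk id: data}."""
    output = []
    for start in range(0, len(recipe), window):
        span = recipe[start:start + window]
        assembly = [None] * len(span)                 # the forward assembly area
        needed = {}                                   # container -> positions it must fill
        for pos, chunk in enumerate(span):
            needed.setdefault(chunk_to_container[chunk], []).append(pos)
        for cid, positions in needed.items():         # one read per container per window
            data = containers[cid]
            for pos in positions:
                assembly[pos] = data[span[pos]]
        output.extend(assembly)
    return b"".join(output)

if __name__ == "__main__":
    containers = {0: {"a": b"A", "b": b"B"}, 1: {"c": b"C"}}
    where = {"a": 0, "b": 0, "c": 1}
    print(restore_with_faa(["a", "c", "a", "b", "c"], where, containers))  # b'ACABC'
```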

Available Media

12:05 p.m.–2:00 p.m. Thursday

Lunch On Your Own

2:00 p.m.–3:30 p.m. Thursday

Work-in-Progress Reports (WiPs)

Imperial Ballroom

Session Chair: Joseph Tucek, HP Labs

The list of accepted Work-in-Progress Reports (WiPs) is available here.

FAST '13 Work in Progress Reports

Available Media


3:30 p.m.–4:00 p.m. Thursday

Break

 Market Street Foyer
 
4:00 p.m.–5:30 p.m. Thursday

Something for Everyone

Imperial Ballroom

Session Chair: Ethan L. Miller, University of California, Santa Cruz and Pure Storage

Shroud: Ensuring Private Access to Large-Scale Data in the Data Center

Jacob R. Lorch, Bryan Parno, and James Mickens, Microsoft Research; Mariana Raykova, IBM Research; Joshua Schiffman, AMD

Recent events have shown online service providers the perils of possessing private information about users. Encrypting data mitigates but does not eliminate this threat: the pattern of data accesses still reveals information. Thus, we present Shroud, a general storage system that hides data access patterns from the servers running it, protecting user privacy. Shroud functions as a virtual disk with a new privacy guarantee: the user can look up a block without revealing the block’s address. Such a virtual disk can be used for many purposes, including map lookup, microblog search, and social networking.

Shroud aggressively targets hiding accesses among hundreds of terabytes of data. We achieve our goals by adapting oblivious RAM algorithms to enable large-scale parallelization. Specifically, we show, via new techniques such as oblivious aggregation, how to securely use many inexpensive secure coprocessors acting in parallel to improve request latency. Our evaluation combines large-scale emulation with an implementation on secure coprocessors and suggests that these adaptations bring private data access closer to practicality.

Available Media

Getting Real: Lessons in Transitioning Research Simulations into Hardware Systems

Mohit Saxena, Yiying Zhang, Michael M. Swift, Andrea C. Arpaci Dusseau, and Remzi H. Arpaci Dusseau, University of Wisconsin—Madison

Flash-based solid-state drives have revolutionized storage with their high performance. Their sophisticated internal mechanisms have led to a plethora of research on how to optimize applications, file systems, and internal SSD designs. Due to the closed nature of commercial devices, though, most research on the internals of an SSD, such as enhanced flash-translation layers, is performed using simulation or emulation. Without implementation in real devices, it can be difficult to judge the true benefit of the proposed designs.

In this paper, we describe our efforts to implement two new SSD designs that change both the internal workings of the device and its interface to the host operating system. Using the OpenSSD Jasmine board, we develop a prototype of FlashTier’s Solid State Cache (SSC) and of the Nameless Write SSD. While the flash-translation layer changes were straightforward, we discovered unexpected complexities in implementing extensions to the storage interface.

We describe our implementation process and extract a set of lessons applicable to other SSD prototypes. With our prototype we validate the performance claims of FlashTier and show a 45-52% performance improvement over caching with an SSD and a 90% reduction in erases.

Available Media

To Zip or Not to Zip: Effective Resource Usage for Real-Time Compression

Danny Harnik, Ronen Kat, Oded Margalit, Dmitry Sotnikov, and Avishay Traeger, IBM Research—Haifa

Real-time compression for primary storage is quickly becoming widespread as data continues to grow exponentially, but adding compression on the data path consumes scarce CPU and memory resources on the storage system. Our work aims to mitigate this cost by introducing methods to quickly and accurately identify the data that will yield significant space savings when compressed.

The first level of filtering that we employ is at the dataset level (e.g., volume or file system), where we estimate the overall compressibility of the data at rest. According to the outcome, we may choose to enable or disable compression for the entire data set, or to employ a second level of finer-grained filtering. The second filtering scheme examines data being written to the storage system in an online manner and determines its compressibility.

The first-level filtering runs in mere minutes while providing mathematically proven guarantees on its estimates. In addition to aiding in selecting which volumes to compress, it has been released as a public tool, allowing potential customers to determine the effectiveness of compression on their data and to aid in capacity planning. The second-level filtering has shown significant CPU savings (up to 35%) while maintaining compression savings (within 2%).
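
One way to approximate the dataset-level filter is to sample random blocks, compress each sample, and extrapolate the overall ratio. The sketch below does exactly that with zlib; the sampling scheme, thresholds, and the mathematical guarantees described in the paper are not reproduced, and the names are invented.

```python
# Sampled compressibility estimate: compress random blocks and extrapolate the ratio.
# This mimics the spirit of dataset-level filtering; the paper's estimator and its
# proven accuracy bounds are not reproduced here.
import os
import random
import zlib

def estimate_compression_ratio(read_block, n_blocks, samples=64):
    """read_block(i) returns block i; sample a few blocks and average their ratios."""
    picks = random.sample(range(n_blocks), min(samples, n_blocks))
    ratios = []
    for i in picks:
        data = read_block(i)
        ratios.append(len(zlib.compress(data)) / len(data))
    return sum(ratios) / len(ratios)

if __name__ == "__main__":
    # Toy "volume": half the blocks are highly compressible text, half are random bytes.
    blocks = [b"log line\n" * 456 if i % 2 else os.urandom(4096) for i in range(1000)]
    ratio = estimate_compression_ratio(lambda i: blocks[i], len(blocks))
    print(f"estimated compressed/original ratio: {ratio:.2f}")
    if ratio > 0.9:
        print("filter decision: skip compression for this volume")
```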

Available Media


5:30 p.m.–7:30 p.m. Thursday

Poster Session and Reception II

Regency Ballroom

Session Chair: Nitin Agrawal, NEC Labs

The list of accepted posters is available here.

 

Friday, February 15, 2013

9:00 a.m.–10:30 a.m. Friday

Flash and SSDs

Imperial Ballroom

Session Chair: Sam H. Noh, Hongik University

LDPC-in-SSD: Making Advanced Error Correction Codes Work Effectively in Solid State Drives

Kai Zhao, Rensselaer Polytechnic Institute; Wenzhe Zhao and Hongbin Sun, Xi'an Jiaotong University; Tong Zhang, Rensselaer Polytechnic Institute; Xiaodong Zhang, The Ohio State University; Nanning Zheng, Xi'an Jiaotong University

Conventional error correction codes (ECCs), such as the commonly used BCH code, have become increasingly inadequate for solid state drives (SSDs) as the capacity of NAND flash memory continues to increase and its reliability continues to degrade. It is highly desirable to deploy a much more powerful ECC, such as low-density parity-check (LDPC) code, to significantly improve the reliability of SSDs. Although LDPC code has had its success in commercial hard disk drives, to fully exploit its error correction capability in SSDs demands unconventional fine-grained flash memory sensing, leading to an increased memory read latency. To address this important but largely unexplored issue, this paper presents three techniques to mitigate the LDPC-induced response time delay so that SSDs can benefit from its strong error correction capability to the full extent. We quantitatively evaluate these techniques by carrying out trace-based SSD simulations with runtime characterization of NAND flash memory reliability and LDPC code decoding. Our study, based on intensive experiments, shows that these techniques, used in an integrated way in SSDs, can reduce the worst-case system read response time delay from over 100% down to below 20%. With our proposed techniques, a strong ECC alternative can be used in NAND flash memory to retain its reliability in response to continuous cost reduction, and its relatively small increase in response time delay is acceptable to mainstream application users, considering the large gains in SSD capacity, reliability, and price reduction.

Available Media

Extending the Lifetime of Flash-based Storage through Reducing Write Amplification from File Systems

Youyou Lu, Jiwu Shu, and Weimin Zheng, Tsinghua University

Flash memory has gained popularity as a storage device for both enterprise and embedded systems because of its high performance, low energy consumption, and reduced cost. The endurance problem of flash memory, however, is still a challenge and is getting worse as storage density increases with the adoption of multi-level cells (MLC). Prior work has addressed wear leveling and data reduction, but there is significantly less work on using the file system to improve flash lifetimes. Some common mechanisms in traditional file systems, such as journaling, metadata synchronization, and page-aligned updates, can induce extra write operations and aggravate the wear of flash memory. This problem is called write amplification from file systems.

In order to mitigate write amplification, we propose an object-based flash translation layer design (OFTL), in which mechanisms are co-designed with flash memory. By leveraging page metadata, OFTL enables lazy persistence of index metadata and eliminates journals while keeping consistency. Coarse-grained block state maintenance reduces persistent free space management overhead. With byte-unit access interfaces, OFTL is able to compact and co-locate the small updates with metadata to further reduce updates. Experiments show that an OFTL-based system, OFSS, offers a write amplification reduction of 47.4%–89.4% in SYNC mode and 19.8%–64.0% in ASYNC mode compared with ext3, ext2, and btrfs on an up-to-date page-level FTL.

Available Media

Understanding the Robustness of SSDs under Power Fault

Mai Zheng, The Ohio State University; Joseph Tucek, HP Labs; Feng Qin, The Ohio State University; Mark Lillibridge, HP Labs

Modern storage technologies (SSDs, NoSQL databases, commoditized RAID hardware, etc.) bring new reliability challenges to the already complicated storage stack. Among other things, the behavior of these new components during power faults—which happen relatively frequently in data centers—is an important yet mostly ignored issue in this dependability-critical area. Understanding how new storage components behave under power faults is the first step towards designing new robust storage systems.

In this paper, we propose a new methodology to expose reliability issues in block devices under power faults. Our framework includes specially-designed hardware to inject power faults directly to devices, workloads to stress storage components, and techniques to detect various types of failures. Applying our testing framework, we test fifteen commodity SSDs from five different vendors using more than three thousand fault injection cycles in total. Our experimental results reveal that thirteen out of the fifteen tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure.

Available Media


10:30 a.m.–11:00 a.m. Friday

Break

 Market Street Foyer
 
11:00 a.m.–12:20 p.m. Friday

Performance Improvements and Measurements

Imperial Ballroom

Session Chair: Kiran-Kumar Muniswamy-Reddy, Amazon.com

Gecko: Contention-Oblivious Disk Arrays for Cloud Storage

Ji Yong Shin, Cornell University; Mahesh Balakrishnan, Microsoft Research; Tudor Marian, Google; Hakim Weatherspoon, Cornell University

Disk contention is increasingly a significant problem for cloud storage, as applications are forced to co-exist on machines and share physical disk resources. Disks are notoriously sensitive to contention; a single application’s random I/O is sufficient to reduce the throughput of a disk array by an order of magnitude, disrupting every other application running on the same array. Log-structured storage designs can alleviate write-write contention between applications by sequentializing all writes, but have historically suffered from read-write contention triggered by garbage collection (GC) as well as application reads. Gecko is a novel log-structured design that eliminates read-write contention by chaining together a small number of drives into a single log, effectively separating the tail of the log (where writes are appended) from its body. As a result, writes proceed to the tail drive without contention from either GC reads or first-class reads, which are restricted to the body of the log with the help of a tail-specific caching policy. Gecko trades off maximum contention-free sequential throughput from multiple drives in exchange for a stable and predictable maximum throughput from a single uncontended drive, and achieves better performance than native log-structured or RAID-based systems in most cases. Our in-kernel implementation provides applications with random write bandwidth of 60 to 120 MB/s, despite concurrent GC activity, application reads, and an adversarial workload.
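
Gecko's central trick is to chain drives into one log so that the append-only tail lives on its own drive while reads and GC are confined to the body. The sketch below models that separation with in-memory "drives"; caching policy, garbage collection, and fault handling are omitted, and all names are invented.

```python
# Toy model of a chained log: appends go only to the tail drive, reads are served
# from the body drives (or the tail), so the tail sees no read contention from the body.
# GC, caching policy, and fault handling are omitted; names are invented.

class GeckoChain:
    def __init__(self, n_body_drives=2, tail_capacity=4):
        self.body = [dict() for _ in range(n_body_drives)]  # sealed segments of the log
        self.tail = {}                                      # append-only tail drive
        self.tail_capacity = tail_capacity
        self.index = {}                                     # logical addr -> ("tail" | drive_no)

    def write(self, addr, data):
        if len(self.tail) >= self.tail_capacity:
            self._seal_tail()
        self.tail[addr] = data
        self.index[addr] = "tail"

    def _seal_tail(self):
        # Move the full tail segment into the least-loaded body drive, then start fresh.
        target = min(range(len(self.body)), key=lambda i: len(self.body[i]))
        for addr, data in self.tail.items():
            self.body[target][addr] = data
            self.index[addr] = target
        self.tail = {}

    def read(self, addr):
        where = self.index[addr]
        drive = self.tail if where == "tail" else self.body[where]
        return drive[addr]

if __name__ == "__main__":
    g = GeckoChain()
    for i in range(10):
        g.write(i, f"v{i}")
    print(g.read(0), g.read(9))    # v0 from a body drive, v9 from the tail
```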

Available Media

Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions

James S. Plank, University of Tennessee; Kevin M. Greenan, EMC Backup Recovery Systems Division; Ethan L. Miller, University of California, Santa Cruz

This paper has been removed because of a dispute over its contents. Please see this memo from February 17, 2015, for more information.

Available Media

Virtual Machine Workloads: The Case for New NAS Benchmarks

Vasily Tarasov, Stony Brook University; Dean Hildebrand, IBM Research—Almaden; Geoff Kuenning, Harvey Mudd College; Erez Zadok, Stony Brook University

Network Attached Storage (NAS) and Virtual Machines (VMs) are widely used in data centers thanks to their manageability, scalability, and ability to consolidate resources. But the shift from physical to virtual clients drastically changes the I/O workloads seen on NAS servers, due to guest file system encapsulation in virtual disk images and the multiplexing of request streams from different VMs. Unfortunately, current NAS workload generators and benchmarks produce workloads typical to physical machines.

This paper makes two contributions. First, we studied the extent to which virtualization is changing existing NAS workloads. We observed significant changes, including the disappearance of file system meta-data operations at the NAS layer, changed I/O sizes, and increased randomness. Second, we created a set of versatile NAS benchmarks to synthesize virtualized workloads. This allows us to generate accurate virtualized workloads without the effort and limitations associated with setting up a full virtualized environment. Our experiments demonstrate that the relative error of our virtualized benchmarks, evaluated across 11 parameters, averages less than 10%.

Available Media
