Technical Sessions

To access a presentation's content, please click on its title below.

Proceedings Front Matter: 
Cover Page | Title Page and List of Organizers | Message from the Program Co-Chairs

The full FAST '13 conference proceedings (in PDF, EPUB, and MOBI) and the table of contents (in PDF), which contained the papers SD Codes: Erasure Codes Designed for How Storage Systems Really Fail and Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions, have been taken down due to a dispute over the two papers' contents. Please see this memo from February 17, 2015, for more information. All other FAST '13 papers remain available individually below.

Attendee Files 
FAST '13 Errata Slip (PDF)
FAST '13 Errata Slip (EPUB)
FAST '13 Errata Slip (MOBI)

 

Wednesday, February 13, 2013

8:45 a.m.–9:00 a.m. Wednesday

Opening Remarks and Best Paper Awards

Imperial Ballroom

Program Co-Chairs: Keith A. Smith, NetApp, and Yuanyuan Zhou, University of California, San Diego

FAST '13 Opening Remarks & Awards

Available Media

9:00 a.m.–10:30 a.m. Wednesday

File Systems

Imperial Ballroom

Session Chair: Ric Wheeler, Red Hat

ffsck: The Fast File System Checker

Ao Ma, EMC Corporation and University of Wisconsin—Madison; Charlotte Dragga, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison

Crash failures, hardware errors, and file system bugs can corrupt file systems and cause data loss, despite the presence of journals and similar preventive techniques. While consistency checkers such as fsck can detect this corruption and restore a damaged image to a usable state, they are generally created as an afterthought, to be run only at rare intervals. Thus, checkers operate slowly, causing significant downtime for large-scale storage systems when they are needed.

We address this dilemma by treating the checker as a key component of the overall file system (and not merely a peripheral add-on). To this end, we present a modified ext3 file system, rext3, to directly support the fast file system checker, ffsck. The rext3 file system co-locates and self-identifies its metadata blocks, removing the need for costly seeks and tree traversals during checking. These modifications to the file system allow ffsck to scan and repair the file system at rates approaching the full sequential bandwidth of the underlying device. In addition, we demonstrate that rext3 performs competitively with ext3 in most cases and exceeds it in handling random reads and large writes.
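
To make the intuition concrete, the following toy model (not the authors' code; the block layout, MAGIC tag, and in-memory "disk" are invented for illustration) shows why self-identifying, co-located metadata lets a checker discover all metadata in one sequential sweep rather than a pointer-chasing tree traversal.

```python
# Illustrative model of why co-located, self-identifying metadata speeds up checking.
# The on-"disk" layout, MAGIC tag, and check logic are invented for this sketch;
# they are not taken from the ffsck/rext3 implementation.

MAGIC = b"META"          # hypothetical self-identification tag
BLOCK = 4096

def make_disk(n_blocks, meta_every=16):
    """Build a fake disk: most blocks are data, some are tagged metadata."""
    disk = []
    for i in range(n_blocks):
        if i % meta_every == 0:
            disk.append(MAGIC + i.to_bytes(8, "little").ljust(BLOCK - 4, b"\0"))
        else:
            disk.append(b"\0" * BLOCK)
    return disk

def sequential_metadata_scan(disk):
    """ffsck-style pass: stream every block once, keep only self-identified metadata.
    Cost is one sequential sweep, independent of the metadata tree shape."""
    return [i for i, blk in enumerate(disk) if blk.startswith(MAGIC)]

if __name__ == "__main__":
    disk = make_disk(1024)
    meta = sequential_metadata_scan(disk)
    print(f"found {len(meta)} metadata blocks in one sequential pass")
```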

Available Media

Building Workload-Independent Storage with VT-Trees

Pradeep Shetty, Richard Spillane, Ravikant Malpani, Binesh Andrews, Justin Seyster, and Erez Zadok, Stony Brook University

As the Internet and the amount of data grow, the variability of data sizes grows too—from small MP3 tags to large VM images. With applications using increasingly complex queries and larger datasets, data access patterns have become more complex and randomized. Current storage systems focus on optimizing for one band of workloads at the expense of other workloads due to limitations in existing storage system data structures. We designed a novel workload-independent data structure called the VT-tree, which extends the LSM-tree to efficiently handle sequential and file-system workloads. We designed a system based solely on VT-trees which offers concurrent access to data via file system and database APIs, provides transactional guarantees, and consequently provides efficient and scalable access to both large and small data items regardless of the access pattern. Our evaluation shows that our user-level system has 2–6.6× better performance for random-write workloads and only a small average overhead for other workloads.
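
For readers unfamiliar with the underlying structure, here is a minimal sketch of the LSM-tree write and read path that the VT-tree extends (an in-memory buffer flushed into immutable sorted runs); the VT-tree's own optimizations for sequential and file-system workloads are not shown, and all names are invented.

```python
# Minimal LSM-tree write/read path: the baseline structure the VT-tree extends.
# The VT-tree's stitching-style optimizations described in the paper are NOT shown.
import bisect

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}           # mutable in-memory buffer
        self.runs = []               # immutable sorted runs, newest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        # Sort the buffer and append it as an immutable on-"disk" run.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:        # newest run wins
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

if __name__ == "__main__":
    t = TinyLSM()
    for i in range(10):
        t.put(f"k{i}", i)
    print(t.get("k3"), t.get("k9"))
```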

Available Media

A Study of Linux File System Evolution

Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Shan Lu, University of Wisconsin—Madison
    Awarded Best Paper! 

We conduct a comprehensive study of file-system code evolution. By analyzing eight years of Linux file-system changes across 5079 patches, we derive numerous new (and sometimes surprising) insights into the file-system development process; our results should be useful for both the development of file systems themselves as well as the improvement of bug-finding tools.

Available Media


10:30 a.m.–11:00 a.m. Wednesday

Break

 Market Street Foyer
 
11:00 a.m.–12:20 p.m. Wednesday

Caching

Imperial Ballroom

Session Chair: Eno Thereska, Microsoft Research

Write Policies for Host-side Flash Caches

Ricardo Koller, Florida International University and VMware; Leonardo Marmol and Raju Rangaswami, Florida International University; Swaminathan Sundararaman and Nisha Talagala, FusionIO; Ming Zhao, Florida International University

Host-side flash-based caching offers a promising new direction for optimizing access to networked storage. Current work has argued for using host-side flash primarily as a read cache and employing a write-through policy which provides the strictest consistency and durability guarantees. However, write-through requires synchronous updates over the network for every write. For write-mostly or write-intensive workloads, it significantly under-utilizes the high-performance flash cache layer. The write-back policy, on the other hand, better utilizes the cache for workloads with significant write I/O requirements. However, conventional write-back performs out-of-order eviction of data and unacceptably sacrifices data consistency at the network storage.

We develop and evaluate two consistent write-back caching policies, ordered and journaled, that are designed to perform increasingly better than write-through. These policies enable new trade-off points across the performance, data consistency, and data staleness dimensions. Using benchmark workloads such as PostMark, TPC-C, Filebench, and YCSB, we evaluate the new write policies we propose alongside conventional write-through and write-back. We find that ordered write-back performs better than write-through. Additionally, we find that journaled write-back can trade off staleness for performance, approaching, and in some cases exceeding, conventional write-back performance. Finally, a variant of journaled write-back that utilizes consistency hints from the application can provide straightforward application-level storage consistency, a stricter form of consistency than the transactional consistency provided by write-through.
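
A toy model of the design space discussed above, with invented names and no attempt to match the paper's implementation: write-through pushes every write to the backing store synchronously, while a write-back variant acknowledges from flash and later destages dirty blocks in write order to keep the backing store consistent. The journaled variant is omitted.

```python
# Toy flash-cache write policies. Names and structure are invented for illustration.
from collections import OrderedDict

class NetworkStore:
    def __init__(self):
        self.blocks = {}
    def write(self, addr, data):
        self.blocks[addr] = data      # stands in for a synchronous network write

class FlashCache:
    def __init__(self, store, policy="write-through"):
        self.store, self.policy = store, policy
        self.cache = {}                       # addr -> data
        self.dirty = OrderedDict()            # preserves write order for destaging

    def write(self, addr, data):
        self.cache[addr] = data
        if self.policy == "write-through":
            self.store.write(addr, data)      # strict consistency, one network RTT per write
        else:
            self.dirty[addr] = data           # write-back: acknowledge from flash

    def flush(self):
        # Ordered write-back: destage dirty blocks in the order they were written,
        # so the backing store always reflects a consistent prefix of the history.
        for addr, data in list(self.dirty.items()):
            self.store.write(addr, data)
            del self.dirty[addr]

if __name__ == "__main__":
    s = NetworkStore()
    c = FlashCache(s, policy="ordered-write-back")
    for i in range(5):
        c.write(i, f"v{i}")
    c.flush()
    print(s.blocks)
```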

Available Media

Warming Up Storage-Level Caches with Bonfire

Yiying Zhang, University of Wisconsin—Madison; Gokul Soundararajan, Mark W. Storer, Lakshmi N. Bairavasundaram, and Sethuraman Subbiah, NetApp; Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison

Large caches in storage servers have become essential for meeting the service levels required by applications. Today, these caches often need to be warmed with data, due to scenarios including dynamic creation of cache space and server restarts that clear cache contents. When large storage caches are warmed at the rate of application I/O, warmup can take hours or even days, affecting both application performance and server load over a long period of time.

We have created Bonfire, a mechanism for accelerating cache warmup. Bonfire monitors storage server workloads, logs important warmup data, and efficiently preloads storage-level caches with warmup data. Bonfire is based on our detailed analysis of block-level data-center traces that provides insights into heuristics for warmup as well as the potential for efficient mechanisms. We show through both simulation and trace replay that Bonfire reduces both warmup time and backend server load significantly, compared to a cache that is warmed up on demand.
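
Conceptually, the warmup path has three steps: monitor the live workload, log a warmup set, and bulk-load that set into the cold cache. The sketch below mimics that flow with a simple access-frequency heuristic; the heuristic, granularity, and names are assumptions for illustration, not Bonfire's actual policies.

```python
# Conceptual cache-warmup flow: monitor -> log warmup set -> bulk preload.
# The frequency heuristic and all names are invented; Bonfire's real policies differ.
from collections import Counter

class WarmupMonitor:
    def __init__(self):
        self.accesses = Counter()
    def record(self, block_id):
        self.accesses[block_id] += 1          # observe the live workload
    def warmup_set(self, top_n):
        # Log the hottest blocks as candidates for preloading after a restart.
        return [b for b, _ in self.accesses.most_common(top_n)]

def preload(cache, backend, warmup_blocks):
    """Bulk-read the warmup set from the backend (ideally large sequential I/O)
    instead of waiting for on-demand misses."""
    for b in warmup_blocks:
        cache[b] = backend[b]

if __name__ == "__main__":
    backend = {i: f"data{i}" for i in range(100)}
    mon = WarmupMonitor()
    for b in [1, 2, 2, 3, 3, 3, 7, 7, 7, 7]:
        mon.record(b)
    cold_cache = {}
    preload(cold_cache, backend, mon.warmup_set(top_n=3))
    print(sorted(cold_cache))                  # [2, 3, 7]
```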

Available Media

Unioning of the Buffer Cache and Journaling Layers with Non-volatile Memory

Eunji Lee and Hyokyung Bahn, Ewha University; Sam H. Noh, Hongik University
    Awarded Best Paper! 

Journaling techniques are widely used in modern file systems as they provide high reliability and fast recovery from system failures. However, journaling reduces the performance benefit of buffer caching, as it accounts for the bulk of storage writes in real system environments. In this paper, we present a novel buffer cache architecture that subsumes the functionality of caching and journaling by making use of non-volatile memory such as PCM or STT-MRAM. Specifically, our buffer cache supports what we call the in-place commit scheme. This scheme avoids logging, but still provides the same journaling effect by simply altering the state of the cached block to frozen. As a frozen block still performs the function of caching, we show that in-place commit does not degrade cache performance. We implement our scheme on Linux 2.6.38 and measure the throughput and execution time of the scheme with various file I/O benchmarks. The results show that our scheme improves I/O performance by 76% on average and up to 240% compared to the existing Linux buffer cache with ext4, without any loss of reliability.
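
A minimal sketch of the in-place commit state machine described above, assuming invented names and a simplified versioning policy (it is not the authors' Linux implementation): commit flips dirty cached blocks to frozen rather than copying them into a journal, and the frozen copies are later destaged to storage.

```python
# Toy model of an NVM buffer cache with in-place commit:
# commit freezes cached blocks instead of copying them into a journal.
# Names, the versioning policy, and the writeback policy are invented for this sketch.

CLEAN, DIRTY = "clean", "dirty"

class NVMBufferCache:
    def __init__(self):
        self.frozen = {}     # addr -> committed copy held in NVM (journal-equivalent)
        self.working = {}    # addr -> (state, data) for the mutable cached copy

    def write(self, addr, data):
        # Writes always go to the working copy; a frozen copy, if any, is untouched,
        # which is what preserves the journaling effect without logging.
        self.working[addr] = (DIRTY, data)

    def commit(self):
        # In-place commit: flip dirty blocks to frozen instead of writing a log.
        for addr, (state, data) in list(self.working.items()):
            if state == DIRTY:
                self.frozen[addr] = data
                self.working[addr] = (CLEAN, data)

    def writeback(self, storage):
        # Frozen copies are the durable commit record; destage them to storage.
        for addr, data in list(self.frozen.items()):
            storage[addr] = data
            del self.frozen[addr]

if __name__ == "__main__":
    cache, disk = NVMBufferCache(), {}
    cache.write(10, "v1"); cache.commit()    # "v1" committed without any log write
    cache.write(10, "v2")                    # uncommitted update; frozen "v1" preserved
    cache.writeback(disk)
    print(disk)                              # {10: 'v1'}
```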

Available Media


12:20 p.m.–2:00 p.m. Wednesday

Conference Luncheon

Regency Ballroom

Presentation of the FAST '13 Test of Time Award

2:00 p.m.–3:20 p.m. Wednesday

Protecting Your Data

Imperial Ballroom

Session Chair: Cheng Huang, Microsoft Research

Memory Efficient Sanitization of a Deduplicated Storage System

Fabiano C. Botelho, Philip Shilane, Nitin Garg, and Windsor Hsu, EMC Backup Recovery Systems Division

Sanitization is the process of securely erasing sensitive data from a storage system, effectively restoring the system to a state as if the sensitive data had never been stored. Depending on the threat model, sanitization could require erasing all unreferenced blocks. This is particularly challenging in deduplicated storage systems because each piece of data on the physical media could be referred to by multiple namespace objects. For large storage systems, where available memory is a small fraction of storage capacity, standard techniques for tracking data references will not fit in memory, and we discuss multiple sanitization techniques that trade off I/O and memory requirements. We have three key contributions. First, we provide an understanding of the threat model and what is required to sanitize a deduplicated storage system as compared to a device. Second, we have designed a memory-efficient algorithm using perfect hashing that requires only 2.54 to 2.87 bits per reference (a 98% savings) while minimizing the amount of I/O. Third, we present a complete sanitization design for EMC Data Domain.
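
At its core, sanitization of a deduplicated store is a mark-and-sweep problem: enumerate every chunk reachable from live namespace objects, then securely erase everything else. The skeleton below shows that structure with an ordinary Python set standing in for the paper's memory-efficient perfect-hash bit vector; the 2.54 to 2.87 bits-per-reference structure itself is not reproduced, and all names are invented.

```python
# Mark-and-sweep skeleton for sanitizing a deduplicated store.
# A plain set stands in for the paper's perfect-hash bit vector; all names are invented.

def mark_live(namespace_objects):
    """Mark phase: collect fingerprints of every chunk referenced by any live file."""
    live = set()
    for recipe in namespace_objects.values():   # recipe = ordered list of chunk fingerprints
        live.update(recipe)
    return live

def sweep(chunk_store, live):
    """Sweep phase: securely erase chunks that no live object references."""
    for fp in list(chunk_store):
        if fp not in live:
            # A real system would overwrite the physical blocks, not just drop the key.
            del chunk_store[fp]

if __name__ == "__main__":
    chunk_store = {"a": b"...", "b": b"...", "c": b"..."}    # fingerprint -> chunk data
    namespace = {"/file1": ["a", "b"], "/file2": ["b"]}      # "c" is unreferenced
    sweep(chunk_store, mark_live(namespace))
    print(sorted(chunk_store))                               # ['a', 'b']
```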

Available Media

SD Codes: Erasure Codes Designed for How Storage Systems Really Fail

James S. Plank, University of Tennessee; Mario Blaum and James L. Hafner, IBM Almaden Research Center

This paper has been removed because of a dispute over its contents. Please see this memo from February 17, 2015, for more information.

Available Media

HARDFS: Hardening HDFS with Selective and Lightweight Versioning

Thanh Do, Tyler Harter, and Yingchao Liu, University of Wisconsin—Madison; Haryadi S. Gunawi, University of Chicago; Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison

We harden the Hadoop Distributed File System (HDFS) against fail-silent (non-fail-stop) behaviors that result from memory corruption and software bugs using a new approach: selective and lightweight versioning (SLEEVE). With this approach, actions performed by important subsystems of HDFS (e.g., namespace management) are checked by a second implementation of the subsystem that uses lightweight, approximate data structures. We show that HARDFS detects and recovers from a wide range of fail-silent behaviors caused by random bit flips, targeted corruptions, and real software bugs. In particular, HARDFS handles 90% of the fail-silent faults that result from random memory corruption and correctly detects and recovers from 100% of 78 targeted corruptions and 5 real-world bugs. Moreover, it recovers orders of magnitude faster than a full reboot by using micro-recovery. The extra protection in HARDFS incurs minimal performance and space overheads.
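
The SLEEVE pattern is to re-execute each important operation against a second, much cheaper implementation and treat any disagreement as a fail-silent fault. The sketch below illustrates the pattern for a toy namespace, using an exact shadow set for clarity where the paper uses lightweight, approximate structures; all names are invented.

```python
# Shadow-checking pattern: a lightweight second implementation verifies the main one.
# Here the shadow is an exact set for clarity; SLEEVE uses approximate structures.

class Namespace:
    """The 'main' subsystem whose in-memory state might be silently corrupted."""
    def __init__(self):
        self.entries = {}
    def create(self, path):
        self.entries[path] = {"size": 0}
    def exists(self, path):
        return path in self.entries

class ShadowNamespace:
    """Cheap verifier that tracks just enough state to catch disagreements."""
    def __init__(self):
        self.paths = set()
    def create(self, path):
        self.paths.add(path)
    def exists(self, path):
        return path in self.paths

def checked_exists(main, shadow, path):
    a, b = main.exists(path), shadow.exists(path)
    if a != b:
        # A disagreement indicates a fail-silent fault; trigger micro-recovery here.
        raise RuntimeError(f"fail-silent fault detected for {path}")
    return a

if __name__ == "__main__":
    main, shadow = Namespace(), ShadowNamespace()
    for ns in (main, shadow):
        ns.create("/data/blk_0001")
    del main.entries["/data/blk_0001"]          # simulate silent memory corruption
    try:
        checked_exists(main, shadow, "/data/blk_0001")
    except RuntimeError as e:
        print(e)
```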

Available Media


3:20 p.m.–3:50 p.m. Wednesday

Break

 Market Street Foyer
 
3:50 p.m.–5:20 p.m. Wednesday

Big Systems, Big Challenges

Imperial Ballroom

Session Chair: Daniel Peek, Facebook

Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines

Devesh Tiwari, North Carolina State University; Simona Boboila, Northeastern University; Sudharshan Vazhkudai and Youngjae Kim, Oak Ridge National Laboratory; Xiaosong Ma, North Carolina State University; Peter Desnoyers, Northeastern University; Yan Solihin, North Carolina State University

Modern scientific discovery is increasingly driven by large-scale supercomputing simulations, followed by data analysis tasks. These data analyses are either performed offline, on smaller-scale clusters, or on the supercomputer itself. Unfortunately, these techniques suffer from performance and energy inefficiencies due to increased data movement between the compute and storage subsystems. Therefore, we propose Active Flash, an in-situ scientific data analysis approach, wherein data analysis is conducted on the solid-state device (SSD), where the data already resides. Our performance and energy models show that Active Flash has the potential to address many of the aforementioned concerns without degrading HPC simulation performance. In addition, we demonstrate an Active Flash prototype built on a commercial SSD controller, which further reaffirms the viability of our proposal.

Available Media

MixApart: Decoupled Analytics for Shared Storage Systems

Madalin Mihailescu, University of Toronto and NetApp; Gokul Soundararajan, NetApp; Cristiana Amza, University of Toronto

Distributed file systems built for data analytics and enterprise storage systems have very different functionality requirements. For this reason, enabling analytics on enterprise data commonly introduces a separate analytics storage silo. This generates additional costs and inefficiencies in data management, e.g., whenever data needs to be archived, copied, or migrated across silos.

MixApart uses an integrated data caching and scheduling solution to allow MapReduce computations to analyze data stored on enterprise storage systems. The front-end caching layer enables the local storage performance required by data analytics. The shared storage back-end simplifies data management.

We evaluate MixApart using a 100-core Amazon EC2 cluster with micro-benchmarks and production workload traces. Our evaluation shows that MixApart provides (i) up to 28% faster performance than the traditional ingest-then-compute workflows used in enterprise IT analytics, and (ii) comparable performance to an ideal Hadoop setup without data ingest, at similar cluster sizes.

Available Media

Horus: Fine-Grained Encryption-Based Security for Large-Scale Storage

Yan Li, Nakul Sanjay Dhotre, and Yasuhiro Ohara, University of California, Santa Cruz; Thomas M. Kroeger, Sandia National Laboratories; Ethan L. Miller and Darrell D. E. Long, University of California, Santa Cruz

With the growing use of large-scale distributed systems, the likelihood that at least one node is compromised is increasing. Large-scale systems that process sensitive data such as geographic data with defense implications, drug modeling, nuclear explosion modeling, and private genomic data would benefit greatly from strong security for their storage. Nevertheless, many high performance computing (HPC), cloud, or secure content delivery network (SCDN) systems that handle such data still store them unencrypted or use simple encryption schemes, relying heavily on physical isolation to ensure confidentiality, providing little protection against compromised computers or malicious insiders. Moreover, current encryption solutions cannot efficiently provide fine-grained encryption for large datasets.

Available Media



5:20 p.m.–7:20 p.m. Wednesday

Poster Session and Reception I

Regency Ballroom

Session Chair: Nitin Agrawal, NEC Labs

The list of accepted posters is available here.

 

Thursday, February 14, 2013

9:00 a.m.–10:15 a.m. Thursday

Keynote Address

Imperial Ballroom

Disruptive Innovation: Data Domain Experience

Kai Li, Princeton University

Data Domain is an example of a storage company with disruptive innovation during the past decade. The company started in 2001 with the mission to create and deploy deduplication storage products to replace tapes in data centers. The company went public in 2007 and was acquired by EMC Corporation in 2009. The revenue of the Data Domain product line has successfully disrupted the tape automation market.

How was Data Domain created? What makes Data Domain successful? In this talk, I will share my experience with the conference attendees to answer these two questions and also present my views on the creation of disruptive innovations, the relationship between academic research and innovation, and why most large companies are incapable of disruptive innovations.

Available Media


10:15 a.m.–10:45 a.m. Thursday

Break

 Market Street Foyer
 
10:45 a.m.–12:05 p.m. Thursday

Deduplication

Imperial Ballroom

Session Chair: Fred Douglis, EMC Backup Recovery Systems Division

Concurrent Deletion in a Distributed Content-Addressable Storage System with Global Deduplication

Przemyslaw Strzelczak, Elzbieta Adamczyk, Urszula Herman-Izycka, Jakub Sakowicz, Lukasz Slusarczyk, Jaroslaw Wrona, and Cezary Dubnicki, 9LivesData, LLC

Scalable, highly reliable distributed systems supporting data deduplication have recently become popular for storing backup and archival data. One of the important requirements for backup storage is the ability to delete data selectively. Unlike in traditional storage systems, data deletion in distributed systems with deduplication is a major challenge because deduplication leads to multiple owners of data chunks. Moreover, system configuration changes often due to node additions, deletions and failures. Expected high performance, high availability and low impact of deletion on regular user operations additionally complicate identification and reclamation of unnecessary blocks.

This paper describes a deletion algorithm for a scalable, content-addressable storage system with global deduplication. The deletion is concurrent: user reads and writes can proceed in parallel with deletion, with only minor restrictions established to make reclamation feasible. Moreover, our approach allows for deduplication of user writes during deletion. We extend traditional distributed reference counting to deliver a failure-tolerant deletion that accommodates not only deduplication, but also the dynamic nature of a scalable system and its physical resource constraints. The proposed algorithm has been verified with an implementation in a commercial deduplicating storage system. The impact of deletion on user operations is configurable. Using a default setting that grants deletion at most 30% of system resources, running the deletion reduces end performance by no more than 30%. This impact can be reduced to less than 5% when deletion is given only minimal resources.
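
The starting point the paper builds on is reference counting over deduplicated chunks: a chunk may be reclaimed only when no backup refers to it. The sketch below shows that baseline in a single-node, stop-the-world form; the concurrency, failure tolerance, and per-epoch bookkeeping that make deletion safe in a distributed, always-on system are the paper's contribution and are not shown. Names are invented.

```python
# Baseline reference counting for deletion in a deduplicated store.
# Single-node and stop-the-world; the paper's contribution is doing this
# concurrently with user I/O in a distributed system, which is not shown here.
from collections import Counter

class DedupStore:
    def __init__(self):
        self.chunks = {}            # fingerprint -> data
        self.refcount = Counter()   # fingerprint -> number of referencing backups
        self.backups = {}           # backup name -> list of fingerprints

    def write_backup(self, name, chunk_list):
        self.backups[name] = [fp for fp, _ in chunk_list]
        for fp, data in chunk_list:
            if fp not in self.chunks:
                self.chunks[fp] = data          # deduplicated: stored once
            self.refcount[fp] += 1

    def delete_backup(self, name):
        for fp in self.backups.pop(name):
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:
                del self.chunks[fp]             # reclaim space

if __name__ == "__main__":
    s = DedupStore()
    s.write_backup("mon", [("a", b"1"), ("b", b"2")])
    s.write_backup("tue", [("b", b"2"), ("c", b"3")])
    s.delete_backup("mon")
    print(sorted(s.chunks))                     # ['b', 'c']: 'b' survives, 'a' reclaimed
```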

Available Media

File Recipe Compression in Data Deduplication Systems

Dirk Meister, André Brinkmann, and Tim Süß, Johannes Gutenberg University Mainz

Data deduplication systems discover and exploit redundancies between different data blocks. The most common approach divides data into chunks and identifies redundancies via fingerprints. The file content can be rebuilt by combining the chunk fingerprints, which are stored sequentially in a file recipe. The corresponding file recipe data can occupy a significant fraction of the total disk space, especially if the deduplication ratio is very high. We propose a combination of efficient and scalable compression schemes to shrink the file recipes' size. A trace-based simulation shows that these methods can compress file recipes by up to 93%.
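
Since a file recipe is just the ordered list of chunk fingerprints needed to rebuild a file, one simple way to shrink it is to replace each long fingerprint with a short code assigned from a dictionary. The sketch below shows that dictionary-coding idea in its most basic form; the paper combines several more sophisticated schemes, and the names here are invented.

```python
# Basic dictionary coding of a file recipe: long fingerprints -> short integer codes.
# The paper combines several schemes; this shows only the simplest idea, with invented names.
import hashlib

def fingerprint(chunk):
    return hashlib.sha1(chunk).hexdigest()       # 40-character hex fingerprint

def compress_recipe(recipe, code_table):
    """Replace each fingerprint with a small integer code, assigning new codes on demand."""
    coded = []
    for fp in recipe:
        if fp not in code_table:
            code_table[fp] = len(code_table)
        coded.append(code_table[fp])
    return coded

if __name__ == "__main__":
    chunks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
    recipe = [fingerprint(c) for c in chunks]     # duplicate chunks repeat the same fingerprint
    table = {}
    coded = compress_recipe(recipe, table)
    print(coded)                                  # e.g. [0, 1, 0, 0]
    raw = sum(len(fp) for fp in recipe)
    print(f"{raw} bytes of fingerprints -> {len(coded)} small codes")
```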

Available Media

Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication

Mark Lillibridge and Kave Eshghi, HP Labs; Deepavali Bhagwat, HP Storage

Slow restoration due to chunk fragmentation is a serious problem facing inline chunk-based data deduplication systems: restore speeds for the most recent backup can drop orders of magnitude over the lifetime of a system. We study three techniques—increasing cache size, container capping, and using a forward assembly area—for alleviating this problem. Container capping is an ingest-time operation that reduces chunk fragmentation at the cost of forfeiting some deduplication, while using a forward assembly area is a new restore-time caching and prefetching technique that exploits the perfect knowledge of future chunk accesses available when restoring a backup to reduce the amount of RAM required for a given level of caching at restore time.

We show that using a larger cache per stream—we see continuing benefits even up to 8 GB—can produce up to a 5–16X improvement, that giving up as little as 8% deduplication with capping can yield a 2–6X improvement, and that using a forward assembly area is strictly superior to LRU, able to yield a 2–4X improvement while holding the RAM budget constant.
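
The forward assembly area exploits the fact that the recipe tells the restorer exactly which chunks it will need next. The sketch below restores one window of the recipe at a time, reading each referenced container only once and copying every chunk the window needs from it; the container layout, window size, and names are invented simplifications.

```python
# Forward-assembly-area restore sketch: for each window of the recipe, read every
# referenced container once and place all needed chunks directly into the window.
# Container layout and names are invented simplifications.

def restore_with_faa(recipe, chunk_to_container, containers, window=4):
    """recipe: ordered chunk ids; containers: container id -> {chunk id: data}."""
    output = []
    for start in range(0, len(recipe), window):
        span = recipe[start:start + window]
        assembly = [None] * len(span)                 # the forward assembly area
        needed = {}                                   # container -> positions it must fill
        for pos, chunk in enumerate(span):
            needed.setdefault(chunk_to_container[chunk], []).append(pos)
        for cid, positions in needed.items():         # one read per container per window
            data = containers[cid]
            for pos in positions:
                assembly[pos] = data[span[pos]]
        output.extend(assembly)
    return b"".join(output)

if __name__ == "__main__":
    containers = {0: {"a": b"A", "b": b"B"}, 1: {"c": b"C"}}
    where = {"a": 0, "b": 0, "c": 1}
    print(restore_with_faa(["a", "c", "a", "b", "c"], where, containers))  # b'ACABC'
```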

Available Media

12:05 p.m.–2:00 p.m. Thursday

Lunch On Your Own

2:00 p.m.–3:30 p.m. Thursday

Work-in-Progress Reports (WiPs)

Imperial Ballroom

Session Chair: Joseph Tucek, HP Labs

The list of accepted Work-in-Progress Reports (WiPs) is available here.

FAST '13 Work in Progress Reports

Available Media


3:30 p.m.–4:00 p.m. Thursday

Break

 Market Street Foyer
 
4:00 p.m.–5:30 p.m. Thursday

Something for Everyone

Imperial Ballroom

Session Chair: Ethan L. Miller, University of California, Santa Cruz and Pure Storage

Shroud: Ensuring Private Access to Large-Scale Data in the Data Center

Jacob R. Lorch, Bryan Parno, and James Mickens, Microsoft Research; Mariana Raykova, IBM Research; Joshua Schiffman, AMD

Recent events have shown online service providers the perils of possessing private information about users. Encrypting data mitigates but does not eliminate this threat: the pattern of data accesses still reveals information. Thus, we present Shroud, a general storage system that hides data access patterns from the servers running it, protecting user privacy. Shroud functions as a virtual disk with a new privacy guarantee: the user can look up a block without revealing the block’s address. Such a virtual disk can be used for many purposes, including map lookup, microblog search, and social networking.

Shroud aggressively targets hiding accesses among hundreds of terabytes of data. We achieve our goals by adapting oblivious RAM algorithms to enable large-scale parallelization. Specifically, we show, via new techniques such as oblivious aggregation, how to securely use many inexpensive secure coprocessors acting in parallel to improve request latency. Our evaluation combines large-scale emulation with an implementation on secure coprocessors and suggests that these adaptations bring private data access closer to practicality.

Available Media

Getting Real: Lessons in Transitioning Research Simulations into Hardware Systems

Mohit Saxena, Yiying Zhang, Michael M. Swift, Andrea C. Arpaci Dusseau, and Remzi H. Arpaci Dusseau, University of Wisconsin—Madison

Flash-based solid-state drives have revolutionized storage with their high performance. Their sophisticated internal mechanisms have led to a plethora of research on how to optimize applications, file systems, and internal SSD designs. Due to the closed nature of commercial devices, though, most research on the internals of an SSD, such as enhanced flash-translation layers, is performed using simulation or emulation. Without implementation in real devices, it can be difficult to judge the true benefit of the proposed designs.

In this paper, we describe our efforts to implement two new SSD designs that change both the internal workings of the device and its interface to the host operating system. Using the OpenSSD Jasmine board, we develop a prototype of FlashTier’s Solid State Cache (SSC) and of the Nameless Write SSD. While the flash-translation layer changes were straightforward, we discovered unexpected complexities in implementing extensions to the storage interface.

We describe our implementation process and extract a set of lessons applicable to other SSD prototypes. With our prototype we validate the performance claims of FlashTier and show a 45-52% performance improvement over caching with an SSD and a 90% reduction in erases.

Available Media

To Zip or Not to Zip: Effective Resource Usage for Real-Time Compression

Danny Harnik, Ronen Kat, Oded Margalit, Dmitry Sotnikov, and Avishay Traeger, IBM Research—Haifa

Real-time compression for primary storage is quickly becoming widespread as data continues to grow exponentially, but adding compression on the data path consumes scarce CPU and memory resources on the storage system. Our work aims to mitigate this cost by introducing methods to quickly and accurately identify the data that will yield significant space savings when compressed.

The first level of filtering that we employ is at the dataset level (e.g., volume or file system), where we estimate the overall compressibility of the data at rest. According to the outcome, we may choose to enable or disable compression for the entire data set, or to employ a second level of finer-grained filtering. The second filtering scheme examines data being written to the storage system in an online manner and determines its compressibility.

The first-level filtering runs in mere minutes while providing mathematically proven guarantees on its estimates. In addition to aiding in selecting which volumes to compress, it has been released as a public tool, allowing potential customers to determine the effectiveness of compression on their data and to aid in capacity planning. The second-level filtering has shown significant CPU savings (up to 35%) while maintaining compression savings (within 2%).
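
One way to approximate the dataset-level filter is to sample random blocks, compress each sample, and extrapolate the overall ratio. The sketch below does exactly that with zlib; the sampling scheme, thresholds, and the mathematical guarantees described in the paper are not reproduced, and the names are invented.

```python
# Sampled compressibility estimate: compress random blocks and extrapolate the ratio.
# This mimics the spirit of dataset-level filtering; the paper's estimator and its
# proven accuracy bounds are not reproduced here.
import os
import random
import zlib

def estimate_compression_ratio(read_block, n_blocks, samples=64):
    """read_block(i) returns block i; sample a few blocks and average their ratios."""
    picks = random.sample(range(n_blocks), min(samples, n_blocks))
    ratios = []
    for i in picks:
        data = read_block(i)
        ratios.append(len(zlib.compress(data)) / len(data))
    return sum(ratios) / len(ratios)

if __name__ == "__main__":
    # Toy "volume": half the blocks are highly compressible text, half are random bytes.
    blocks = [b"log line\n" * 456 if i % 2 else os.urandom(4096) for i in range(1000)]
    ratio = estimate_compression_ratio(lambda i: blocks[i], len(blocks))
    print(f"estimated compressed/original ratio: {ratio:.2f}")
    if ratio > 0.9:
        print("filter decision: skip compression for this volume")
```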

Available Media


5:30 p.m.–7:30 p.m. Thursday

Poster Session and Reception II

Regency Ballroom

Session Chair: Nitin Agrawal, NEC Labs

The list of accepted posters is available here.

 

Friday, February 15, 2013

9:00 a.m.–10:30 a.m. Friday

Flash and SSDs

Imperial Ballroom

Session Chair: Sam H. Noh, Hongik University

LDPC-in-SSD: Making Advanced Error Correction Codes Work Effectively in Solid State Drives

Kai Zhao, Rensselaer Polytechnic Institute; Wenzhe Zhao and Hongbin Sun, Xi'an Jiaotong University; Tong Zhang, Rensselaer Polytechnic Institute; Xiaodong Zhang, The Ohio State University; Nanning Zheng, Xi'an Jiaotong University

Conventional error correction codes (ECCs), such as the commonly used BCH code, have become increasingly inadequate for solid state drives (SSDs) as the capacity of NAND flash memory continues to increase and its reliability continues to degrade. It is highly desirable to deploy a much more powerful ECC, such as low-density parity-check (LDPC) code, to significantly improve the reliability of SSDs. Although LDPC code has had its success in commercial hard disk drives, to fully exploit its error correction capability in SSDs demands unconventional fine-grained flash memory sensing, leading to an increased memory read latency. To address this important but largely unexplored issue, this paper presents three techniques to mitigate the LDPC-induced response time delay so that SSDs can benefit from its strong error correction capability to the full extent. We quantitatively evaluate these techniques by carrying out trace-based SSD simulations with runtime characterization of NAND flash memory reliability and LDPC code decoding. Our study, based on intensive experiments, shows that these techniques, used in an integrated way in SSDs, can reduce the worst-case system read response time delay from over 100% down to below 20%. With our proposed techniques, a strong ECC alternative can be used in NAND flash memory to retain its reliability in response to continuous cost reduction, and its relatively small increase in response time delay is acceptable to mainstream application users, considering the large gains in SSD capacity, reliability, and price reduction.

Available Media

Extending the Lifetime of Flash-based Storage through Reducing Write Amplification from File Systems

Youyou Lu, Jiwu Shu, and Weimin Zheng, Tsinghua University

Flash memory has gained popularity as a storage device for both enterprise and embedded systems because of its high performance, low energy consumption, and reduced cost. The endurance problem of flash memory, however, is still a challenge and is getting worse as storage density increases with the adoption of multi-level cells (MLC). Prior work has addressed wear leveling and data reduction, but there is significantly less work on using the file system to improve flash lifetimes. Some common mechanisms in traditional file systems, such as journaling, metadata synchronization, and page-aligned updates, can induce extra write operations and aggravate the wear of flash memory. This problem is called write amplification from file systems.

In order to mitigate write amplification, we propose an object-based flash translation layer design (OFTL), in which mechanisms are co-designed with flash memory. By leveraging page metadata, OFTL enables lazy persistence of index metadata and eliminates journals while keeping consistency. Coarse-grained block state maintenance reduces persistent free space management overhead. With byte-unit access interfaces, OFTL is able to compact and co-locate the small updates with metadata to further reduce updates. Experiments show that an OFTL-based system, OFSS, offers a write amplification reduction of 47.4%–89.4% in SYNC mode and 19.8%–64.0% in ASYNC mode compared with ext3, ext2, and btrfs on an up-to-date page-level FTL.

Available Media

Understanding the Robustness of SSDs under Power Fault

Mai Zheng, The Ohio State University; Joseph Tucek, HP Labs; Feng Qin, The Ohio State University; Mark Lillibridge, HP Labs

Modern storage technologies (SSDs, NoSQL databases, commoditized RAID hardware, etc.) bring new reliability challenges to the already complicated storage stack. Among other things, the behavior of these new components during power faults—which happen relatively frequently in data centers—is an important yet mostly ignored issue in this dependability-critical area. Understanding how new storage components behave under power faults is the first step towards designing new robust storage systems.

In this paper, we propose a new methodology to expose reliability issues in block devices under power faults. Our framework includes specially-designed hardware to inject power faults directly to devices, workloads to stress storage components, and techniques to detect various types of failures. Applying our testing framework, we test fifteen commodity SSDs from five different vendors using more than three thousand fault injection cycles in total. Our experimental results reveal that thirteen out of the fifteen tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure.

Available Media


10:30 a.m.–11:00 a.m. Friday

Break

 Market Street Foyer
 
11:00 a.m.–12:20 p.m. Friday

Performance Improvements and Measurements

Imperial Ballroom

Session Chair: Kiran-Kumar Muniswamy-Reddy, Amazon.com

Gecko: Contention-Oblivious Disk Arrays for Cloud Storage

Ji Yong Shin, Cornell University; Mahesh Balakrishnan, Microsoft Research; Tudor Marian, Google; Hakim Weatherspoon, Cornell University

Disk contention is increasingly a significant problem for cloud storage, as applications are forced to co-exist on machines and share physical disk resources. Disks are notoriously sensitive to contention; a single application’s random I/O is sufficient to reduce the throughput of a disk array by an order of magnitude, disrupting every other application running on the same array. Log-structured storage designs can alleviate write-write contention between applications by sequentializing all writes, but have historically suffered from read-write contention triggered by garbage collection (GC) as well as application reads. Gecko is a novel log-structured design that eliminates read-write contention by chaining together a small number of drives into a single log, effectively separating the tail of the log (where writes are appended) from its body. As a result, writes proceed to the tail drive without contention from either GC reads or first-class reads, which are restricted to the body of the log with the help of a tail-specific caching policy. Gecko trades off maximum contention-free sequential throughput from multiple drives in exchange for a stable and predictable maximum throughput from a single uncontended drive, and achieves better performance than native log-structured or RAID-based systems in most cases. Our in-kernel implementation provides applications with random write bandwidth of 60 to 120 MB/s, despite concurrent GC activity, application reads, and an adversarial workload.
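
Gecko's central trick is to chain drives into one log so that the append-only tail lives on its own drive while reads and GC are confined to the body. The sketch below models that separation with in-memory "drives"; caching policy, garbage collection, and fault handling are omitted, and all names are invented.

```python
# Toy model of a chained log: appends go only to the tail drive, reads are served
# from the body drives (or the tail), so the tail sees no read contention from the body.
# GC, caching policy, and fault handling are omitted; names are invented.

class GeckoChain:
    def __init__(self, n_body_drives=2, tail_capacity=4):
        self.body = [dict() for _ in range(n_body_drives)]  # sealed segments of the log
        self.tail = {}                                      # append-only tail drive
        self.tail_capacity = tail_capacity
        self.index = {}                                     # logical addr -> ("tail" | drive_no)

    def write(self, addr, data):
        if len(self.tail) >= self.tail_capacity:
            self._seal_tail()
        self.tail[addr] = data
        self.index[addr] = "tail"

    def _seal_tail(self):
        # Move the full tail segment into the least-loaded body drive, then start fresh.
        target = min(range(len(self.body)), key=lambda i: len(self.body[i]))
        for addr, data in self.tail.items():
            self.body[target][addr] = data
            self.index[addr] = target
        self.tail = {}

    def read(self, addr):
        where = self.index[addr]
        drive = self.tail if where == "tail" else self.body[where]
        return drive[addr]

if __name__ == "__main__":
    g = GeckoChain()
    for i in range(10):
        g.write(i, f"v{i}")
    print(g.read(0), g.read(9))    # v0 from a body drive, v9 from the tail
```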

Available Media

Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions

James S. Plank, University of Tennessee; Kevin M. Greenan, EMC Backup Recovery Systems Division; Ethan L. Miller, University of California, Santa Cruz

This paper has been removed because of a dispute over its contents. Please see this memo from February 17, 2015, for more information.

Available Media

Virtual Machine Workloads: The Case for New NAS Benchmarks

Vasily Tarasov, Stony Brook University; Dean Hildebrand, IBM Research—Almaden; Geoff Kuenning, Harvey Mudd College; Erez Zadok, Stony Brook University

Network Attached Storage (NAS) and Virtual Machines (VMs) are widely used in data centers thanks to their manageability, scalability, and ability to consolidate resources. But the shift from physical to virtual clients drastically changes the I/O workloads seen on NAS servers, due to guest file system encapsulation in virtual disk images and the multiplexing of request streams from different VMs. Unfortunately, current NAS workload generators and benchmarks produce workloads typical to physical machines.

This paper makes two contributions. First, we studied the extent to which virtualization is changing existing NAS workloads. We observed significant changes, including the disappearance of file system meta-data operations at the NAS layer, changed I/O sizes, and increased randomness. Second, we created a set of versatile NAS benchmarks to synthesize virtualized workloads. This allows us to generate accurate virtualized workloads without the effort and limitations associated with setting up a full virtualized environment. Our experiments demonstrate that the relative error of our virtualized benchmarks, evaluated across 11 parameters, averages less than 10%.

Available Media
