INFLOW '16 Workshop Program

All sessions will be held in Grand Ballroom EF unless otherwise noted.

Papers are available for download below to registered attendees now and to everyone beginning November 1, 2016. Paper abstracts are available to everyone now. Copyright to the individual works is retained by the author[s].

Tuesday, November 1, 2016

8:00 am–9:00 am

Continental Breakfast

Grand Ballroom Prefunction Area

9:00 am–10:00 am

Keynote Address I

Non-Volatile Main Memory: Potentials, Problems, and Perils

Steve Swanson, University of California, San Diego

Non-volatile main memory technologies promise large improvements in memory density (relative to DRAM) and storage performance (relative to flash), but realizing those improvements will require changes in computer system components ranging from processor architectures and operating systems to programming languages and network protocols. The breadth and depth of technical challenges and opportunities that these memories offer make them exciting but also complicate their path to widespread use. I will survey some of the problems they raise and describe some of the solutions my group and others have proposed.

10:00 am–10:30 am

Break with Refreshments

Grand Ballroom Prefunction Area

10:30 am–12:00 pm

Persistence of Memory

Couture: Tailoring STT-MRAM for Persistent Main Memory

Mustafa Shihab, The University of Texas at Dallas; Jie Zhang, Yonsei University; Shuwen Gao, Intel; Joseph Callenes-Sloan, The University of Texas at Dallas; Myoungsoo Jung, Yonsei University

Modern computer systems rely extensively on dynamic random-access memory (DRAM) to bridge the performance gap between on-chip cache and secondary storage. However, continuous process scaling has exposed DRAM to high off-state leakage and excessive power consumption from frequent refresh operations. Spin-transfer torque magnetoresistive RAM (STT-MRAM) is a plausible replacement for DRAM, given its high endurance and near-zero leakage. However, conventional STT-MRAM cannot directly substitute for DRAM because of its larger cell area and the high latency and energy costs of writes. In this work, we present Couture, a main memory design using tailored STT-MRAM that can offer storage density comparable to DRAM and high performance with low power consumption. In addition, we propose an intelligent data scrubbing method (iScrub) to ensure data integrity with minimal overhead. Our evaluation results show that, equipped with the iScrub policy, our proposed Couture can achieve up to 23% performance improvement while consuming, on average, 18% less energy compared to contemporary DRAM.
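
The iScrub mechanism itself is described in the paper; purely as an illustration of the general idea, the sketch below assumes that retention-relaxed STT-MRAM lines must be rewritten before a retention deadline and shows a scheduler that refreshes only lines nearing that deadline. All names, constants, and the time-based policy are hypothetical, not the paper's design.

    import time

    RETENTION_S = 1.0       # hypothetical retention budget per line
    SCRUB_MARGIN = 0.2      # refresh once 80% of the budget is spent

    class ScrubScheduler:
        def __init__(self):
            self.last_write = {}            # line address -> last write time

        def on_write(self, addr):
            self.last_write[addr] = time.monotonic()

        def scrub_pass(self, rewrite):
            # Rewrite only lines close to their retention deadline,
            # instead of sweeping the whole memory on every pass.
            now = time.monotonic()
            for addr, t in self.last_write.items():
                if now - t >= RETENTION_S * (1 - SCRUB_MARGIN):
                    rewrite(addr)           # refresh the stored value
                    self.last_write[addr] = now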

ROSS: A Design of Read-Oriented STT-MRAM Storage for Energy-Efficient Non-Uniform Cache Architecture

Jie Zhang, Miryeong Kwon, Chanyoung Park, Myoungsoo Jung, and Songkuk Kim, Yonsei University

Spin-Transfer Torque Magnetoresistive RAM (STT-MRAM) is being intensively explored as a promising on-chip last-level cache (LLC) replacement for SRAM, thanks to its low leakage power and high storage capacity. However, the write penalties imposed by STT-MRAM challenge its adoption as an LLC by deteriorating performance and energy efficiency. This write behavior unfortunately makes STT-MRAM unable to substitute directly for SRAM in many computing systems. In this paper, we propose a hybrid non-uniform cache architecture (NUCA) that employs STT-MRAM as read-oriented on-chip storage. The key observation is that many cache lines in the LLC are touched only by read operations, with no further write updates. These cache lines, which we refer to as singular-writes, can be internally migrated from SRAM to STT-MRAM in our hybrid NUCA. Our approach can significantly improve system performance by avoiding many cache read misses with the larger STT-MRAM cache blocks, while keeping cache lines that require write updates in the SRAM cache. Our evaluation results show that, by utilizing the read-oriented STT-MRAM storage, our hybrid NUCA outperforms a conventional SRAM-only NUCA and a dead-block-aware STT-MRAM NUCA by 30% and 60%, while consuming 45% and 8% less energy, respectively.
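
As a minimal sketch of the singular-write observation above (illustrative only; the counters, interfaces, and migration trigger are assumptions, not the paper's hybrid NUCA design), a controller can count writes after a line's initial fill and steer never-rewritten lines into the STT-MRAM region on eviction from SRAM:

    class HybridLLC:
        def __init__(self):
            self.sram = {}      # addr -> (data, writes since fill); fast, small
            self.sttmram = {}   # addr -> data; larger, read-oriented

        def fill(self, addr, data):
            self.sram[addr] = (data, 0)

        def write(self, addr, data):
            if addr in self.sttmram:        # a write disqualifies the line,
                del self.sttmram[addr]      # so pull it back into SRAM
                self.sram[addr] = (data, 1)
            else:
                _, w = self.sram.get(addr, (None, 0))
                self.sram[addr] = (data, w + 1)

        def evict_from_sram(self, addr):
            data, writes = self.sram.pop(addr)
            if writes == 0:                 # singular-write line: keep it
                self.sttmram[addr] = data   # in STT-MRAM for future reads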

NVMOVE: Helping Programmers Move to Byte-Based Persistence

Himanshu Chauhan, The University of Texas at Austin; Irina Calciu, VMware; Vijay Chidambaram, The University of Texas at Austin; Eric Schkufza, VMware; Onur Mutlu, ETH Zurich; Pratap Subrahmanyam, VMware

Programmers can utilize upcoming non-volatile memory (NVM) technology in various ways. One appealing way is to store critical application data structures directly in NVM instead of serializing them to block storage. Changing legacy code to achieve this, however, is laborious and prone to bugs. We present NVMOVE, a tool that simplifies this transition by analyzing the source code and automatically identifying persistent types: types that are serialized and persisted. Aided by this tool, programmers can modify their applications to allocate such persistent types on the non-volatile memory heap. Upon analyzing Redis, a key-value store with 122 struct types, NVMOVE identifies 25 types as persistent, with no false negatives and 11 false positives. We evaluate the benefits of NVMOVE by moving the identified persistent types in Redis onto a non-volatile memory heap. Redis modified in this manner offers full persistence of data and performs within 78% of Redis with no persistence, achieving more than 2x the performance of Redis that performs logging on SSDs.
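
To make the classification concrete, here is a deliberately crude sketch of the kind of question NVMOVE answers automatically: which struct types flow into serialization calls? A regex pass over C source is far weaker than the paper's analysis, and the serializer names below are merely examples:

    import re

    SERIALIZERS = {"serialize", "fwrite", "rdbSaveObject"}  # example names

    def persistent_types(source):
        # Flag any "struct X" appearing among the arguments of a call
        # to a known serialization routine.
        flagged = set()
        for call in re.finditer(r"(\w+)\s*\(([^)]*)\)", source):
            if call.group(1) in SERIALIZERS:
                for m in re.finditer(r"struct\s+(\w+)", call.group(2)):
                    flagged.add(m.group(1))
        return flagged

    # e.g. persistent_types("fwrite((struct dictEntry *)e, sz, 1, fp)")
    # -> {"dictEntry"}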

12:00 pm–2:00 pm

Luncheon for Workshop Attendees

Harbor Ballroom

2:00 pm–3:00 pm

Keynote Address II

The New Storage Applications: Lots of Data, New Hardware and Machine Intelligence

Nisha Talagala, Parallel Machines

Today we are experiencing the intersection of multiple trends, each of which changes the storage and data landscape in powerful ways. Machines and applications generate massive quantities of new data whose value degrades over time unless it can be efficiently turned into insight. Hardware innovations, from CPU/DRAM scaling to Flash/Persistent Memory, enable data analytics ranging from classic database queries to machine learning and deep learning. New applications such as Flink, Spark, and Beam have emerged for batch and real-time data processing, while in-memory database technologies have also expanded to address these workloads. This talk will describe these trends, the applications and operating system stacks, the problems being solved, and new technical opportunities and challenges.

3:00 pm–3:30 pm

Break with Refreshments

Grand Ballroom Prefunction Area

3:30 pm–5:00 pm

Systems, Mostly

Enabling NVM for Data-Intensive Scientific Services

Philip Carns, John Jenkins, Sangmin Seo, Shane Snyder, and Robert B. Ross, Argonne National Laboratory; Charles D. Cranor, Carnegie Mellon University; Scott Atchley, Oak Ridge National Laboratory; Torsten Hoefler, ETH Zurich

Specialized, transient data services are playing an increasingly prominent role in data-intensive scientific computing. These services offer flexible, on-demand pairing of applications with storage hardware using semantics that are optimized for the problem domain. Concurrent with this trend, upcoming scientific computing and big data systems will be deployed with emerging non-volatile memory (NVM) technology to achieve the highest possible price/productivity ratio. Clearly, therefore, we must develop techniques to facilitate the confluence of specialized data services and NVM technology.

In this work we explore how to enable the composition of NVM resources within transient distributed services while still retaining their essential performance characteristics. Our approach involves eschewing the conventional shared file system model and instead projecting NVM devices as remote microservices that leverage user-level threads, remote procedure call (RPC) services, remote direct memory access (RDMA) enabled network transports, and persistent memory libraries in order to maximize performance. We describe a prototype system that incorporates these concepts, evaluate its performance for key workloads on an exemplar system, and discuss how the system can be leveraged as a component of future data-intensive architectures.
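
As a loose illustration of "a device projected as a service" rather than a shared file system, the toy below serves GET/PUT requests against a memory-mapped file over plain TCP. It stands in for the prototype only structurally; the paper's system uses user-level threads, RPC, RDMA transports, and persistent memory libraries, none of which appear here:

    import mmap, socket, struct

    def serve(path, port=9999, size=1 << 20):
        # 'path' must name a file that is already 'size' bytes long.
        with open(path, "r+b") as f:
            region = mmap.mmap(f.fileno(), size)
            srv = socket.create_server(("0.0.0.0", port))
            while True:
                conn, _ = srv.accept()
                with conn:
                    hdr = conn.recv(13, socket.MSG_WAITALL)
                    op, off, ln = struct.unpack("!BQI", hdr)
                    if op == 0:                                   # GET
                        conn.sendall(region[off:off + ln])
                    else:                                         # PUT
                        region[off:off + ln] = conn.recv(ln, socket.MSG_WAITALL)
                        region.flush()                            # persist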

ElCached: Elastic Multi-Level Key-Value Cache

Rahman Lavaee, University of Rochester; Stephen Choi and Yang-Suk Kee, Samsung Memory Solutions Lab; Chen Ding, University of Rochester

Today’s cloud service providers (CSPs) use in-memory caching engines to improve application performance and server revenue. However, these caching engines scale poorly, mainly because of DRAM's high cost and energy consumption. At the same time, the increasing use of multi-tenancy demands effective and optimal resource provisioning.

In this paper, we introduce ElCached, a multi-level key-value cache based on Memcached. ElCached employs low-cost NAND flash memory as a lower layer of caching. ElCached uses the reuse distance model to predict its miss ratio with high accuracy under any storage capacity limit. The miss ratio prediction allows ElCached to find the best resource allocation under multi-tenant settings. We evaluate ElCached on workloads emulating real-world applications. Our multi-tenant experiment indicates that, compared to a proportional allocation technique, ElCached can reduce cost by up to 26% while delivering lower average latency. Meanwhile, by utilizing more flash storage, ElCached can reduce total memory consumption by almost half.
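
The reuse distance model referenced above rests on a standard identity: under LRU-style management, an access misses in a cache of capacity c exactly when its reuse distance exceeds c, so a single histogram of distances predicts the miss ratio at every candidate capacity. A minimal sketch (ElCached's actual modeling machinery is in the paper):

    from collections import Counter

    def miss_ratio_curve(reuse_distances, capacities):
        # Use float("inf") as the distance of a cold (first-touch) access,
        # so it counts as a miss at every finite capacity.
        hist = Counter(reuse_distances)
        total = len(reuse_distances)
        return {c: sum(n for d, n in hist.items() if d > c) / total
                for c in capacities}

    # e.g. miss_ratio_curve([1, 3, float("inf"), 2, 8], [2, 4, 16])
    # -> {2: 0.6, 4: 0.4, 16: 0.2}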

On the Impact of Garbage Collection on Flash-Based SSD Endurance

Robin Verschoren and Benny Van Houdt, University of Antwerp

Garbage collection has a profound impact on write amplification in flash-based SSDs, which in turn may significantly reduce an SSD's life span. The unequal wear of data blocks further contributes to this reduced life span. In this paper we study two performance measures: SSD endurance, which assesses the life span of an SSD, and PE fairness, which measures how unequally program/erase (PE) cycles wear the blocks.
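
A standard back-of-envelope identity (not the paper's mean field model) shows why garbage collection drives write amplification: reclaiming a block whose fraction u of pages is still valid copies those pages elsewhere, so freeing room for (1-u) pages' worth of host writes costs a full block's worth of flash writes, giving an amplification of 1/(1-u):

    def write_amplification(valid_fraction):
        # Each reclaim writes u*b copied pages plus (1-u)*b host pages,
        # so flash writes per host write = b / ((1-u)*b) = 1/(1-u).
        return 1.0 / (1.0 - valid_fraction)

    # A larger spare factor lets GC pick emptier victim blocks:
    # write_amplification(0.5) -> 2.0, write_amplification(0.8) -> 5.0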

We demonstrate, using a mean field model and simulation, how these measures are affected by the garbage collection algorithm, the spare factor, and other design choices. Numerical results indicate that under uniform random writes there is no need to implement a wear-leveling technique. For hot and cold data, we see that design choices that lower PE fairness may still yield higher SSD endurance, which suggests that one should not place too much emphasis on equalizing wear.