All sessions will be held in the Santa Clara Ballroom unless otherwise noted.
Papers are available for download below to registered attendees now and to everyone beginning July 10, 2017. Paper abstracts are available to everyone now. Copyright to the individual works is retained by the author[s].
Monday, July 10, 2017
8:00 am–9:00 am
Program Co-Chairs: Marcos Aguilera, VMware; Angela Demke Brown, University of Toronto
9:15 am–10:30 am
Distributed and Data Center Storage
Session Chair: Daniel Ellard, Raytheon BBN Technologies
Practical Web-based Delta Synchronization for Cloud Storage Services
He Xiao and Zhenhua Li, Tsinghua University; Ennan Zhai, Yale University; Tianyin Xu, UCSD
Understanding Rack-Scale Disaggregated Storage
Sergey Legtchenko, Hugh Williams, Kaveh Razavi, Austin Donnelly, Richard Black, Andrew Douglas, Nathanael Cheriere, Daniel Fryer, Kai Mast, Angela Demke Brown, Ana Klimovic, Andy Slowey, and Antony Rowstron, Microsoft Research
Disaggregation of resources in the data center, especially at the rack-scale, offers the opportunity to use valuable resources more efficiently. It is common that mass storage racks in large-scale clouds are filled with servers with Hard Disk Drives (HDDs) attached directly to each of them, either using SATA or SAS depending on the number of HDDs.
What does disaggregated storage mean for these racks? We define four categories of in-rack disaggregation: complete, dynamic elastic, failure, and configuration disaggregation. We explore the benefits and impact of these design points by building a highly flexible research storage fabric that allows us to build example systems that embody the four designs.
BARNS: Towards Building Backup and Recovery for NoSQL Databases
Atish Kathpal and Priya Sehgal, NetApp
While NoSQL databases are gaining popularity for business applications, they pose unique challenges for backup and recovery. Our solution, BARNS, addresses these challenges, namely taking: a) cluster consistent backup and ensuring repair free restore, b) storage efficient backups, and c) topology oblivious backup and restore. Due to the eventual consistency semantics of these databases, traditional database backup techniques of performing quiesce do not guarantee cluster consistent backup. Moreover, taking crash consistent backup increases recovery time due to the need for repairs. In this paper, we provide detailed solutions for taking backup of two popular, but architecturally different NoSQL DBs, Cassandra and MongoDB, when hosted on shared storage. Our solution leverages database distribution and partitioning knowledge along with shared storage features such as snapshots and clones to efficiently perform backup and recovery of NoSQL databases. Our solution gets rid of replica copies, thereby saving ~66% backup space (under 3x replication). Our preliminary evaluation shows that we require a constant restore time of ~2-3 mins, independent of backup dataset and cluster size.
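The ~66% figure in the abstract above follows from simple arithmetic: if a backup keeps one logical copy instead of all replicas, the saved fraction is 1 - 1/R for replication factor R. The sketch below only illustrates that arithmetic; the function name is hypothetical and is not taken from BARNS.

```python
def backup_space_saving(replication_factor: int) -> float:
    """Fraction of backup space saved by keeping one copy instead of all replicas.

    Illustrative arithmetic only; this is not code from the BARNS paper.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return 1.0 - 1.0 / replication_factor


# Under 3x replication, two of every three copies are redundant.
print(f"{backup_space_saving(3):.1%}")
```

With a replication factor of 3 the saving is two thirds, which matches the ~66% quoted in the abstract.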
10:30 am–11:00 am
Break with Refreshments
11:00 am–11:50 am
Session Chair: Song Jiang, University of Texas at Arlington
Persistent Memcached: Bringing Legacy Code to Byte-Addressable Persistent Memory
Virendra J. Marathe, Margo Seltzer, Steve Byan, and Tim Harris, Oracle Labs
We report our experience building and evaluating pmemcached, a version of memcached ported to byte-addressable persistent memory. Persistent memory is expected to not only improve overall performance of applications’ persistence tier, but also vastly reduce the “warm up” time needed for applications after a restart. We decided to test this hypothesis on memcached, a popular key-value store. We took the extreme view of persisting memcached’s entire state, resulting in a virtually instantaneous warm up phase. Since memcached is already optimized for DRAM, we expected our port to be a straightforward engineering effort. However, the effort turned out to be surprisingly complex during which we encountered several non-trivial problems that challenged the boundaries of memcached’s architecture. We detail these experiences and corresponding lessons learned.
Efficient Memory Mapped File I/O for In-Memory File Systems
Jungsik Choi, Sungkyunkwan University; Jiwon Kim, ATTO Research; Hwansoo Han, Sungkyunkwan University
Recently, with the emergence of low-latency NVM storage, software overhead has become a greater bottleneck than storage latency, and memory mapped file I/O has gained attention as a means to avoid software overhead. However, according to our analysis, memory mapped file I/O incurs a significant amount of additional overhead. To utilize memory mapped file I/O to its true potential, such overhead should be alleviated. We propose map-ahead, mapping cache, and extended madvise techniques to maximize the performance of memory mapped file I/O on low-latency NVM storage systems. This solution can avoid both page fault overhead and page table entry construction overhead. Our experimental results show throughput improvements of 38–70% in microbenchmarks and performance improvements of 6–18% in real applications compared to existing memory mapped I/O mechanisms.
11:50 am–12:40 pm
File System Reliability
Session Chair: Keith Smith, NetApp
CrashMonkey: A Framework to Automatically Test File-System Crash Consistency
Ashlie Martinez and Vijay Chidambaram, University of Texas at Austin
Modern file systems employ complex techniques to ensure that they can recover efficiently in the event of a crash. However, there is little infrastructure for systematically testing crash consistency in file systems. We introduce CrashMonkey, a simple, flexible, file-system-agnostic test framework to systematically check file systems for inconsistencies if a failure occurs during a filesystem operation. CrashMonkey is modular and flexible, allowing the users to easily specify different test workloads and custom consistency checks.
Understanding the Fault Resilience of File System Checkers
Om Rameshwar Gatla and Mai Zheng, New Mexico State University
File system checkers serve as the last line of defense to recover a corrupted file system back to a consistent state. Therefore, their reliability is critically important. Motivated by real accidents, in this paper we study the behavior of file system checkers under faults. We systematically inject emulated faults to interrupt the checkers and examine the impact on the file system images. In doing so, we answer two important questions: Does running the checker after an interrupted-check successfully return the file system to a correct state? If not, what goes wrong? Our results show that there are vulnerabilities in popular file system checkers which could lead to unrecoverable data loss under faults.
12:40 pm–2:15 pm
Luncheon for Workshop Attendees
2:15 pm–3:30 pm
Session Chair: Marcos Aguilera, VMware
DeclStore: Layering Is for the Faint of Heart
Noah Watkins, Michael A. Sevilla, Ivo Jimenez, Kathryn Dahlgren, Peter Alvaro, Shel Finkelstein, and Carlos Maltzahn, UC Santa Cruz
Popular storage systems support diverse storage abstractions by providing important disaggregation benefits. Instead of maintaining a separate system for each abstraction, unified storage systems, in particular, support standard file, block, and object abstractions so the same hardware can be used for a wider range and a more flexible mix of applications. As large-scale unified storage systems continue to evolve to meet the requirements of an increasingly diverse set of applications and next-generation hardware, de jure approaches of the past—based on standardized interfaces—are giving way to domain-specific interfaces and optimizations. While promising, the ad-hoc strategies characteristic of current approaches to co-design are untenable.
Storage on Your SmartPhone Uses More Energy Than You Think
Jayashree Mohan, Dhathri Purohith, Matthew Halpern, Vijay Chidambaram, and Vijay Janapa Reddi, University of Texas at Austin
Energy consumption is a key concern for mobile devices. Prior research has focused on the screen and the network as the major sources of energy consumption. Through carefully designed measurement-based experiments, we show that for certain storage-intensive workloads, the storage subsystem on an Android smartphone consumes a significant amount of energy (36%), on par with screen energy consumption. We analyze the energy consumption of different storage primitives, such as sequential and random writes, on two popular mobile file systems, ext4 and F2FS. In addition, since most Android applications use SQLite for storage, we analyze the energy consumption of different SQLite operations. We present several interesting results from our analysis: for example, random writes consume 15× more energy than sequential writes, and F2FS consumes half the energy of ext4 for most workloads. We believe our results contribute useful design guidelines for the developers of energy-efficient mobile file systems.
Vulnerability Analysis of On-Chip Access-Control Memory
Chintan Chavda and Ethan C. Ahn, University of Texas at San Antonio; Yu-Sheng Chen, Industrial Technology Research Institute; Youngjae Kim, Sogang University; Kalidas Ganesh and Junghee Lee, University of Texas at San Antonio
Encryption is often employed to protect sensitive information stored in memory and storage. It is the most powerful countermeasure against data breach, but it has performance overhead. As a low-cost alternative to encryption, an access-control memory (ACM) has been introduced, which integrates an access-control mechanism with memory. While ACM minimizes the performance overhead of encryption, it provides a level of security similar to that of encryption. ACM reveals information only when the access codes are correct. However, if an adversary attempts to access data directly from memory cells through a physical attack without going through a standard interface, a vulnerability could arise. This paper discusses the feasibility of, and countermeasures for, physical attacks on ACM, including fault injection attack, power analysis attack, chip modification, microprobing, and imaging. Moreover, as a concrete example of ACM, we compare the security aspects of SSDs when the write buffers in the SSDs employ ACM with emerging non-volatile memories such as STTRAM, PRAM, and RRAM.
3:30 pm–4:00 pm
Break with Refreshments
4:00 pm–5:10 pm
Wild and Crazy Ideas
Session Chair: Angela Demke Brown, University of Toronto
Lightweight KV-based Distributed Store for Datacenters
Chanwoo Chung, Massachusetts Institute of Technology; Jinhyung Koo, Inha University; Arvind, Massachusetts Institute of Technology; Sungjin Lee, Daegu Gyeongbuk Institute of Science & Technology
A great deal of digital data is generated every day by content providers, end-users, and even IoT sensors. This data is stored in and managed by thousands of distributed storage nodes, each comprised of a power-hungry x86 Xeon server with a huge amount of DRAM and an array of HDDs or SSDs grouped by RAID. Such clusters take up a large amount of space in datacenters and require a lot of electricity and cooling facilities. Therefore, packing as much data as possible into a smaller datacenter space and managing it in an energy- and performance-efficient manner can result in enormous savings.
POSIX is Dead! Long Live... errr... What Exactly?
Erez Zadok, Stony Brook University; Dean Hildebrand, IBM Research - Almaden; Geoff Kuenning, Harvey Mudd College; Keith A. Smith, NetApp
The POSIX system call interface is nearly 30 years old. It was designed for serialized data access to local storage, using single computers with single CPUs: today’s computers are much faster, more complex, and require access to remote data. Security was not a major concern in POSIX’s design, leading to numerous TOCTTOU attacks over the years and forcing programmers to work around these limitations. Serious bugs are still being discovered, bugs that could have been averted with a more secure POSIX API design. POSIX’s main programming model expects users to issue one synchronous call at a time and wait for its results before issuing the next. Today’s programmers expect to issue many asynchronous requests at a time to improve overall throughput. POSIX’s synchronous one-at-a-time API is particularly bad for accessing remote and cloud objects, where high latencies dominate.
Open Addressable Device Tiers
Andy Kowles, Seagate
The lack of determinism and the opaque nature of the existing Media Cache (MC) based Drive Managed Shingled Magnetic Recording (DM-SMR) and Host Aware (HA-SMR) designs have proven fatal for broad acceptance of shingled disks in enterprise storage systems. While the improved density and relatively high quality of SMR is welcome, systems designers cannot accept the latency variances and modal slowdowns inherent to SMR-related autonomous behavior, such as cache cleaning. This proposal aims to publicly fix that by exposing the MC (or other internal device tiers) via a simple block addressing scheme for system software to utilize. Tail latencies and overall performance for data the system designers designate can also be improved by creative use of the Openly Addressed Tiers (OATs) scheme presented.
Snapshot Judgements: Obtaining Data Insights without Tracing
Wenxuan Wang, Emory University; Ian F. Adams, Intel Labs; Avani Wildani, Emory University
Metadata snapshots are a favored method for gaining filesystem insights due to their small size and relative ease of acquisition compared to access traces. Since snapshots do not include an access history, they are typically used for relatively simple analyses such as file lifetime and size distributions, and researchers still gather and store full block or file access traces for any higher level analysis such as cache prediction or scheduling variable replication. We claim that one can gain rich insights into file system and user behavior by clustering metadata snapshots and comparing the entropy within clusters to the entropy within natural partitions such as directory hierarchies or single attributes. We have preliminary results indicating that agglomerative clustering methods produce groups of data with high information purity, which may be a sign of functional correlation.
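The comparison described in the abstract above (entropy within clusters versus entropy within natural partitions such as directories) can be sketched as follows. This is a minimal illustration under stated assumptions: cluster labels are taken as already computed, the record fields `dir`, `cluster`, and `owner` are hypothetical, and the size-weighted mean entropy is one plausible purity measure, not necessarily the one the authors use.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_partition_entropy(records, group_key, attribute):
    """Size-weighted mean entropy of `attribute` within each group of `group_key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[attribute])
    total = len(records)
    return sum(len(v) / total * shannon_entropy(v) for v in groups.values())


# Hypothetical snapshot records; "cluster" stands in for the output of a
# clustering step (for example agglomerative clustering) run beforehand.
records = [
    {"dir": "/a", "cluster": 0, "owner": "alice"},
    {"dir": "/a", "cluster": 1, "owner": "bob"},
    {"dir": "/b", "cluster": 0, "owner": "alice"},
    {"dir": "/b", "cluster": 1, "owner": "bob"},
]

by_directory = mean_partition_entropy(records, "dir", "owner")
by_cluster = mean_partition_entropy(records, "cluster", "owner")
print(by_directory, by_cluster)
```

In this toy data the clusters are purer than the directories with respect to `owner`, which is the kind of signal the abstract describes as high information purity.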
Belief-Based Storage Systems
Dusan Ramljak and Krishna Kant, Temple University
The current data growth and consequent increase in complexity of its usage patterns indicate that intelligent management of data storage is becoming ever more crucial. We rely on the claim that efficient pattern discovery and description, coupled with the observed predictability of complex patterns within many high-performance applications, offers significant potential to enable many I/O optimizations. We developed a compact, flexible caching and pre-fetching framework that could potentially address any imposed reliability, performance, or energy-efficiency requirement and can incorporate any relevant information. Here, we discuss possible ways to extend this framework towards belief-based storage systems.
Fighting for a Niche: An Evolutionary Model of Storage
Avani Wildani, Emory University
We know intuitively that if two devices, such as HD DVD and Blu-ray discs, have no clear difference in terms of cost, speed, or capacity, one will eventually leave the market. In evolutionary biology, this principle is termed “competitive exclusion”: if species occupy the same environmental niche, one will eventually outcompete the others (though predicting which will live is out of scope, we can see that the niche can only support one). Our wild and crazy plan is to project the future of storage systems by tracking the co-evolution of devices along with the underlying storage marketplace, or our “niche.”
Big Data Gets Bigger: What about Data Cleaning Analytics as a Storage Service?
Ayat Fekry, University of Cambridge
The success of big data solutions principally relies on the timely extraction of valuable insights from data. This will continue to become more challenging due to the growth in data volume without a corresponding increase in velocity. We advocate that storage systems of the future should include functionality to detect and harness fundamental data characteristics such as similarity and correlation. This has the potential to optimize storage space, reduce the amount of processing needed for further information extraction, and save I/O and network communications.
6:00 pm–7:00 pm
Joint Poster Session and Happy Hour with HotCloud
Sponsored by NetApp
The poster session will feature posters by authors of all papers presented at both the HotCloud and HotStorage workshops, including the HotStorage Wild and Crazy Ideas (WACI).
Tuesday, July 11, 2017
8:00 am–9:00 am
9:00 am–10:30 am
Shared Keynote Address with HotCloud '17
Santa Clara Ballroom
Edge Computing: Vision and Challenges
Mahadev Satyanarayanan, School of Computer Science, Carnegie Mellon University
Edge computing is a new paradigm in which the resources of a small data center are placed at the edge of the Internet, in close proximity to mobile devices, sensors, and end users. Terms such as "cloudlets," "micro data centers," "fog nodes," and "mobile edge cloud" have been used in the literature to refer to these edge-located computing entities. Located just one wireless hop away from associated mobile devices and sensors, they offer ideal placement for low-latency offload infrastructure to support emerging applications. They are optimal sites for aggregating, analyzing and distilling bandwidth-hungry sensor data from devices such as video cameras. In the Internet of Things, they offer a natural vantage point for organizational access control, privacy, administrative autonomy and responsive analytics. In vehicular systems, they mark the junction between the well-connected inner world of a moving vehicle and its tenuous reach into the cloud. For cloud computing, they enable fallback cloud services in hostile environments. Significant industry investments are already starting to be made in edge computing. This talk will examine why edge computing is a fundamentally disruptive technology, and will explore some of the challenges and opportunities that it presents to us.
Mahadev Satyanarayanan, Carnegie Mellon University
Satya is the Carnegie Group Professor of Computer Science at Carnegie Mellon University. He received the PhD in Computer Science from Carnegie Mellon, after Bachelor's and Master's degrees from the Indian Institute of Technology, Madras. He is a Fellow of the ACM and the IEEE. He was the founding Program Chair of the HotMobile series of workshops, the founding Editor-in-Chief of IEEE Pervasive Computing, the founding Area Editor for the Synthesis Series on Mobile and Pervasive Computing, and the founding Program Chair of the First IEEE Symposium on Edge Computing. He was the founding director of Intel Research Pittsburgh, and was an Advisor to Maginatics, which has created a cloud-based realization of the AFS vision and was acquired by EMC in 2014.
10:30 am–11:00 am
Break with Refreshments
11:00 am–12:40 pm
Translation Layers and SSD Arrays
Session Chair: Michael Wei, VMware
Virtual Guard: A Track-Based Translation Layer for Shingled Disks
Mansour Shafaei and Peter Desnoyers, Northeastern University
Virtual Guard (Vguard) is a track-based static mapping translation layer for shingled magnetic recording (SMR) drives. Data is written in-place by caching data from the next track in the shingling direction, allowing direct overwrite of sectors in the target track. This enables Vguard to take advantage of track-level locality, nearly eliminating cleaning for many workloads. We compare performance of Vguard to an available drive-managed SMR drive analyzed and modeled in previous research. Vguard reduces the 99.9% latency by 15× for real-world traces, and maximum latency by 32% for synthetic random write workloads.
Improving Flash Storage Performance by Caching Address Mapping Table in Host Memory
Wookhan Jeong, Hyunsoo Cho, Yongmyung Lee, Jaegyu Lee, Songho Yoon, Jooyoung Hwang, and Donggi Lee, S/W Development Team, Memory Business, Samsung Electronics Co., Ltd.
NAND flash memory based storage devices use a Flash Translation Layer (FTL) to translate logical addresses of I/O requests to corresponding flash memory addresses. Mobile storage devices typically have RAM of constrained size and thus lack the memory to keep the whole mapping table. Therefore, mapping tables are partially retrieved from NAND flash on demand, causing random-read performance degradation.
In order to improve random read performance, we propose HPB (Host Performance Booster), which uses host system memory as a cache for the FTL mapping table. By using HPB, FTL data can be read from host memory faster than from NAND flash memory. We define transactional protocols between the host device driver and the storage device to manage the host side mapping cache. We implement HPB on a Galaxy S7 smartphone with a UFS device. HPB is shown to have a performance improvement of 58–67% for random read workloads.
Managing Array of SSDs When the Storage Device Is No Longer the Performance Bottleneck
Byungseok Kim, Jaeho Kim, and Sam H. Noh, UNIST
With the advent of high performing NVMe SSDs, the bottleneck of system performance is shifting away from the traditional storage device. In particular, the I/O stack software layers have already been recognized as a heavy burden on the overall I/O. Efforts to alleviate this burden have been considered. Recently, the spotlight has been on the CPU. With computing capacity as well as the means to get the data to the processor now being limited, recent studies have suggested that processing power be pushed into where the data is residing. With devices such as 3D XPoint on the horizon, this phenomenon is expected to be aggravated.
In this paper, we focus on another component related to such changes. In particular, it has been observed that the bandwidth of the network that connects clients to storage servers is now being surpassed by storage bandwidth. Figure 1 shows the changes that are happening. We observe that the changes in the storage interface are allowing storage bandwidth to surpass that of the network. As shown in Table 1, recent developments in SSDs have resulted in individual SSDs providing read and write bandwidth in the 5GB/s and 3GB/s range, respectively, which surpasses or is close to that of 10/25/40GbE (Gigabit Ethernet) that comprise the majority of networks being supported today.
Based on this observation, in this paper, we revisit the organization of disk arrays. Specifically, we target write performance in all-flash arrays, which we interchangeably refer to as SSD arrays, that are emerging as a solution for high-end storage. As shown in Table 2, most major storage vendors carry such a solution and these products employ plenty of SSDs to achieve large capacity and high performance. Figure 2 shows how typical all-flash arrays would be connected to the network and the host. Our goal is to provide high, sustained, and consistent write performance in such a storage environment.
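The bandwidth comparison in the abstract above (SSD bandwidth in the 5GB/s read and 3GB/s write range versus 10/25/40GbE) can be checked with a unit conversion. The sketch below assumes raw line rate with 8 bits per byte and ignores protocol and encoding overhead; the function and constant names are illustrative and not from the paper.

```python
SSD_READ_GB_PER_S = 5.0   # per-SSD read bandwidth quoted in the abstract
SSD_WRITE_GB_PER_S = 3.0  # per-SSD write bandwidth quoted in the abstract


def gbe_to_gigabytes_per_s(link_gbit_per_s: float) -> float:
    """Convert an Ethernet line rate in Gbit/s to GB/s (raw rate, no overhead)."""
    return link_gbit_per_s / 8.0


def compare(links=(10, 25, 40)):
    """Return (link, link GB/s, read >= link, write >= link) for each link speed."""
    rows = []
    for link in links:
        net = gbe_to_gigabytes_per_s(link)
        rows.append((link, net, SSD_READ_GB_PER_S >= net, SSD_WRITE_GB_PER_S >= net))
    return rows


for link, net, read_ok, write_ok in compare():
    print(f"{link} GbE = {net:.3f} GB/s | SSD read >= link: {read_ok} | "
          f"SSD write >= link: {write_ok}")
```

Under these assumptions a single SSD's read bandwidth matches even a 40GbE link, which is consistent with the abstract's "surpasses or is close to" wording.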
Parity-Stream Separation and SLC/MLC Convertible Programming for Life Span and Performance Improvement of SSD RAIDs
Yoohyuk Lim, Jaemin Lee, Cassiano Campes, and Euiseong Seo, Sungkyunkwan University
To reduce the performance and lifespan loss caused by the partial-stripe writes in SSD RAIDs, we propose two schemes: parity-stream separation and SLC/MLC convertible programming. Parity-stream separation splits the parity block stream from the data block stream to decrease valid page copy during garbage collection. In the convertible programming scheme, the flash memory blocks that are allocated for parity data are programmed in SLC mode to reduce the wear caused by programming stress, while the other flash memory blocks are written in MLC mode as usual. Evaluation shows that our scheme decreased garbage collection overhead by up to 58% and improved lifespan by up to 54%, assuming that the MLC write stress was 3.5 times that of the SLC.
12:40 pm–2:00 pm
Luncheon for Workshop Attendees
2:00 pm–3:15 pm
Session Chair: Vijay Chidambaram, The University of Texas at Austin
Enabling NVMe WRR support in Linux Block Layer
Kanchan Joshi, Kaushal Yadav, and Praval Choudhary, Samsung Semiconductors India R&D, India
There is a need for differentiated I/O service when applications with diverse performance needs share a storage device. The NVMe specification provides a method called Weighted-Round-Robin-with-urgent-priority (WRR) which can help in providing such differentiated I/O service. In Round-Robin arbitration, all I/O queues are treated as having equal priority, leading to symmetric I/O processing. In WRR arbitration, by contrast, queues can be marked urgent, high, medium or low, with provision for a different weight for each category. The onus is on the host to associate priority with I/O queues and define weights.
We find that very little has been done in the current Linux ecosystem when it comes to supporting WRR and making its benefits reach applications. In this paper we propose a method that introduces WRR support in the Linux NVMe driver. This method delivers WRR capability to applications without the need to rebuild them. Unlike an affinity-based approach, it does not limit the compute-ability of the application. Our results demonstrate that the modified driver indeed provides differentiated I/O performance among applications. The proposed work modifies only the NVMe driver and is generic enough to be included in the mainstream Linux kernel for supporting WRR.
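The arbitration scheme described in the abstract above can be illustrated with a toy simulation. This is a simplified sketch, not the NVMe controller's actual behavior and not the authors' driver change: it assumes a static set of queued commands, serves the urgent class with strict priority, and then serves the high, medium, and low classes in rounds, taking up to the class weight from each. All names and weights are hypothetical.

```python
from collections import deque

WEIGHTED_CLASSES = ("high", "medium", "low")


def wrr_arbitrate(queues, weights, budget):
    """Return the order in which up to `budget` commands are served.

    queues:  dict mapping "urgent"/"high"/"medium"/"low" to a deque of commands
    weights: dict mapping "high"/"medium"/"low" to a positive integer weight
    """
    if any(weights[c] < 1 for c in WEIGHTED_CLASSES):
        raise ValueError("weights must be positive integers")
    served = []
    # Urgent commands are served first, with strict priority.
    while queues["urgent"] and len(served) < budget:
        served.append(queues["urgent"].popleft())
    # Remaining classes share the device in weighted round-robin rounds.
    while len(served) < budget and any(queues[c] for c in WEIGHTED_CLASSES):
        for cls in WEIGHTED_CLASSES:
            for _ in range(weights[cls]):
                if len(served) >= budget or not queues[cls]:
                    break
                served.append(queues[cls].popleft())
    return served


queues = {
    "urgent": deque(["u1"]),
    "high": deque(["h1", "h2", "h3", "h4"]),
    "medium": deque(["m1", "m2"]),
    "low": deque(["l1", "l2"]),
}
order = wrr_arbitrate(queues, {"high": 3, "medium": 2, "low": 1}, budget=8)
print(order)
```

With these hypothetical weights, the high class receives three slots per round for every one slot given to the low class, which is the kind of differentiated service the abstract refers to.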
IOPriority: To The Device and Beyond
Adam Manzanares, Filip Blagojevic, and Cyril Guyot, Western Digital Research
In large scale data centers, controlling tail latencies of IO requests keeps storage performance bounded and predictable, which is critical for infrastructure resource planning. This work provides a transparent mechanism for applications to pass prioritized IO commands to storage devices. As a consequence, we observe much shorter tail latencies for prioritized IO while impacting non-prioritized IO in a reasonable manner. We also provide a detailed description of the changes we made to the Linux Kernel that enable applications to pass IO priorities to a storage device. Our results show that passing priorities to the storage device is capable of decreasing tail latencies by a factor of 10x while decreasing IOPS minimally.
Request-aware Cooperative I/O Scheduling for Scale-out Database Applications
Hyungil Jo and Sung-hun Kim, Sungkyunkwan University; Sangwook Kim, Apposha; Jinkyu Jeong and Joonwon Lee, Sungkyunkwan University
Interactive data center applications suffer from the tail latency problem. Since most modern data center applications take the sharded architecture to serve scale-out services, a request comprises multiple sub-requests handled in individual back-end nodes. Depending on the state of each back-end node, a node may issue multiple I/Os for a single sub-request. Since traditional I/O scheduling operates in an application-agnostic manner, it sometimes causes a long latency gap between the responses of sub-requests, thereby delaying the response to end users. In this paper, we propose a request-aware cooperative I/O scheduling scheme to reduce the tail latency of a database application. Our proposed scheme captures request arrival order at the front-end of an application and exploits it to make a decision for I/O scheduling in individual back-end nodes. We implemented a prototype based on MongoDB and the Linux kernel and evaluated it with a read-intensive scan workload. Experimental results show that our proposed scheme effectively reduces the latency gap between sub-requests, thereby reducing the tail latency.
3:15 pm–3:45 pm
Break with Refreshments
3:45 pm–5:25 pm
Session Chair: Nisha Talagala, Parallel Machines
CC-Log: Drastically Reducing Storage Requirements for Robots Using Classification and Compression
Santiago Gonzalez, Vijay Chidambaram, Jivko Sinapov, and Peter Stone, University of Texas at Austin
Modern robots collect a wealth of rich sensor data during their operation. While such data allows interesting analysis and sophisticated algorithms, it is simply infeasible to store all the data that is generated. However, collecting only samples of the data greatly reduces the usefulness of the data. We present CC-LOG, a new logging system built on top of the widely-used Robot Operating System that uses a combination of classification and compression techniques to reduce storage requirements. Experiments using the Building-Wide Intelligence Robot, an autonomous mobile platform capable of operating for long periods of time in human-inhabited environments, showed that our proposed system can reduce storage requirements by more than an order of magnitude. Our results indicate that there is significant unrealized potential in optimizing infrastructure commonly used in robotics applications and research.
Customizing Progressive JPEG for Efficient Image Storage
Eddie Yan, Kaiyuan Zhang, and Xi Wang, University of Washington; Karin Strauss, Microsoft Research; Luis Ceze, University of Washington
Modern image storage services, especially those associated with social media services, host massive collections of images. These images are often replicated at many different resolutions to support different devices and contexts, incurring substantial capacity overheads. One approach to alleviate these overheads is to resize them at request time. However, this approach can be inefficient, as reading full-size source images for resizing uses more bandwidth than reading pre-resized images. We propose repurposing the progressive JPEG standard and customizing the organization of image data to reduce the bandwidth overheads of dynamic resizing. We show that at a PSNR of 32 dB, dynamic resizing with progressive JPEG provides 2.5× read data savings over baseline JPEG, and that progressive JPEG with customized encode parameters can further improve these savings (up to 5.8× over the baseline). Finally, we characterize the decode overheads of progressive JPEG to assess the feasibility of directly decoding progressive JPEG images on energy-limited devices. Our approach does not require modifications to current JPEG software stacks.
Addressing the Dark Side of Vision Research: Storage
Vishakha Gupta-Cledat, Luis Remis, and Christina R Strong, Intel Labs
Data access is swiftly becoming a bottleneck in visual data processing, providing an opportunity to influence the way visual data is treated in the storage system. To foster this discussion, we identify two key areas where storage research can strongly influence visual processing run-times: efficient metadata storage and new storage formats for visual data. We propose a storage architecture designed for efficient visual data access that exploits next generation hardware and give preliminary results showing how it enables efficient vision analytics.
Canopus: Enabling Extreme-Scale Data Analytics on Big HPC Storage via Progressive Refactoring
Tao Lu, New Jersey Institute of Technology; Eric Suchyta, Jong Choi, Norbert Podhorszki, and Scott Klasky, Oak Ridge National Laboratory; Qing Liu, New Jersey Institute of Technology; Dave Pugmire and Matt Wolf, Oak Ridge National Laboratory; Mark Ainsworth, Brown University
High accuracy scientific simulations on high performance computing (HPC) platforms generate large amounts of data. To allow data to be efficiently analyzed, simulation outputs need to be refactored, compressed, and properly mapped onto storage tiers. This paper presents Canopus, a progressive data management framework for storing and analyzing big scientific data. Canopus allows simulation results to be refactored into a much smaller dataset along with a series of deltas with fairly low overhead. Then, the refactored data are compressed, mapped, and written onto storage tiers. For data analytics, refactored data are selectively retrieved to restore data at a specific level of accuracy that satisfies analysis requirements. Canopus enables end users to make trade-offs between analysis speed and accuracy on-the-fly. Canopus is demonstrated and thoroughly evaluated using blob detection on fusion simulation data.
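The idea of refactoring data into a smaller base dataset plus deltas, then restoring at a chosen accuracy, can be sketched in a few lines. This is only a toy illustration under stated assumptions: it uses one level of 2x decimation and sample-and-hold upsampling on a 1-D list, whereas the actual refactoring, compression, and tier mapping in Canopus are not specified in the abstract. All function names are hypothetical.

```python
def upsample(base, n):
    """Approximate a length-n series by holding each base sample for two slots."""
    return [base[i // 2] for i in range(n)]


def refactor(data):
    """Split `data` into a half-size base and a delta against the upsampled base."""
    base = data[::2]
    approx = upsample(base, len(data))
    delta = [d - a for d, a in zip(data, approx)]
    return base, delta


def restore(base, n, delta=None):
    """Restore a coarse view from the base alone, or the full data with the delta."""
    approx = upsample(base, n)
    if delta is None:
        return approx
    return [a + d for a, d in zip(approx, delta)]


data = [10, 12, 15, 15, 20]
base, delta = refactor(data)
coarse = restore(base, len(data))        # fast, lower accuracy
full = restore(base, len(data), delta)   # slower, full accuracy
print(base, delta, coarse, full)
```

Reading only the base corresponds to a fast, lower-accuracy analysis, while also fetching the delta recovers the original values, which mirrors the speed versus accuracy trade-off the abstract describes.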