7:00 am–8:30 am | Monday | Breakfast | Pfisterei
8:00 am–9:00 am | Monday | Badge Pickup | Untere Aula
8:45 am–9:00 am | Monday | Opening Remarks | Program Chair: George Candea, École Polytechnique Fédérale de Lausanne (EPFL)
9:00 am–10:30 am | Monday
Martin Maas, University of California, Berkeley, and Oracle Labs, Cambridge; Tim Harris, Oracle Labs, Cambridge; Krste Asanović and John Kubiatowicz, University of California, Berkeley

Cloud systems such as Hadoop, Spark, and ZooKeeper are frequently written in Java or other garbage-collected languages. However, GC-induced pauses can have a significant impact on these workloads. Specifically, GC pauses can reduce throughput for batch workloads and cause high tail latencies for interactive applications.

In this paper, we show that distributed applications suffer from each node’s language runtime system making GC-related decisions independently. We first demonstrate this problem on two widely used systems (Apache Spark and Apache Cassandra). We then propose solving this problem using a Holistic Runtime System, a distributed language runtime that collectively manages runtime services across multiple nodes.

We present initial results demonstrating that this Holistic GC approach is effective both in reducing the impact of GC pauses on a batch workload and in improving GC-related tail latencies in an interactive setting.
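A minimal sketch of the coordination idea, assuming a hypothetical per-node runtime interface (heap_pressure(), request_gc()) that the abstract does not specify: a cluster-wide coordinator staggers collections so that at most one node pauses at a time.

```python
# Illustrative sketch only, not the authors' system: a cluster-wide
# coordinator that staggers GC so no two nodes pause at the same time.
# The per-node interface (heap_pressure, request_gc) is hypothetical.
import threading
import time

class HolisticGCCoordinator:
    def __init__(self, nodes, min_gap_s=0.5, pressure_threshold=0.8):
        self.nodes = nodes                        # hypothetical node handles
        self.min_gap_s = min_gap_s                # spacing between pauses
        self.pressure_threshold = pressure_threshold
        self.lock = threading.Lock()
        self.last_gc = 0.0

    def poll(self):
        """Collect on the most memory-pressured node, one at a time."""
        candidate = max(self.nodes, key=lambda n: n.heap_pressure())
        if candidate.heap_pressure() < self.pressure_threshold:
            return                                # no node needs a GC yet
        with self.lock:
            wait = self.min_gap_s - (time.time() - self.last_gc)
            if wait > 0:
                time.sleep(wait)                  # stagger pauses cluster-wide
            candidate.request_gc()
            self.last_gc = time.time()
```

For a batch workload, the same coordinator could instead align pauses on all nodes at once, since synchronized pauses waste less aggregate time than randomly overlapping ones.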
Ionel Gog, University of Cambridge; Jana Giceva, ETH Zürich; Malte Schwarzkopf, University of Cambridge; Kapil Vaswani, Dimitrios Vytiniotis, Ganesan Ramalingam, and Manuel Costa, Microsoft Research; Derek G. Murray; Steven Hand; Michael Isard

Many popular systems for processing “big data” are implemented in high-level programming languages with automatic memory management via garbage collection (GC). However, high object churn and large heap sizes put severe strain on the garbage collector. As a result, applications underperform significantly: GC increases the runtime of typical data processing tasks by up to 40%.

We propose to use region-based memory management instead of GC in distributed data processing systems. In these systems, many objects have clearly defined lifetimes. Hence, it is natural to allocate these objects in fate-sharing regions, obviating the need to scan a large heap. Regions can be memory-safe and could be inferred automatically. Our initial results show that region-based memory management reduces emulated Naiad vertex runtime by 34% for typical data analytics jobs.
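The fate-sharing idea can be pictured with a toy arena allocator; the Region class below illustrates the concept only and is not the authors’ implementation.

```python
# Toy illustration of region-based (arena) memory management: objects with
# a shared lifetime go into one region, and the whole region is discarded
# at once, so no tracing collector ever scans them.
class Region:
    """Fate-sharing region: everything allocated here dies together."""
    def __init__(self, name):
        self.name = name
        self._objects = []

    def alloc(self, obj):
        self._objects.append(obj)
        return obj

    def free_all(self):
        # One bulk release replaces scanning a large heap.
        self._objects.clear()

# Hypothetical use in a dataflow operator: all intermediate records of one
# task share the task's lifetime, hence one region.
task_region = Region("task-42")
records = [task_region.alloc({"key": i, "value": i * i}) for i in range(1000)]
# ... operator runs to completion ...
task_region.free_all()  # reclaim every intermediate record at once
```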
Joo Seong Jeong, Woo-Yeon Lee, Yunseong Lee, Youngseok Yang, Brian Cho, and Byung-Gon Chun, Seoul National University

Recent big data processing systems provide quick answers to users by keeping data in memory across a cluster. As a simple way to manage data in memory, the systems are deployed as long-running workers on a static allocation of the cluster resources. This simplicity comes at a cost: elasticity is lost. Under today’s resource managers, such as YARN and Mesos, this severely reduces the utilization of the shared cluster and limits the performance of such systems.

In this paper, we propose Elastic Memory, an abstraction that can dynamically change the allocated memory resource to improve resource utilization and performance. With Elastic Memory, we outline how we enable elastic interactive query processing and machine learning.
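A hedged sketch of the kind of interface such an abstraction could expose; the method names and the stub resource manager below are invented for illustration and are not the authors’ API.

```python
# Sketch: a worker grows or shrinks its memory allocation at runtime
# instead of holding a static reservation for its whole lifetime.
class StubResourceManager:
    """Stand-in for a YARN/Mesos-like manager, for this sketch only."""
    def request_memory(self, mb):
        return mb   # pretend the full request is granted

    def release_memory(self, mb):
        pass        # returned to the shared cluster pool

class ElasticMemory:
    def __init__(self, rm, initial_mb):
        self.rm = rm
        self.allocated_mb = initial_mb

    def grow(self, mb):
        """Ask for more memory mid-job instead of over-reserving upfront."""
        granted = self.rm.request_memory(mb)
        self.allocated_mb += granted
        return granted

    def shrink(self, mb):
        """Hand idle memory back so other tenants can use it."""
        released = min(mb, self.allocated_mb)
        self.rm.release_memory(released)
        self.allocated_mb -= released
        return released

em = ElasticMemory(StubResourceManager(), initial_mb=1024)
em.grow(512)     # e.g., before a memory-hungry aggregation
em.shrink(768)   # e.g., after intermediate state is freed
```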
10:30 am–11:00 am | Monday | Break with Refreshments
11:00 am–12:30 pm | Monday
Qi Zhu, National University of Defense Technology, China; Meng Zhu, University of Rochester; Bo Wu, Colorado School of Mines; Xipeng Shen, North Carolina State University; Kai Shen, University of Rochester; Zhiying Wang, National University of Defense Technology, China

Idle CPUs may enter power-saving hardware sleeps by, for instance, lowering the operating voltage and flushing the caches. However, wakeup delays that reach one hundred microseconds or more disrupt the operation of fast devices like solid-state disks and tightly integrated accelerators. On the other hand, maximal power savings on modern multicores are only realized through continuous, simultaneous CPU sleeps. We argue that strong software engagement (at the OS and in applications) is needed to maximize power savings while maintaining the desired performance. Specifically, we present anticipatory CPU wakeups for latency-sensitive operations on fast devices. We also explore power-saving sleep shaping opportunities through non-work-conserving scheduling on smartphones and staged bursts on servers.
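One way to picture the anticipatory-wakeup idea, with invented device and CPU handles and assumed latency figures:

```python
# Sketch of an anticipatory CPU wakeup: if exiting the deep sleep state
# takes about as long as the device operation itself, start the wakeup
# before submitting the request so the two latencies overlap.
# The device/cpu objects and the numbers are assumptions, not measurements.
EXPECTED_IO_US = 80       # assumed fast-SSD access time
DEEP_WAKEUP_US = 100      # assumed deep-C-state exit latency

def submit_with_anticipatory_wakeup(device, cpu, request):
    if DEEP_WAKEUP_US >= EXPECTED_IO_US:
        cpu.send_wakeup_signal()         # hypothetical: begin C-state exit now
    completion = device.submit(request)  # hypothetical async submission
    return completion.wait()             # CPU is awake when the I/O completes
```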
Baris Kasikci, École Polytechnique Fédérale de Lausanne (EPFL); Cristiano Pereira and Gilles Pokam, Intel Corporation; Benjamin Schubert, École Polytechnique Fédérale de Lausanne (EPFL); Madanlal Musuvathi, Microsoft Research; George Candea, École Polytechnique Fédérale de Lausanne (EPFL)

One of the main reasons debugging is hard and time-consuming is that existing debugging tools do not provide an explanation for the root causes of failures. Additionally, existing techniques either rely on expensive runtime recording or assume the existence of a program input that reliably reproduces the failure, which makes them hard to apply in production scenarios. Consequently, developers spend precious time chasing elusive bugs, resulting in productivity loss.

We propose a new debugging technique, called failure sketching, that provides the developer with a high-level explanation for the root cause of a failure. A failure sketch achieves this goal because: 1) it contains only the program statements that cause a failure; and 2) it shows which program properties differ between failing and successful executions. We argue that failure sketches can be built by combining in-house static analysis with crowdsourced dynamic analysis. For building a failure sketch, we do not assume that developers can reproduce the failure. We show preliminary evidence that failure sketches can significantly improve programmer productivity.
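At its core, a failure sketch contrasts failing and successful executions; the toy differencing below illustrates that contrast on invented traces and is not the paper’s static/dynamic analysis pipeline.

```python
# Toy illustration of the contrast behind a failure sketch: keep only what
# differs between failing and successful runs. Trace contents are invented.
def failure_sketch(failing_traces, passing_traces):
    failing = set().union(*failing_traces)
    passing = set().union(*passing_traces)
    return {
        "only_in_failures": sorted(failing - passing),
        "only_in_successes": sorted(passing - failing),
    }

fail_runs = [{"t1:write(x)", "t2:write(x)"}]
ok_runs = [{"t1:lock(m)", "t1:write(x)", "t1:unlock(m)",
            "t2:lock(m)", "t2:write(x)", "t2:unlock(m)"}]
# The locking appears only in successful runs, pointing at the root cause
# (an unprotected write) without having to replay the failure.
print(failure_sketch(fail_runs, ok_runs))
```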
Tyler Dwyer and Alexandra Fedorova, Simon Fraser University

To attain high program performance, a developer must be conscious of the many intricacies of hardware and organize their code accordingly. This, however, is not an easy task. Often the hardware is unknown to developers, or, if it is known, it is difficult to control or account for. Developers grapple with this challenge by using hardware-conscious algorithms, specialized programming languages, or manual low-level optimizations.

We investigate the concept of instruction organization at a more general level. In particular, we investigate whether a program, running on existing hardware, can be automatically reorganized according to a chosen organization metric. Further, if the reorganization can be done automatically, a program can then be reorganized during execution to adapt to changes in system resources and to changing execution and data access patterns.

We use data locality as an organization metric, with the goal of reducing data access latency and improving program performance.
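A very small example of organizing work around a locality metric, under an assumed 64-byte cache line; this illustrates the metric only, not the authors’ reorganization machinery.

```python
# Group memory accesses that touch the same cache block so each block is
# brought in once; the block size and the access list are assumptions.
BLOCK = 64  # assumed cache-line size in bytes

def reorganize_by_locality(accesses):
    """Order accesses so those in the same block run back to back."""
    return sorted(accesses, key=lambda addr: addr // BLOCK)

accesses = [0, 4096, 8, 4104, 16, 4112]
print(reorganize_by_locality(accesses))  # [0, 8, 16, 4096, 4104, 4112]
```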
12:30 pm–2:00 pm | Monday | Lunch | Pfisterei
2:00 pm–3:30 pm | Monday
Antoine Kaufmann, Simon Peter, Thomas Anderson, and Arvind Krishnamurthy, University of Washington

We propose FlexNIC, a flexible network DMA interface that can be used by operating systems and applications alike to reduce packet-processing overheads. The recent surge in network I/O performance has put enormous pressure on memory and software I/O processing subsystems. Yet even at high speeds, flexibility in packet handling is still important for security, performance isolation, and virtualization.

Thus, our proposal moves some of the packet processing traditionally done in software to the NIC DMA controller, where it can be done flexibly and at high speed. We show how FlexNIC can benefit widely used data center server applications, such as key-value stores.
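The abstract suggests programmable packet handling in the DMA path; below is a plausible, entirely hypothetical shape for such rules, using the key-value-store example the authors cite.

```python
# Hedged sketch of match-plus-action rules a FlexNIC-like interface might
# accept; the fields and the steering action are invented for illustration.
from dataclasses import dataclass

@dataclass
class Rule:
    match: dict   # header fields to match, e.g. {"udp_dport": 11211}
    action: str   # what the DMA engine should do with matching packets
    arg: int      # action parameter

def install_kv_steering(nic_rules, num_cores):
    # Steer requests for the same key to the same core, so that core's
    # cache stays warm; this is the key-value-store benefit the abstract
    # mentions.
    nic_rules.append(Rule(match={"udp_dport": 11211},
                          action="steer_by_key_hash",
                          arg=num_cores))

rules = []
install_kv_steering(rules, num_cores=8)
print(rules[0])
```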
Torsten Hoefler, ETH Zürich; Robert B. Ross, Argonne National Laboratory; Timothy Roscoe, ETH Zürich

Sub-microsecond network and memory latencies require fast user-level access to local and remote storage. While user-level access to local storage has been demonstrated recently, it does not currently extend to serverless parallel systems in datacenter environments. We propose direct user-level access to remote storage in a distributed setting, unifying fast data access and high-performance remote memory access programming. We discuss a minimal hardware extension of the IOMMU to enable direct remote storage access. In order to maintain optimal performance in the system, we use epoch-based accesses to allow fine-tuning of atomicity, consistency, isolation, and durability semantics. We also address the problem of user-managed coherent caching. Finally, we briefly discuss the design of DiDAFS, a Distributed Direct Access File System that enables efficient data analytics use cases such as buffered producer-consumer synchronization and key-value stores, as well as deeper integration of storage into high-performance computing applications.
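A minimal sketch of the epoch idea as the abstract describes it, with a hypothetical API that is not the DiDAFS design: writes are buffered and take effect together at the epoch boundary, where durability can be tuned per epoch.

```python
# Epoch-based access sketch: consistency/durability guarantees attach to
# the epoch boundary rather than to each individual operation.
class Epoch:
    def __init__(self, storage, durable=True):
        self.storage = storage
        self.durable = durable
        self._writes = []

    def write(self, key, value):
        self._writes.append((key, value))  # buffered, not yet visible

    def close(self):
        """Epoch boundary: apply all writes, then optionally persist."""
        for key, value in self._writes:
            self.storage[key] = value
        if self.durable:
            pass  # a real system would flush to persistent media here
        self._writes.clear()

store = {}
ep = Epoch(store, durable=False)  # relax durability for this epoch
ep.write("k1", "v1")
ep.write("k2", "v2")
ep.close()                        # both writes become visible together
```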
Ignacio Castro, IMDEA Networks Institute, International Computer Science Institute, and Open University of Catalonia; Aurojit Panda, University of California, Berkeley; Barath Raghavan, International Computer Science Institute; Scott Shenker, University of California, Berkeley, and International Computer Science Institute; Sergey Gorinsky, IMDEA Networks Institute

While it is widely acknowledged that the Border Gateway Protocol (BGP) has many flaws, most of the proposed fixes focus solely on improving the stability and security of its path computation. However, because interdomain routing involves contracts between Autonomous Systems (ASes), this paper argues that contractual and routing issues should be tackled jointly. We propose Route Bazaar, a backward-compatible system for flexible Internet connectivity. Inspired by the decentralized construction of trust in cryptocurrencies, Route Bazaar uses a decentralized public ledger and cryptography to provide ASes with automatic means to form, establish, and verify end-to-end connectivity agreements.
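A toy hash-chained ledger in the spirit of the abstract’s decentralized public ledger; a real system would add signatures and consensus, and the agreement fields here are invented.

```python
# Tamper-evident recording of connectivity agreements between ASes.
import hashlib
import json

class Ledger:
    def __init__(self):
        self.entries = []

    def append(self, agreement):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(agreement, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.entries.append({"agreement": agreement,
                             "prev": prev, "hash": digest})

ledger = Ledger()
ledger.append({"from": "AS100", "to": "AS200",
               "path": ["AS100", "AS300", "AS200"]})
ledger.append({"from": "AS200", "to": "AS100", "bandwidth_mbps": 500})
# Tampering with any entry breaks the hash of every later entry.
```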
3:30 pm–4:00 pm | Monday | Break with Refreshments
4:00 pm–6:30 pm | Monday | Free Time to Enjoy Kartause Ittingen
6:30 pm–8:00 pm | Monday | Dinner | Pfisterei
8:00 pm–9:30 pm | Monday
Nuno Santos, Nuno O. Duarte, Miguel B. Costa, and Paulo Ferreira, INESC-ID and Instituto Superior Técnico, Universidade de Lisboa

In certain usage scenarios, mobile devices are required to operate in some constrained manner. For example, when movies are being screened in movie theaters, all devices in the room must be muted. However, typical mobile devices operate in unrestricted mode, allowing users to control their configurations. As a result, it is hard to guarantee that mobile devices operate under such restrictions.

In this paper, we present a security architecture that enables mobile applications to temporarily restrict the functionality of devices. To this end, we introduce a novel abstraction for mobile operating systems (MOSes) called a trust lease, which enables devices to safely switch between modes. We discuss the design implications that need to be addressed to implement this primitive on modern MOSes.
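A hedged sketch of what a trust-lease request might look like from an application’s point of view; the names and fields are invented. The key property is that the OS, not the user, enforces the restrictions until the lease ends.

```python
# Hypothetical trust-lease API sketch, not the authors' architecture.
import time

class TrustLease:
    def __init__(self, mos, restrictions, duration_s):
        self.mos = mos                    # hypothetical mobile-OS interface
        self.restrictions = restrictions  # e.g. {"audio": "muted"}
        self.duration_s = duration_s

    def acquire(self):
        expires = time.time() + self.duration_s
        # The OS holds the device in the restricted mode for the whole
        # term, even if the requesting app is killed.
        return self.mos.enter_restricted_mode(self.restrictions, expires)

# A movie-theater app might request, with one-time user consent:
# TrustLease(mos, {"audio": "muted", "camera": "off"}, 7200).acquire()
```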
Ang Chen, Hanjun Xiao, Andreas Haeberlen, and Linh Thi Xuan Phan, University of Pennsylvania

We propose a new approach to fault tolerance that we call bounded-time recovery (BTR). BTR is intended for systems that need strong timeliness guarantees during normal operation but can tolerate short outages in an emergency, e.g., when they are under attack. We argue that BTR could be a good fit for many cyber-physical systems. We also sketch a technical approach to providing BTR, and we discuss some challenges that still remain.
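A toy sketch of the bounded-time-recovery contract: after a failure the system must reach a safe, possibly degraded, configuration within a recovery budget. The budget, the replica interface, and the fallback below are all assumptions.

```python
# Illustrative BTR controller loop, not the authors' technical approach.
import time

RECOVERY_BUDGET_S = 0.005  # assumed bound: 5 ms to reach a safe state

def recover(failed_role, replicas):
    deadline = time.monotonic() + RECOVERY_BUDGET_S
    for replica in replicas:
        if time.monotonic() > deadline:
            break                          # out of budget
        if replica.promote(failed_role):   # hypothetical failover call
            return "recovered"
    return "degraded_safe_mode"            # timely but reduced service
```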
Rayman Preet Singh, University of Waterloo; Chenguang Shen, University of California, Los Angeles; Amar Phanishayee, Aman Kansal, and Ratul Mahajan, Microsoft Research

Applications using connected devices are difficult to develop today because they are constructed as monolithic silos, tightly coupled to sensing devices, and must implement all data sensing and inference logic, even as devices move or are temporarily disconnected. We present Beam, a framework and runtime for distributed inference-driven applications that (i) decouples applications, inference algorithms, and devices; (ii) handles environmental dynamics; and (iii) automatically splits sensing and inference logic across devices while optimizing resource usage. Using Beam, applications specify only “what should be sensed or inferred,” without worrying about “how it is sensed or inferred.” Beam simplifies application development and maximizes the utility of user-owned devices. It is time to end monolithic apps for connected devices.
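A hedged sketch of the decoupling the abstract argues for: the application names what to infer, and a runtime chooses how, binding whichever devices currently provide it. The registry and method names below are invented for illustration.

```python
# Not Beam's API; a minimal stand-in for the what/how split it describes.
class InferenceRuntime:
    def __init__(self):
        self._providers = {}  # inference name -> candidate sensors

    def register(self, name, sensor):
        self._providers.setdefault(name, []).append(sensor)

    def infer(self, name):
        # Sensors can move or drop out without any application change;
        # the runtime just falls through to the next available provider.
        for sensor in self._providers.get(name, []):
            if sensor.available():
                return sensor.read()
        raise RuntimeError(f"no provider currently offers {name!r}")

# An app would ask runtime.infer("occupancy") rather than talking to a
# specific camera or motion sensor.
```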