HotCloud '20 Workshop Program

All the times listed below are in Pacific Daylight Time (PDT).

Papers are available for download below to registered attendees now and to everyone beginning July 13, 2020. Paper abstracts are available to everyone now. Copyright to the individual works is retained by the author[s].

Downloads for Registered Attendees
(Sign in to your USENIX account to download these files.)

Attendee Files 
HotCloud '20 Paper Archive (ZIP)
HotCloud '20 Attendee List (PDF)

Monday, July 13, 2020

7:00 am–7:15 am

Opening Remarks

Program Co-Chairs: Amar Phanishayee, Microsoft Research, and Ryan Stutsman, University of Utah

7:15 am–8:30 am

Cloud Gaming and ML Systems

Session Chairs: Aurojit Panda, New York University, and Shivaram Venkataraman, University of Wisconsin–Madison

A Cloud Gaming Framework for Dynamic Graphical Rendering Towards Achieving Distributed Game Engines

James Bulman and Peter Garraghan, Lancaster University

Available Media

Cloud gaming in recent years has gained growing success in delivering games-as-a-service by leveraging cloud resources. Existing cloud gaming frameworks deploy the entire game engine within Virtual Machines (VMs) due to the tight coupling of game engine subsystems (graphics, physics, AI). The effectiveness of such an approach is heavily dependent on the cloud VM providing consistently high levels of performance, availability, and reliability. However, this assumption is difficult to guarantee due to QoS degradation within, and outside of, the cloud, ranging from system failure and network connectivity to consumer data caps, all of which may result in game service outage. We present a cloud gaming framework that creates a distributed game engine by loosely coupling the graphical renderer and the game engine, allowing the renderer to execute across cloud VMs and client devices dynamically. Our framework allows games to operate during performance degradation and cloud service failure, enabling game developers to exploit heterogeneous graphical APIs unrestricted by operating system and hardware constraints. Our initial experiments show that our framework improves game frame rates by up to 33% via frame interlacing between cloud and client systems.
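
The frame-interlacing result suggests a simple scheduling intuition. Below is a minimal sketch of one plausible policy (an illustrative guess, not the paper's implementation): interlace frames between cloud and client renderers while the cloud meets its QoS, and fall back to client-only rendering during degradation.

```python
# Illustrative frame-interlacing policy; the alternation scheme is an
# assumption for exposition, not the paper's actual scheduler.

def assign_frames(n_frames: int, cloud_healthy: bool):
    """Assign each frame to a renderer: interlace while the cloud is
    healthy, otherwise render everything on the client."""
    return [(i, "cloud" if cloud_healthy and i % 2 == 0 else "client")
            for i in range(n_frames)]

print(assign_frames(6, cloud_healthy=True))   # alternates cloud/client
print(assign_frames(6, cloud_healthy=False))  # degrades to client-only
```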

Towards Supporting Millions of Users in Modifiable Virtual Environments by Redesigning Minecraft-Like Games as Serverless Systems

Jesse Donkervliet, Animesh Trivedi, and Alexandru Iosup, VU Amsterdam

Available Media

How can Minecraft-like games become scalable cloud services? Hundreds of Minecraft-like games, that is, games acting as modifiable virtual environments (MVEs), are currently played by over 100 million players, but surprisingly they do not scale and are frequently not published as cloud services. We envision a new architecture for large-scale MVEs, supporting much larger numbers of concurrent users by scaling up and out using serverless technology. In our vision, developers focus on the game (business) logic, while cloud providers handle resource management and scheduling (RMS) and guarantee non-functional properties. We provide a definition for MVEs, model their services and deployments, present a vision for large-scale MVEs architected as serverless systems, and suggest concrete steps towards realizing this vision.

AI4DL: Mining Behaviors of Deep Learning Workloads for Resource Management

Josep L. Berral, Barcelona Supercomputing Center, Universitat Politecnica de Catalunya; Chen Wang and Alaa Youssef, IBM Research

Available Media

The more we know about the resource usage patterns of workloads, the better we can allocate resources. Here we present a methodology to discover resource usage behaviors for the training workloads of Deep Learning (DL) models. From monitoring, we can observe repeating patterns and similarity of resource usage among containers running the training workloads of different DL models. The repeating patterns observed can be leveraged by the scheduler or the resource autoscaler to reduce resource fragmentation and overall resource consumption in a dedicated DL cluster. Specifically, our approach combines Conditional Restricted Boltzmann Machines (CRBMs) and clustering techniques to discover common sequences of behaviors (phases) of containers running the model training workloads in clusters providing IBM Deep Learning Services. By studying the resource usage pattern at each phase and the typical sequences of phases among different containers, we can discover a reduced set of prototypical executions representing most executions. We use statistical information from each phase to refine resource provisioning by dynamically tuning the amount of resources each container requires at each phase of its execution. Evaluation of our method shows that container resource usage displays typical patterns that can help reduce CPU and memory consumption by 30% relative to reactive policies, which is close to having a priori knowledge of resource usage, while fulfilling resource demand over 95% of the time.
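
To make the phase-based provisioning step concrete, here is a minimal sketch in which hard-coded, hypothetical phase statistics stand in for the phases mined by the CRBM-and-clustering pipeline described above.

```python
# Sketch of phase-based provisioning. Phase names and usage statistics are
# invented stand-ins for phases mined by the paper's CRBM + clustering step.

# Hypothetical per-phase statistics mined offline: (p95 CPU cores, p95 GB RAM).
PHASE_STATS = {
    "data-loading": (2.0, 8.0),
    "training":     (8.0, 24.0),
    "checkpoint":   (1.0, 12.0),
}

def resource_plan(phases, headroom=1.2):
    """Turn a prototypical phase sequence into per-phase container requests,
    padding the mined p95 usage with a safety margin."""
    return [{"phase": p,
             "cpu": PHASE_STATS[p][0] * headroom,
             "mem_gb": PHASE_STATS[p][1] * headroom}
            for p in phases]

for step in resource_plan(["data-loading", "training", "checkpoint"]):
    print(step)
```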

Spotnik: Designing Distributed Machine Learning for Transient Cloud Resources

Marcel Wagenländer, Luo Mai, Guo Li, and Peter Pietzuch, Imperial College London

Available Media

To achieve higher utilisation, cloud providers offer VMs with GPUs as lower-cost transient cloud resources. Transient VMs can be revoked at short notice and vary in their availability. This poses challenges to distributed machine learning (ML) jobs, which perform long-running stateful computation in which many workers maintain and synchronise model replicas. With transient VMs, existing systems either require a fixed number of reserved VMs or degrade performance when recovering from revoked transient VMs.

We believe that future distributed ML systems must be designed from the ground up for transient cloud resources. This paper describes SPOTNIK, a system for training ML models that features a more adaptive design to accommodate transient VMs: (i) SPOTNIK uses an adaptive implementation of the all-reduce collective communication operation. As workers on transient VMs are revoked, SPOTNIK updates its membership and uses the all-reduce ring to recover; and (ii) SPOTNIK supports the adaptation of the synchronisation strategy between workers. This allows a training job to switch between different strategies in response to the revocation of transient VMs. Our experiments show that, after VM revocation, SPOTNIK recovers training within 300 ms for ResNet/ImageNet.
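
A minimal sketch of the membership-repair idea, under assumed interfaces (this is not SPOTNIK's code): when a transient VM is revoked, the surviving workers recompute their ring neighbors and resume the all-reduce instead of restarting the job.

```python
# Sketch of ring repair after a revocation; the interfaces are assumptions.

def ring_neighbors(workers, me):
    """Return (prev, next) neighbors of `me` on a sorted all-reduce ring."""
    order = sorted(workers)
    i = order.index(me)
    return order[i - 1], order[(i + 1) % len(order)]

def on_revocation(workers, revoked):
    """Drop revoked workers and hand every survivor its new neighbors."""
    alive = [w for w in workers if w not in revoked]
    return {w: ring_neighbors(alive, w) for w in alive}

print(on_revocation(["vm-0", "vm-1", "vm-2", "vm-3"], revoked={"vm-2"}))
```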

Model-Switching: Dealing with Fluctuating Workloads in Machine-Learning-as-a-Service Systems

Jeff Zhang, New York University; Sameh Elnikety, Microsoft Research; Shuayb Zarar and Atul Gupta, Microsoft; Siddharth Garg, New York University

Available Media

Machine learning (ML) based prediction models, and especially deep neural networks (DNNs), are increasingly being served in the cloud in order to provide fast and accurate inferences. However, existing ML serving systems have trouble dealing with fluctuating workloads and either drop requests or significantly expand hardware resources in response to load spikes. In this paper, we introduce Model-Switching, a new approach to dealing with fluctuating workloads when serving DNN models. Motivated by the observation that end-users of ML primarily care about the accuracy of responses that are returned within the deadline (which we refer to as effective accuracy), we propose to switch from complex and highly accurate DNN models to simpler but less accurate models in the presence of load spikes. We show that the flexibility introduced by enabling online model switching provides higher effective accuracy in the presence of fluctuating workloads compared to serving using any single model. We implement Model-Switching within Clipper, a state-of-the-art DNN model serving system, and demonstrate its advantages over baseline approaches.
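
As a rough sketch of the switching policy, with illustrative model names, latencies, and accuracies that are assumptions rather than the paper's measurements: serve the most accurate model that the current queue still allows under the deadline, since a late answer contributes nothing to effective accuracy.

```python
# Sketch of deadline-aware model switching; all numbers are assumptions.

MODELS = [
    # (name, per-request latency in ms, offline accuracy), most accurate first
    ("resnet152", 90.0, 0.78),
    ("resnet50",  35.0, 0.76),
    ("resnet18",  15.0, 0.70),
]

def pick_model(queue_len: int, deadline_ms: float):
    """Pick the most accurate model whose service time, including requests
    queued ahead, still fits within the deadline."""
    for name, latency_ms, acc in MODELS:
        if (queue_len + 1) * latency_ms <= deadline_ms:
            return name, acc
    name, _, acc = MODELS[-1]   # overloaded: fall back to the cheapest model
    return name, acc

for queue_len in (0, 3, 10):
    print(queue_len, pick_model(queue_len, deadline_ms=300.0))
```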

Towards GPU Utilization Prediction for Cloud Deep Learning

Gingfung Yeung, Damian Borowiec, Adrian Friday, Richard Harper, and Peter Garraghan, Lancaster University

Available Media

Understanding the GPU utilization of Deep Learning (DL) workloads is important for enhancing resource-efficiency and cost-benefit decision making for DL frameworks in the cloud. Current approaches to determine DL workload GPU utilization rely on online profiling within isolated GPU devices, and must be performed for every unique DL workload submission, resulting in resource under-utilization and reduced service availability. In this paper, we propose a prediction engine to proactively determine the GPU utilization of heterogeneous DL workloads without the need for in-depth or isolated online profiling. We demonstrate that it is possible to predict DL workload GPU utilization via extracting information from its model computation graph. Our experiments show that the prediction engine achieves an RMSLE of 0.154, and can be exploited by DL schedulers to achieve up to 61.5% improvement to GPU cluster utilization.
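
A toy sketch of the featurization idea: summarize the computation graph as an op-count histogram and apply a learned regressor. The hand-set weights below merely stand in for the paper's trained prediction engine.

```python
# Toy featurization of a computation graph into op counts, plus a stand-in
# linear predictor; the weights are assumptions, not the paper's model.

OP_WEIGHTS = {"conv2d": 0.9, "matmul": 0.6, "relu": 0.05, "batchnorm": 0.1}

def featurize(graph_ops):
    """Histogram the op types appearing in a (hypothetical) model graph."""
    counts = {}
    for op in graph_ops:
        counts[op] = counts.get(op, 0) + 1
    return counts

def predict_utilization(graph_ops, bias=5.0):
    """Map graph features to a predicted GPU utilization percentage."""
    feats = featurize(graph_ops)
    score = bias + sum(OP_WEIGHTS.get(op, 0.0) * n for op, n in feats.items())
    return min(100.0, score)

print(predict_utilization(["conv2d"] * 50 + ["relu"] * 50 + ["matmul"] * 10))
```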

8:30 am–9:00 am

Break

9:00 am–10:15 am

Cloud Econ and Networking

Session Chairs: Jose Faleiro, Microsoft Research, and Daniel Williams, IBM Research

Serverless Boom or Bust? An Analysis of Economic Incentives

Xiayue Charles Lin, Joseph E. Gonzalez, and Joseph M. Hellerstein, UC Berkeley

Available Media

Serverless computing is a new paradigm that promises to free cloud users from the burden of having to provision and manage resources. However, the degree to which serverless computing will replace provisioned servers remains an open question.

To address this, we develop an economic model that aims to quantify the value of serverless to providers and customers. A simple model of incentives for rational providers and customers allows us to see, in broad strokes, when and why serverless technologies are worth pursuing. By characterizing the conditions under which mutually beneficial economic incentives exist, our model suggests that many classes of customers can already benefit from switching to a serverless model and taking advantage of autoscaling at today's price points. Our model also helps characterize technical research directions that would be likely to have impact in the market.
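
The flavor of the model can be captured in a back-of-the-envelope comparison: serverless bills only for busy time at a premium unit price, while a provisioned server bills for all time, so a rational customer prefers serverless below a utilization threshold. All prices below are illustrative assumptions, not quoted rates.

```python
# Back-of-the-envelope incentive comparison; prices are illustrative.

def serverless_cost(busy_hours: float, unit_price: float) -> float:
    return busy_hours * unit_price          # pay only while executing

def server_cost(total_hours: float, hourly_rate: float) -> float:
    return total_hours * hourly_rate        # pay for every provisioned hour

total = 730.0    # hours in a month
premium = 4.0    # assumed serverless price multiplier over a plain VM
rate = 0.10      # assumed $/hour for a provisioned VM

for utilization in (0.05, 0.25, 0.50):
    sls = serverless_cost(total * utilization, rate * premium)
    srv = server_cost(total, rate)
    print(f"util={utilization:.0%}: serverless=${sls:.2f}, server=${srv:.2f}")
```

With these numbers the break-even utilization is 1/premium, i.e., 25%; the paper's model derives such thresholds with far more care.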

No Reservations: A First Look at Amazon's Reserved Instance Marketplace

Pradeep Ambati, David Irwin, and Prashant Shenoy, University of Massachusetts Amherst

Available Media

Cloud users can significantly reduce their cost (by up to 60%) by reserving virtual machines (VMs) for long periods (1 or 3 years) rather than acquiring them on demand. Unfortunately, reserving VMs exposes users to \emph{demand risk} that can increase cost if their expected future demand does not materialize. Since accurately forecasting demand over long periods is challenging, users often limit their use of reserved VMs. To mitigate demand risk, Amazon operates a Reserved Instance Marketplace (RIM) where users may publicly list the remaining time on their VM reservations for sale at a price they set. The RIM enables users to limit demand risk by either selling VM reservations if their demand changes, or purchasing variable- and shorter-term VM reservations that better match their demand forecast horizon. Clearly, the RIM's potential to mitigate demand risk is a function of its price characteristics. However, to the best of our knowledge, historical RIM prices have neither been made publicly available nor analyzed. To address the problem, we have been monitoring and archiving RIM prices for 1.75 years across all 69 availability zones and 22 regions in Amazon's Elastic Compute Cloud (EC2). This paper provides a first look at this data and its implications for cost-effectively provisioning cloud infrastructure.

Towards Plan-aware Resource Allocation in Serverless Query Processing

Malay Bag, Alekh Jindal, and Hiren Patel, Microsoft

Available Media

Resource allocation for serverless query processing is a challenge. Unfortunately, prior approaches have treated queries as black boxes, thereby missing significant resource optimization opportunities. In this paper, we propose a plan-aware resource allocation approach where resources are adaptively allocated based on the runtime characteristics of the query plan. We show the savings opportunity from such an allocation scheme over production SCOPE workloads at Microsoft. We present our current implementation of a greedy version that periodically estimates the peak resource demand for the remainder of the query as execution progresses. Our experimental evaluation shows that such an implementation could already save more than 8% resource usage over one of our production virtual clusters. We conclude by opening the discussion on various strategies for plan-aware resource allocation and their implications for the cloud computing stack.
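
A sketch of the greedy policy, with a made-up stage model: as stages of the plan complete, re-estimate the peak demand of the remaining stages and shrink the allocation to that peak.

```python
# Sketch of greedy plan-aware allocation over a made-up stage model.

def remaining_peak(stage_demands, completed):
    """Peak resource demand over the stages that have not finished yet."""
    remaining = [d for s, d in stage_demands.items() if s not in completed]
    return max(remaining, default=0)

# Hypothetical per-stage peak token demand for a SCOPE-like query plan.
demands = {"extract": 400, "join": 250, "aggregate": 60, "output": 10}
done = set()
for stage in ("extract", "join", "aggregate", "output"):
    print(f"before {stage}: allocate {remaining_peak(demands, done)} tokens")
    done.add(stage)
```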

Multitenancy for Fast and Programmable Networks in the Cloud

Tao Wang, New York University; Hang Zhu, Johns Hopkins University; Fabian Ruffy, New York University; Xin Jin, Johns Hopkins University; Anirudh Sivaraman, New York University; Dan R. K. Ports, Microsoft Research; Aurojit Panda, New York University

Available Media

Fast and programmable network devices are now readily available, both in the form of programmable switches and smart network-interface cards. Going forward, we envision that these devices will be widely deployed in the networks of cloud providers (e.g., AWS, Azure, and GCP) and exposed as a programmable surface for cloud customers, similar to how cloud customers can today rent CPUs, GPUs, FPGAs, and ML accelerators. Making this vision a reality requires us to develop a mechanism to share the resources of a programmable network device across multiple cloud tenants. In other words, we need to provide multitenancy on these devices. In this position paper, we design compile-time and run-time approaches to multitenancy. We present preliminary results showing that our design provides both efficient resource utilization and isolation of tenant programs from each other.

Securing RDMA for High-Performance Datacenter Storage Systems

Anna Kornfeld Simpson and Adriana Szekeres, University of Washington; Jacob Nelson and Irene Zhang, Microsoft Research

Available Media

RDMA is increasingly popular for low-latency communication in datacenters, marking a major change in how we build distributed systems. Unfortunately, as we pursue significant system re-designs inspired by new technology, we have not given equal thought to the consequences for system security. This paper investigates security issues introduced to datacenter systems by switching to RDMA and challenges in building secure RDMA systems. These challenges include changes in RPC reliability guarantees and unauditable data accesses. We show how RDMA’s design makes it challenging to build secure storage systems by analyzing recent research systems; then we outline several directions for solutions and future research, with the goal of securing RDMA datacenter systems while they are still in the research and prototype stages.

10:15 am–11:00 am

Break

11:00 am–12:00 pm

Joint Keynote Address with HotStorage '20

Systems and ML at RISELab

Ion Stoica, University of California at Berkeley

Available Media

In this talk, I will present several of the projects we are developing at RISELab, a three-year-old lab at UC Berkeley that focuses on building platforms and algorithms for real-time intelligent decisions, decisions that are secure and explainable. These projects include both systems to better support machine learning (ML) workloads and systems that leverage ML to build better systems. In the first category, I will present Ray, a general-purpose distributed system which provides both task-parallel and actor abstractions. Ray already supports several popular libraries, including a reinforcement learning library (RLlib) and a hyperparameter search library (Tune), and it is deployed in production at tens of organizations. In the second category, I will present AutoPandas, a system that synthesizes snippets of API calls from input-output examples for Pandas, the most popular data science library today, and NeuroCuts, a tool to generate decision trees that implement network packet classifiers.
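
For context, Ray's two core abstractions mentioned in the talk, tasks and actors, look like this in a minimal example (API per Ray's public documentation; requires `pip install ray`):

```python
# Minimal Ray example: a stateless task and a stateful actor.
import ray

ray.init()

@ray.remote
def square(x):              # task: scheduled anywhere in the cluster
    return x * x

@ray.remote
class Counter:              # actor: stateful, pinned to one worker process
    def __init__(self):
        self.n = 0
    def incr(self):
        self.n += 1
        return self.n

print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]
counter = Counter.remote()
print(ray.get(counter.incr.remote()))                 # 1
```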

Ion Stoica, University of California, Berkeley

Ion Stoica is a Professor in the EECS Department at the University of California, Berkeley, and the Director of RISELab. He is currently doing research on cloud computing and AI systems. Past work includes Apache Spark, Apache Mesos, Tachyon, Chord DHT, and Dynamic Packet State (DPS). He is an ACM Fellow and has received numerous awards, including the ACM SIGOPS Mark Weiser Award (2019), the SIGOPS Hall of Fame Award (2015), the SIGCOMM Test of Time Award (2011), and the ACM Doctoral Dissertation Award (2001). He is Executive Chairman at Databricks, a company he co-founded in 2013 to commercialize Apache Spark. In 2006 he also co-founded Conviva, a startup to commercialize technologies for large-scale video distribution.

Tuesday, July 14, 2020

7:00 am–8:15 am

Isolation and Disaggregation

Session Chairs: Deian Stefan, University of California, San Diego, and Behnaz Arzani, Microsoft Research

Stratus: Clouds with Microarchitectural Resource Management

Kaveh Razavi, ETH Zürich; Animesh Trivedi, VU Amsterdam

Available Media

The emerging next generation of cloud services, such as granular and serverless computing, is pushing the boundaries of the current cloud infrastructure. To meet their performance objectives, researchers are now leveraging low-level hardware microarchitectural resources in clouds. At the same time, these resources are a major source of security problems that can compromise the confidentiality and integrity of sensitive data in multi-tenant shared cloud infrastructures. The core of the problem is the lack of isolation due to the unsupervised sharing of microarchitectural resources across different performance and security boundaries. In this paper, we introduce Stratus clouds, which treat isolation of microarchitectural elements as the key design principle when allocating cloud resources. This isolation improves both performance and security, but at the cost of reduced resource utilization. Stratus captures this trade-off using a novel abstraction that we call the isolation credit, and we show how it can help both providers and tenants when allocating microarchitectural resources using Stratus’s declarative interface. We conclude by discussing the challenges of realizing Stratus clouds today.

On the Impact of Isolation Costs on Locality-aware Cloud Scheduling

Ankit Bhardwaj, Meghana Gupta, and Ryan Stutsman, University of Utah

Available Media

Serverless applications create an opportunity for more granular scheduling across machines in cloud platforms that can improve efficiency, especially if functions can be run within storage services to eliminate data movement. However, embedding code within storage services creates code isolation overheads that offset some of those savings. We argue for a new approach to serverless function scheduling that can look within serverless applications' functions, profile their data movement and networking costs, and model the impact of different code placement and isolation schemes for those costs. Beyond improvements in efficiency, such an approach would fuel innovation in cloud isolation schemes and programming abstractions, since a scheduler with a modular cost modeling approach could incorporate new schemes and automatically use them to improve efficiency for pre-existing applications.
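
A sketch of the kind of modular cost model argued for here, with invented numbers: compare moving the data to a compute node against running the function inside the storage service under different isolation schemes.

```python
# Sketch of a placement cost model; all costs are illustrative assumptions.

ISOLATION_COST_MS = {"process": 5.0, "container": 15.0, "vm": 120.0}
NET_MS_PER_MB = 0.08   # assumed cost of moving data between hosts

def placement_cost(input_mb, colocated_with_data, isolation):
    """Data-movement cost plus the overhead of the chosen isolation scheme."""
    move_ms = 0.0 if colocated_with_data else input_mb * NET_MS_PER_MB
    return move_ms + ISOLATION_COST_MS[isolation]

remote = placement_cost(512, colocated_with_data=False, isolation="process")
print(f"move data to a compute node: {remote:.1f} ms")
for iso in ("process", "container", "vm"):
    local = placement_cost(512, colocated_with_data=True, isolation=iso)
    print(f"run inside storage under {iso}: {local:.1f} ms")
```

Under these assumed costs, in-storage execution wins with lightweight isolation but loses once VM-grade isolation overhead is charged, which is exactly the trade-off such a scheduler must model.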

Rethinking Isolation Mechanisms for Datacenter Multitenancy

Varun Gandhi and James Mickens, Harvard University

Available Media

In theory, trusted execution environments like SGX are promising approaches for isolating datacenter tenants. In practice, the associated hardware primitives suffer from three major problems: side channels induced by microarchitectural co-tenancy; weak guarantees for post-load software integrity; and opaque hardware implementations which prevent third-party security auditing. We explain why these limitations are so problematic for datacenters, and then propose a new approach for trusted execution. This approach, called IME (Isolated Monitor Execution), provides SGX-style memory encryption, but strictly prevents microarchitectural co-tenancy of secure and insecure code. IME also uses a separate, microarchitecturally-isolated pipeline to run dynamic security checks on monitored code, enabling post-load monitoring for security invariants like CFI or type safety. Finally, an IME processor exports a machine-readable description of its microarchitectural implementation, allowing tenants to reason about the security properties of a particular IME instance.

Disaggregation and the Application

Sebastian Angel, University of Pennsylvania; Mihir Nanavati and Siddhartha Sen, Microsoft Research

Available Media

This paper examines disaggregated data center architectures from the perspective of the applications that would run on these data centers, and challenges the abstractions that have been proposed to date. In particular, we argue that operating systems for disaggregated data centers should not abstract disaggregated hardware resources, such as memory, compute, and storage, away from applications, but should instead give them information about, and control over, these resources. To this end, we propose augmenting OSes for disaggregation to benefit data transfer in data-parallel frameworks and speed up failure recovery in replicated, fault-tolerant applications, and we discuss some of the implementation challenges.

Towards An Application Objective-Aware Network Interface

Sangeetha Abdu Jyothi, UC Irvine, VMware Research; Sayed Hadi Hashemi and Roy Campbell, UIUC; Brighten Godfrey, UIUC, VMware

Available Media

The network representation used for conveying an application's objective in cloud environments, which we refer to as the Application Network Interface (ANI), has steadily evolved, from packet to flow and flowlet, to more complex abstractions such as coflow. In this paper, we argue that state-of-the-art ANIs still fail to capture important application needs. Using distributed deep learning as a representative application, we show that the application performance achievable using current ANIs is up to 25% lower than optimal. We analyze these ANIs to understand the missing pieces and put forward CadentFlow, an ANI with per-flow metrics and an optimization objective, to capture application requirements effectively. We discuss the opportunity for real-world implementation of a more expressive ANI and its implications for the design of network controllers and scheduling algorithms.
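
As an illustration, an ANI spec in this spirit might carry per-flow metrics plus an application-level objective; the field names below are invented for exposition and are not CadentFlow's actual interface.

```python
# Invented sketch of an expressive ANI spec; not CadentFlow's real API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class FlowSpec:
    flow_id: str
    priority: int        # relative importance within the application
    deadline_ms: float   # per-flow completion target

@dataclass
class AppObjective:
    objective: str                              # application-level goal
    flows: List[FlowSpec] = field(default_factory=list)

spec = AppObjective(
    objective="minimize_iteration_time",
    flows=[FlowSpec("grad-layer0", priority=2, deadline_ms=8.0),
           FlowSpec("grad-layer1", priority=1, deadline_ms=16.0)],
)
print(spec)
```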

JACKPOT: Online Experimentation of Cloud Microservices

Mert Toslali, Boston University; Srinivasan Parthasarathy and Fabio Oliveira, IBM Research; Ayse K. Coskun, Boston University

Available Media

Online experimentation is an agile software development practice, which plays a central role in enabling rapid innovation. It helps shorten code delivery cycles, which is critical for companies to survive in a competitive software-driven market. Recent advances in cloud computing, including the maturity of container-based technologies and cloud infrastructure, as well as the advent of service meshes, have created an opportunity to broaden the scope of online experimentation and further increase developers’ agility.

In this paper, we propose a novel formulation for online experimentation of cloud applications which generalizes traditional approaches applied to web and mobile applications by incorporating the unique challenges posed by cloud environments. To enable practitioners to apply our formulation, we develop and present JACKPOT, a system for online cloud experimentation in the presence of multiple interacting microservices. We discuss an initial prototype of JACKPOT along with a preliminary evaluation of this prototype based on experiments on a public container cloud.

8:15 am–8:45 am

Break

8:45 am–10:00 am

Storage and Stream Processing

Session Chairs: Michael Wei, VMware Research, and Peter Alvaro, University of California, Santa Cruz

More IOPS for Less: Exploiting Burstable Storage in Public Clouds

Hojin Park, Gregory R. Ganger, and George Amvrosiadis, Carnegie Mellon University

Available Media

Burstable storage is a public cloud feature that enhances cloud storage volumes with credits that can be used to boost performance temporarily. These credits can be exchanged for increased storage throughput, for a short period of time, and are replenished over time. We examine how burstable storage can be leveraged to reduce cost and/or improve performance for three use cases with different data-longevity requirements: traditional persistent storage, caching, and ephemeral storage. Although cloud storage volumes are typically priced by capacity, we find that each AWS gp2 volume starts with the same number of burst credits. Exploiting that fact, we find that aggressive interchanging of large numbers of small short-term volumes can increase IOPS by up to 100 times at a cost increase of only 10–40%. Compared to an AWS io1 volume provisioned for the same performance, such interchanging reduces cost by 97.5%.
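
The arithmetic behind volume interchanging follows from gp2's documented credit model: every volume starts with 5.4 million I/O credits, earns credits at its baseline rate of 3 IOPS per provisioned GB (with a 100 IOPS floor), and can burst to 3,000 IOPS while credits last. A sketch, for intuition only:

```python
# gp2 burst-credit arithmetic; constants follow AWS's documented credit
# model, and the output is for intuition only.

INITIAL_CREDITS = 5_400_000   # I/O credits a fresh gp2 volume starts with
BURST_IOPS = 3_000

def burst_seconds(size_gb: float) -> float:
    """How long a fresh volume can sustain the full 3,000 IOPS burst."""
    baseline = max(100.0, 3.0 * size_gb)   # baseline rate, 100 IOPS floor
    net_drain = BURST_IOPS - baseline      # credits consumed per second
    return float("inf") if net_drain <= 0 else INITIAL_CREDITS / net_drain

for gb in (1, 100, 334):
    print(f"{gb:4d} GB -> {burst_seconds(gb) / 60:5.1f} min of full burst")
```

Because even a 1 GB volume carries the full initial credit bucket, cycling through many small, short-lived volumes harvests burst credits cheaply, which is the effect the paper exploits.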

A Cloud-native Architecture for Replicated Data Services

Hemant Saxena, University of Waterloo; Jeffrey Pound, SAP Labs, Waterloo, Canada

Available Media

Many services replicate data for fault-tolerant storage of the data and high-availability of the service. When deployed in the cloud, the replication performed by these services provides the desired high-availability but does not provide significant additional fault-tolerance for the data. This is because cloud deployments use fault-tolerant storage services instead of the simple local disks that many replicated data services were designed to use. Because the cloud storage services already provide fault-tolerance for the data, the extra replicas create unnecessary cost in running the service. However, replication is still needed for high-availability of the service itself.

In this paper, we explore types of replicated data services and how they can be mapped onto various classes of cloud storage. We then propose a general architectural pattern that can be used to (1) limit additional storage, yielding monetary cost savings, (2) keep the same performance for the service, and (3) maintain the same high availability of the service and durability guarantees for the data. We prototype our approach in two popular open-source replicated data services, Kafka and Cassandra, and show that with relatively little modification these systems can be deployed for a fraction of the storage cost without affecting the availability guarantees, durability guarantees, or performance.
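
The cost argument can be made concrete with simple arithmetic (the prices below are illustrative assumptions): with replication factor three on cloud volumes that are already fault tolerant, a service pays for three durable copies where one durable copy plus cheaper availability-only replicas would do.

```python
# Back-of-the-envelope storage cost of replication in the cloud; prices and
# replica placement are illustrative assumptions.

DATA_TB = 10
DURABLE_PER_TB = 100.0    # assumed $/TB-month, fault-tolerant block storage
EPHEMERAL_PER_TB = 25.0   # assumed $/TB-month, instance-local storage

conventional = 3 * DATA_TB * DURABLE_PER_TB        # three durable copies
cloud_native = (DATA_TB * DURABLE_PER_TB           # one durable copy
                + 2 * DATA_TB * EPHEMERAL_PER_TB)  # two availability copies
print(f"conventional: ${conventional:,.0f}/month")
print(f"cloud-native: ${cloud_native:,.0f}/month")
```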

When is the Cache Warm? Manufacturing a Rule of Thumb

Lei Zhang, Emory University; Juncheng Yang, Carnegie Mellon University; Anna Blasiak, Akamai Inc and Indigo Ag; Mike McCall, Akamai Inc and Facebook Inc; Ymir Vigfusson, Emory University

Available Media

The plethora of parameters and nuanced configuration options that govern complex, large-scale caching systems restrict their designers and operators. We analyze cache warmup times that can arise in failure handling, load balancing, and cache partitioning of large-scale distributed memory and storage systems. Through simulation on traces from production CDN and storage systems, we derive rule-of-thumb formulas for designers and operators to use when reasoning about caches.
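
One way such a rule of thumb can be manufactured, sketched here with a synthetic uniform trace in place of the production traces used in the paper: replay requests through an LRU simulator and record how many requests it takes for the running hit ratio to first come within epsilon of its steady-state value.

```python
# LRU warmup simulation on a synthetic trace; epsilon and the trace are
# illustrative, not drawn from the paper's production data.
from collections import OrderedDict
import random

def warmup_point(trace, capacity, epsilon=0.02):
    """Number of requests before the running hit ratio first comes within
    epsilon of the trace's final (steady-state) hit ratio."""
    cache, hits, curve = OrderedDict(), 0, []
    for i, key in enumerate(trace, 1):
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # refresh recency
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
        curve.append(hits / i)
    steady = curve[-1]
    for i, ratio in enumerate(curve, 1):
        if abs(ratio - steady) <= epsilon:
            return i, steady
    return len(trace), steady

random.seed(0)
trace = [random.randrange(1000) for _ in range(50_000)]
print(warmup_point(trace, capacity=500))
```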

Resource Efficient Stream Processing Platform with Latency-Aware Scheduling Algorithms

Yuta Morisawa, Masaki Suzuki, and Takeshi Kitahara, KDDI Research, Inc.

Available Media

We present a novel platform dedicated to stream processing that improves resource efficiency by sharing resources among applications. The platform utilizes latency-aware schedulers to handle stream applications with heterogeneous SLAs and workloads. We implemented the prototype in Spark Structured Streaming and evaluated the platform with pseudo IoT services. The results showed that our platform outperformed default Spark Structured Streaming while reducing the necessary CPU cores by 36%. We further compared the adaptability of the schedulers and found that one of the schedulers reduced SLA violations by 90% compared to the default FAIR scheduler when the platform was overloaded.

Auto-sizing for Stream Processing Applications at LinkedIn

Rayman Preet Singh, Bharath Kumarasubramanian, Prateek Maheshwari, and Samarth Shetty, LinkedIn Corp

Available Media

Stream processing as a platform-as-a-service (PaaS) offering is used at LinkedIn to host thousands of business-critical applications. This requires service owners to manage applications' resource sizing and tuning. Unfortunately, applications have diverged from their conventional model of a directed acyclic graph (DAG) of operators and incorporate multiple other functionalities, which presents numerous challenges for sizing. We present a controller that dynamically controls applications' resource sizing while accounting for diverse functionalities, load variations, and service dependencies, to maximize cluster utilization and minimize cost. We discuss the challenges and opportunities in designing such a controller.

10:00 am–11:00 am

Break

11:00 am–12:15 pm

Joint Keynote Address with HotStorage '20

From Hyper Converged Infrastructure to Hybrid Cloud Infrastructure

Karan Gupta, Nutanix

Available Media

This talk will cover Nutanix's journey as it transitioned from a pioneer in Hyper Converged Infrastructure to a strong contender in Hybrid Cloud Infrastructure. I will draw on examples from my seven years of experience building distributed storage systems and LSM-based key-value stores at Nutanix. I will describe challenges faced by customers and engineers on the ground, and briefly touch on challenges I see on the horizon for hybrid cloud infrastructure.

Karan Gupta, Nutanix

Karan Gupta is the principal architect at Nutanix and has over 20 years of experience in distributed filesystems. He has designed and led the evolution of hyper converged storage infrastructure for all tiers (performance and scale) of workloads. He has published multiple papers on LSM-based key-value stores and won the Best Paper Award at ATC '19. At Nutanix, he is leading the charter to build geo-distributed federated object stores for the hybrid cloud world. He started his journey in distributed systems at IBM Research in Almaden.