All sessions will take place in the Magnolia Room unless otherwise noted.
Papers are available for download below to registered attendees now and to everyone beginning July 10, 2017. Paper abstracts are available to everyone now. Copyright to the individual works is retained by the author[s].
Downloads for Registered Attendees
(Sign in to your USENIX account to download these files.)
Monday, July 10, 2017
8:00 am–9:00 am
9:00 am–9:15 am
Program Co-Chairs: Eyal de Lara, University of Toronto, and Swaminathan Sundararaman, Parallel Machines
9:15 am–10:30 am
Session Chair: Irfan Ahmad, CachePhysics
Hao Wang and Baochun Li, University of Toronto
Over the past decade, we have witnessed exponential growth in the density (petabyte-level) and breadth (across geo-distributed datacenters) of data distribution. It becomes increasingly challenging but imperative to minimize the response times of data analytic queries over multiple geo-distributed datacenters. However, existing scheduling-based solutions have largely been motivated by pre-established mantras (e.g., bandwidth scarcity). Without data-driven insights into performance bottlenecks at runtime, schedulers might blindly assign tasks to workers that are suffering from unidentified bottlenecks.
In this paper, we present Lube, a system framework that minimizes query response times by detecting and mitigating bottlenecks at runtime. Lube monitors geo-distributed data analytic queries in real-time, detects potential bottlenecks, and mitigates them with a bottleneck-aware scheduling policy. Our preliminary experiments on a real-world prototype across Amazon EC2 regions have shown that Lube can detect bottlenecks with over 90% accuracy, and reduce the median query response time by up to 33% compared to Spark’s built-in locality-based scheduler.
Hangyu Li, Hong Xu, and Sarana Nutanong, City University of Hong Kong
We propose Bohr, a similarity aware geo-distributed data analytics system that minimizes query completion time. The key idea is to exploit similarity between data in different data centers (DCs), and transfer similar data from the bottleneck DC to other sites with more WAN bandwidth. Though these sites have more input data to process, these data are more similar and can be more efficiently aggregated by the combiner to reduce the intermediate data that needs to be shuffled across the WAN. Thus our similarity aware approach reduces the shuffle time and in turn the query completion time (QCT).
We design and implement Bohr based on OLAP data cubes to perform efficient similarity checking among datasets in different sites. Evaluation across ten sites of AWS EC2 shows that Bohr decreases the QCT by 30% compared to state-of-the-art solutions.
Shripad Nadgowda, Sahil Suneja, and Canturk Isci, IBM TJ Watson Research Center
Applications have commonly been oblivious to their cloud runtimes. This is primarily because they started their journey in IaaS clouds, running on a guestOS inside VMs. Then to increase performance, many guestOSes have been paravirtualized making them virtualization aware, so that they can bypass some of the virtualization layers, as in virtio. This approach still kept applications unmodified. Recently, we are witnessing a rapid adoption of containers due to their packaging benefits, high density, fast start-up and low overhead. Applications are increasingly being on-boarded to PaaS clouds in the form of application containers or appc, where they are run directly on a cloud substrate like Kubernetes or Docker Swarm. This shift in deployment practices present an opportunity to make applications aware of their cloud. In this paper, we present Paracloud framework for application containers and discuss the Paracloud interface (PaCI) for three cloud operations namely migration, auto-scaling and load-balancing.
10:30 am–11:00 am
Break with Refreshments
11:00 am–12:30 pm
Session Chair: Theophilus Benson, Duke University
Tian Zhang and Ryan Stutsman, University of Utah
Daniele Rogora, Universita della Svizzera italiana; Steffen Smolka, Cornell University; Antonio Carzaniga, Universita della Svizzera italiana; Amer Diwan, Google; Robert Soulé, Universita della Svizzera italiana and Barefoot Networks
Web services and applications are complex systems. Layers of abstraction and virtualization allow flexible and scalable deployment. But they also introduce complications if one wants predictable performance and easy trouble-shooting. We propose to support the designers, testers, and maintainers of such systems by annotating system components with performance models. Our goal is to formulate annotations that can be used as oracles in performance testing, that can provide valuable guidance for debugging, and that can also inform designers by predicting the performance profile of an assembly of annotated components. We present an initial formulation of such annotations together with their concrete derivation from the execution of a complex web service.
Mohammad Shahrad and David Wentzlaff, Princeton University
The performance of mobile phone processors has been steadily increasing, causing the performance gap between server and mobile processors to narrow with mobile processors sporting superior performance per unit energy. Fueled by the slowing of Moore’s Law, the overall performance of single-chip mobile and server processors have likewise plateaued. These trends and the glut of used and partially broken smartphones which become environmental e-waste motivate creating cloud servers out of decommissioned mobile phones. This work proposes creating a compute dense server built out of used and partially broken smartphones (e.g. screen can be broken). This work evaluates the total cost of ownership (TCO) benefit of using servers based on decommissioned mobile devices and analyzes some of the architectural design trade-offs in creating such servers.
12:30 pm–2:00 pm
Luncheon for Workshop Attendees
2:00 pm–3:30 pm
Session Chair: Margo Seltzer, Harvard University
Michael Kaufmann, IBM Research Zurich, Karlsruhe Institute of Technology; Kornilios Kourtis, IBM Research Zurich
Heterogeneity is a growing concern for scheduling on the cloud. Hardware is increasingly heterogeneous (e.g., GPUs, FPGAs, diverse I/O devices), emphasizing the need to build schedulers that identify the internal structure of applications and utilize available hardware resources to their full potential. In this paper we present our initial efforts to build a scheduler that tackles heterogeneity (in hardware and in software) as a primary concern. Our scheduler, HCl (Heterogeneous Cluster), models applications as annotated directed acyclic graphs (DAGs), where each node represents a task. It maps tasks onto hardware nodes, also organized in DAGs. Initial results using application models based on TPC-DS queries running on Apache Spark show that HCl can improve significantly upon approaches that do not consider heterogeneity and generate schedules that approach the critical path in length.
James Gleeson and Eyal de Lara, University of Toronto
Emerging cloud markets like spot markets and batch computing services scale up services at the granularity of whole VMs. In this paper, we observe that GPU workloads underutilize GPU device memory, leading us to explore the benefits of reallocating heterogeneous GPUs within existing VMs. We outline approaches for upgrading and downgrading GPUs for OpenCL GPGPU workloads, and show how to minimize the chance of cloud operator VM termination by maximizing the heterogeneous environments in which applications can run.
Sandeep D'souza and Ragunathan (Raj) Rajkumar, Carnegie Mellon University
Emerging Cyber-Physical Systems (CPS) such as connected vehicles and smart cities span large geographical areas. These systems are increasingly distributed and interconnected. Hence, a hierarchy of cloudlet and cloud deployments will be key to enable scaling, while simultaneously hosting the intelligence behind these systems. Given that CPS applications are often safety-critical, existing techniques focus on reducing latency to provide real-time performance. While low latency is useful, a shared and precise notion of time is key to enabling coordinated action in distributed CPS. In this position paper, we argue for a global Quality of Time (QoT)-based architecture, centered around a shared virtualized notion of time, based on the timeline abstraction. Our architecture allows applications to specify their QoT requirements, while exposing timing uncertainty to the application. The timeline abstraction with the associated knowledge of QoT enables scalable geo-distributed coordination in CPS, while providing avenues for fault tolerance and graceful degradation in the face of adversity.
3:30 pm–4:00 pm
Break with Refreshments
4:00 pm–5:30 pm
Session Chair: Ryan Stutsman, University of Utah
Supreeth Shastri and David Irwin, University of Massachusetts Amherst
Infrastructure-as-a-Service clouds are rapidly evolving into market-like environments that offer a wide range of server contracts. Amazon EC2’s spot market is the clearest example of this trend: it operates over 5000 markets globally where users can rent servers for a variable price. To exploit spot instances, while mitigating the risk of price spikes and revocations, many researchers and startups have developed techniques for modeling and predicting prices to optimize spot server selection. However, prior approaches focus largely on predicting individual server prices, which is akin to predicting the price of a single stock. We argue that researchers should instead focus on “index-based” modeling and prediction that aggregates prices from many markets in each region and availability zone. We show that, for applications flexible enough to select and “trade” servers globally, making decisions based on broader indices lowers costs and improves availability compared to index-agnostic policies.
Lianjie Cao, Purdue University; Puneet Sharma, Hewlett Packard Labs; Sonia Fahmy, Purdue University; Vinay Saxena, Hewlett Packard Enterprise
Dynamic and elastic resource allocation to Virtual Network Functions (VNFs) in accordance with varying workloads is a must for realizing promised reductions in capital and operational expenses in Network Functions Virtualization (NFV). However, workload heterogeneity and complex relationships between resources allocated to a VNF and the resulting capacity makes elastic resource flexing a challenging task. We propose an NFV resource flexing system, ENVI, that uses a combination of VNF-level features and infrastructure-level features to construct a machine-learning-based decision engine for detecting resource flexing events. ENVI also extracts the dependence relationship among VNFs in deployed Service Function Chains (SFCs) to carefully plan the sequence of resource flexing steps upon scaling detection. We present preliminary results for the accuracy of ENVI’s resource flexing decision engine with two different VNFs, namely, the caching proxy Squid and the intrusion detection system Suricata. Our preliminary results show that using a combination of features to train a neural network model is a promising approach for scaling detection.
Gábor Németh, Dániel Géhberger, and Péter Mátray, Ericsson Research
Latency-sensitive applications like virtualized telecom and industrial IoT systems require a service for ultrafast state externalization to become cloud-native. In this paper we propose a distributed shared memory system, called DAL, which achieves the lowest possible latency by transparently co-locating individual data items with applications working on them. Upon changes in data access patterns, the system automatically adapts data locations to keep the number of remote operations at a minimum. By avoiding the costs of network transport and using shared memory communication, the system can achieve 1 μs data access latency. We envision DAL as a platform component which enables latency-sensitive applications to take advantage of the cloud.
6:00 pm–7:00 pm
Joint Poster Session and Happy Hour with HotStorage '17
Sponsored by NetApp
The poster session will feature posters by authors of all papers presented at both the HotCloud and HotStorage workshops, including the HotStorage Wild and Crazy Ideas (WACI).
Tuesday, July 11, 2017
8:00 am–9:00 am
9:00 am–10:30 am
Shared Keynote Address with HotStorage '17
Santa Clara Ballroom
Mahadev Satyanarayanan, School of Computer Science, Carnegie Mellon University
Edge computing is new paradigm in which the resources of a small data center are placed at the edge of the Internet, in close proximity to mobile devices, sensors, and end users. Terms such as "cloudlets," "micro data centers," "fog, nodes" and "mobile edge cloud" have been used in the literature to refer to these edge-located computing entities. Located just one wireless hop away from associated mobile devices and sensors, they offer ideal placement for low-latency offload infrastructure to support emerging applications. They are optimal sites for aggregating, analyzing and distilling bandwidth-hungry sensor data from devices such as video cameras. In the Internet of Things, they offer a natural vantage point for organizational access control, privacy, administrative autonomy and responsive analytics. In vehicular systems, they mark the junction between the well-connected inner world of a moving vehicle and its tenuous reach into the cloud. For cloud computing, they enable fallback cloud services in hostile environments. Significant industry investments are already starting to be made in edge computing. This talk will examine why edge computing is a fundamentally disruptive technology, and will explore some of the challenges and opportunities that it presents to us.
Satya is the Carnegie Group Professor of Computer Science at Carnegie Mellon University. He received the PhD in Computer Science from Carnegie Mellon, after Bachelor's and Master's degrees from the Indian Institute of Technology, Madras. He is a Fellow of the ACM and the IEEE. He was the founding Program Chair of the HotMobile series of workshops, the founding Editor-in-Chief of IEEE Pervasive Computing, the founding Area Editor for the Synthesis Series on Mobile and Pervasive Computing, and the founding Program Chair of the First IEEE Symposium on Edge Computing. He was the founding director of Intel Research Pittsburgh, and was an Advisor to Maginatics, which has created a cloud-based realization of the AFS vision and was acquired by EMC in 2014.
10:30 am–11:00 am
Break with Refreshments
11:00 am–12:30 pm
Related to Protocols
Session Chair: Eyal de Lara, University of Toronto
Kamala Ramasubramanian, Kathryn Dahlgren, Asha Karim, Sanjana Maiya, Sarah Borland, UC Santa Cruz, Boaz Leskes, Elastic; Peter Alvaro, UC Santa Cruz
Verification is often regarded as a one-time procedure undertaken after a protocol is specified but before it is implemented. However, in practice, protocols continually evolve with the addition of new capabilities and performance optimizations. Existing verification tools are ill-suited to “tracking” protocol evolution and programmers are too busy (or too lazy?) to simultaneously co-evolve specifications manually. This means that the correctness guarantees determined at verification time can erode as protocols evolve. Existing software quality techniques such as regression testing and root cause analysis, which naturally support system evolution, are poorly suited to reasoning about fault tolerance properties of a distributed system because these properties require a search of the execution schedule rather than merely replaying inputs. This paper advocates that our community should explore the intersection of testing and verification to better ensure quality for distributed software and presents our experience evolving a data replication protocol at Elastic using a novel bug-finding technology called Lineage Driven Fault Injection (LDFI) as evidence.
Leader or Majority: Why have one when you can have both? Improving Read Scalability in Raft-like consensus protocols
Vaibhav Arora, Tanuj Mittal, Divyakant Agrawal, and Amr El Abbadi, University of California, Santa Barbara; Xun Xue, Zhiyanan, and Zhujianfeng, Huawei
Consensus protocols are used to provide consistency guarantees over replicated data in a distributed system, and allow a set of replicas to work together as a coherent group. Raft is a consensus protocol that is designed to be easy to understand and implement. It is equivalent to Multi-Paxos in fault-tolerance and performance. It uses a leader based approach for coordinating replication to a majority. The leader regularly informs the followers of its existence using heartbeats. All reads and writes go through the leader to ensure strong consistency. However, read-heavy workloads increase load on the leader since the followers in Raft are maintained as cold standbys. Since the algorithm itself guarantees replication to at least a majority, why not exploit this fact to serve strongly consistent reads without a leader? We propose mechanisms to use quorum reads in Raft to offload the leader and better utilize the cluster. We integrate our approach in CockroachDB, an open-source distributed SQL database which uses Raft and leader leases, to compare our proposed changes. The evaluation results with the YCSB benchmark illustrate that quorum reads result in an increased throughput of the system under read-heavy workloads, as well as lower read/write latencies.
Mohammad Noormohammadpour and Cauligi S. Raghavendra, University of Southern California; Sriram Rao and Srikanth Kandula, Microsoft
Using multiple datacenters allows for higher availability, load balancing and reduced latency to customers of cloud services. To distribute multiple copies of data, cloud providers depend on inter-datacenterWANs that ought to be used efficiently considering their limited capacity and the ever-increasing data demands. In this paper, we focus on applications that transfer objects from one datacenter to several datacenters over dedicated inter-datacenter networks. We present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses forwarding trees to efficiently deliver an object from a source datacenter to required destination datacenters. With low computational overhead, DCCast selects forwarding trees that minimize bandwidth usage and balance load across all links. With simulation experiments on Google’s GScale network, we show that DCCast can reduce total bandwidth usage and tail Transfer Completion Times (TCT) by up to 50% compared to delivering the same objects via independent point-to-point (P2P) transfers.
12:30 pm–2:00 pm
Luncheon for Workshop Attendees
2:00 pm–3:00 pm
Content Distribution Networks
Session Chair: Nisha Talagala, Parallel Machines
Usama Naseer and Theophilus Benson, Duke University
The web serving protocol stack is constantly changing and evolving to tackle technological shifts in networking infrastructure and website complexity. For example, Cubic to tackle high throughput, SPDY to tackle loss and QUIC to tackle security issues and lower connection setup time. Accordingly, there are a plethora of protocols and configuration parameters that enable the web serving protocol stack to address a variety of realistic conditions. Yet, despite the diversity in end-user networks and devices, today, most content providers have adopted a “one-size-fits-all” approach to configuring user facing web stacks (CDN servers).
In this paper, we illustrate the drawbacks through empirical evidence that this “one-size-fits-all” approach results in sub-optimal performance and argue for a novel framework that extends existing CDN architectures to provide programmatic control over the configuration options of the CDN serving stack.
Debopam Bhattacherjee, ETH Zurich; Muhammad Tirmazi, LUMS; Ankit Singla, ETH Zurich
Many popular Web services use CDNs to host their content closer to users and thus improve page load times. While this model’s success is beyond question, it has its limits: for users with poor last-mile latency even to a nearby CDN node, the many RTTs needed to fetch a Web page add up to large delays. Thus, in this work, we explore a complementary model of speeding up Web page delivery—a content gathering network (CGN), whereby users establish their own geo-distributed presence, and use these points of presence to proxy content for them. We show that deploying only 14 public cloud-based CGN nodes puts the closest node within a median RTT of merely 4.8 ms (7.2 ms) from servers hosting the top 10k (100k) most popular Web sites. The CGN node nearest to a server can thus obtain content from it rapidly, and then transmit it to the client over fewer (limited by available bandwidth) high-latency interactions using aggressive transport protocols. This simple approach reduces the median page load time across 100 popular Web sites by as much as 53%, and can be deployed immediately without depending on any changes to Web servers at an estimated cost of under $1 per month per user.
3:00 pm–3:30 pm
Break with Refreshments
3:30 pm–4:30 pm
Security & Provenence
Session Chair: Shripad Nadgowda, IBM T.J. Watson Research Center
Xueyuan Han, Thomas Pasquier, Tanvi Ranjan, Mark Goldstein, and Margo Seltzer, Harvard University
We present FRAPpuccino (or FRAP), a provenance-based fault detection mechanism for Platform as a Service (PaaS) users, who run many instances of an application on a large cluster of machines. FRAP models, records, and analyzes the behavior of an application and its impact on the system as a directed acyclic provenance graph. It assumes that most instances behave normally and uses their behavior to construct a model of legitimate behavior. Given a model of legitimate behavior, FRAP uses a dynamic sliding window algorithm to compare a new instance’s execution to that of the model. Any instance that does not conform to the model is identified as an anomaly. We present the FRAP prototype and experimental results showing that it can accurately detect application anomalies.
Yan Zhai, University of Wisconsin Madison; Qiang Cao and Jeffrey Chase, Duke University; Michael Swift, University of Wisconsin Madison
One way to establish trust in a service is to know what code it is running. However, verified code identity is currently not possible for programs launched on a cloud by another party. We propose an approach to integrate support for code attestation—authenticated statements of code identity—into layered cloud platforms and services.
To illustrate, this paper describes TapCon, an attesting container manager that provides source-based attestation and network-based authentication for containers on a trusted cloud platform incorporating new features for code attestation. TapCon allows a third party to verify that an attested container is running specific code bound securely to an identified source repository. We also show how to use attested code identity as a basis for access control. This structure enables new use cases such as joint data mining, in which two data owners agree on a safe analytics program that protects the privacy of their inputs, and then ensure that only the designated program can access their data.