HotCloud '10 Session Abstracts

WORKSHOP PROGRAM ABSTRACTS

Tuesday, June 22, 2010

8:40 a.m.–10:00 a.m.

Seawall: Performance Isolation for Cloud Datacenter Networks
While today's virtual datacenters have hypervisor-based mechanisms to partition compute resources between the tenants co-located on an end host, they provide little control over how tenants share the network. This opens cloud applications to interference from other tenants, resulting in unpredictable performance and exposure to denial-of-service attacks. This paper explores the design space for achieving performance isolation between tenants. We find that existing schemes for enterprise datacenters suffer from at least one of these problems: they cannot keep up with the numbers of tenants and the VM churn observed in cloud datacenters; they impose static bandwidth limits to obtain isolation at the cost of network utilization; they require switch and/or NIC modifications; they cannot tolerate malicious tenants and compromised hypervisors. We propose Seawall, an edge-based solution that achieves max-min fairness across tenant VMs by sending traffic through congestion-controlled, hypervisor-to-hypervisor tunnels.
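
As a rough illustration of the max-min fairness objective named in this abstract, the following Python sketch uses the textbook progressive-filling procedure on a single shared link. It is not Seawall's mechanism, which the abstract describes only as congestion-controlled hypervisor-to-hypervisor tunnels; the function name `max_min_fair`, the single-link model, and the example demands are all assumptions made for illustration.

```python
def max_min_fair(capacity, demands):
    """Split `capacity` among `demands` (name -> requested rate) max-min fairly.

    Hypothetical helper: models one shared link, not a real datacenter network.
    """
    alloc = {name: 0.0 for name in demands}
    remaining = float(capacity)
    active = {name for name, d in demands.items() if d > 0}
    while active and remaining > 1e-12:
        share = remaining / len(active)
        # Senders whose unmet demand fits inside the equal share are fully served.
        satisfied = {n for n in active if demands[n] - alloc[n] <= share}
        if not satisfied:
            # Everyone left wants more than the equal share: split evenly and stop.
            for n in active:
                alloc[n] += share
            break
        for n in satisfied:
            remaining -= demands[n] - alloc[n]
            alloc[n] = float(demands[n])
        active -= satisfied
    return alloc


if __name__ == "__main__":
    print(max_min_fair(10, {"vm_a": 2, "vm_b": 8, "vm_c": 8}))
```

In this sketch a small sender keeps its full demand, and the capacity it leaves unused is divided equally among the larger senders.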

Performance Profiling in a Virtualized Environment
Virtualization is a key enabling technology for cloud computing. Many applications deployed in a cloud run in virtual machines. However, profilers based on CPU performance counters do not work well in a virtualized environment. In this paper, we explore the possibilities for achieving performance profiling in virtual machine monitors (VMMs) built on paravirtualization, hardware assistance, and binary translation. We present the design and implementation of performance profiling for a VMM based on the x86 hardware extensions, with some preliminary experimental results.

The Case for Energy-Oriented Partial Desktop Migration
Office and home environments are increasingly crowded with personal computers. Even though these computers see little use in the course of the day, they often remain powered, even when idle. Leaving idle PCs running is not only wasteful, but with rising energy costs it is also increasingly expensive. We propose partial migration of idle desktop sessions into the cloud to achieve energy-proportional computing. Partial migration only propagates the small footprint of state that will be needed during idle period execution, and returns the session to the PC when it is no longer idle. We show that this approach can reduce energy usage of an idle desktop by up to 50% over an hour and by up to 69% overnight. We show that idle desktop sessions have small working sets, up to an order of magnitude smaller than their allocated memory, enabling significant consolidation ratios. We also show that partial VM migration can save medium to large size organizations tens to hundreds of thousands of dollars annually.

Energy Efficiency of Mobile Clients in Cloud Computing
Energy efficiency is a fundamental consideration for mobile devices. Cloud computing has the potential to save mobile client energy but the savings from offloading the computation need to exceed the energy cost of the additional communication. In this paper we provide an analysis of the critical factors affecting the energy consumption of mobile clients in cloud computing. Further, we present our measurements about the central characteristics of contemporary mobile handheld devices that define the basic balance between local and remote computing. We also describe a concrete example, which demonstrates energy savings. We show that the trade-offs are highly sensitive to the exact characteristics of the workload, data communication patterns and technologies used, and discuss the implications for the design and engineering of energy efficient mobile cloud computing solutions.
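
The break-even condition stated in this abstract, that the energy saved by offloading must exceed the energy spent on the extra communication, can be written as a one-line comparison. The Python sketch below is only a simplified model under assumed parameters: `cycles_per_joule` and `bytes_per_joule` are hypothetical device efficiencies, not measurements from the paper, and the model ignores idle-wait and radio start-up costs.

```python
def offloading_saves_energy(cycles, bytes_transferred,
                            cycles_per_joule, bytes_per_joule):
    """Return True if offloading costs less energy than computing locally.

    All four parameters are illustrative assumptions, not measured values.
    """
    local_energy = cycles / cycles_per_joule            # joules to compute on the device
    comm_energy = bytes_transferred / bytes_per_joule   # joules to ship the data instead
    return local_energy > comm_energy


if __name__ == "__main__":
    # Heavy computation, little data: offloading wins in this toy model.
    print(offloading_saves_energy(1e9, 1e6, 1e8, 1e6))
    # Same computation, much more data to move: local execution wins.
    print(offloading_saves_energy(1e9, 2e7, 1e8, 1e6))
```

The two calls differ only in the amount of data transferred, which reflects the abstract's point that the trade-off is sensitive to workload and communication patterns.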

10:20 a.m.–Noon

CloudCmp: Shopping for a Cloud Made Easy
Cloud computing has gained much popularity recently, and many companies now offer a variety of public cloud computing services, including Google AppEngine, Amazon AWS, and Microsoft Azure. These services differ in service models and pricing schemes, making it challenging for customers to choose the best suited cloud provider for their applications. This paper proposes a framework called CloudCmp to help a customer select a cloud provider. We outline the design of CloudCmp and highlight the main technical challenges. CloudCmp includes a set of benchmarking tools that compare the common services offered by cloud providers, and uses the benchmarking results to predict the performance and costs of a customer's application when deployed on a cloud provider. We present preliminary benchmarking results on three representative cloud providers (AppEngine, AWS, and Azure). These results show that the performance and costs of various clouds differ widely, suggesting that CloudCmp, if implemented, will have practical relevance.

Distributed Systems Meet Economics: Pricing in the Cloud
Cloud computing allows users to perform computation in a public cloud with a pricing scheme typically based on incurred resource consumption. While cloud computing is often considered as merely a new application for classic distributed systems, we argue that, by decoupling users from cloud providers with a pricing scheme as the bridge, cloud computing has fundamentally changed the landscape of system design and optimization. Our preliminary studies on the Amazon EC2 cloud service and on a local cloud computing testbed have revealed an interesting interplay between distributed systems and economics related to pricing. We believe that this new angle of looking at distributed systems potentially fosters new insights into cloud computing.

See Spot Run: Using Spot Instances for MapReduce Workflows
MapReduce is a scalable and fault tolerant framework, patented by Google, for computing embarrassingly parallel reductions. Hadoop is an open-source implementation of Google MapReduce that is made available as a web service to cloud users by the Amazon Web Services (AWS) cloud computing infrastructure. Amazon Spot Instances (SIs) provide an inexpensive yet transient and market-based option to purchasing virtualized instances for execution in AWS. As opposed to manually controlling when an instance is terminated, SI termination can also occur automatically as a function of the market price and maximum user bid price. We find that we can significantly improve the runtime of MapReduce jobs in our benchmarks by using SIs as accelerators. However, we also find that SI termination due to budget constraints during the job can have adverse effects on the runtime and may cause the user to overpay for their job. We describe new techniques that help reduce such effects.
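
The termination rule mentioned in this abstract, where an SI is terminated as a function of the market price and the user's maximum bid, can be sketched in a few lines of Python. This is a simplified model that assumes the instance is lost in the first period whose market price exceeds the bid; the function name `first_termination` and the price trace are hypothetical, and the sketch does not call any AWS API.

```python
def first_termination(prices, max_bid):
    """Return the index of the first period in which the instance is terminated.

    `prices` is an assumed per-period market price trace; returns None if the
    market price never rises above `max_bid`.
    """
    for period, price in enumerate(prices):
        if price > max_bid:
            return period
    return None


if __name__ == "__main__":
    trace = [0.03, 0.04, 0.06, 0.03]   # made-up prices in dollars per hour
    print(first_termination(trace, max_bid=0.05))
    print(first_termination(trace, max_bid=0.10))
```

A higher bid lets the instance survive more of the trace, at the risk of paying a higher price while it runs.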

Disaster Recovery as a Cloud Service: Economic Benefits & Deployment Challenges
Many businesses rely on Disaster Recovery (DR) services to prevent either manmade or natural disasters from causing expensive service disruptions. Unfortunately, current DR services come either at very high cost, or with only weak guarantees about the amount of data lost or time required to restart operation after a failure. In this work, we argue that cloud computing platforms are well suited for offering DR as a service due to their pay-as-you-go pricing model that can lower costs, and their use of automated virtual platforms that can minimize the recovery time after a failure. To this end, we perform a pricing analysis to estimate the cost of running a public cloud based DR service and show significant cost reductions compared to using privately owned resources. Further, we explore what additional functionality must be exposed by current cloud platforms and describe what challenges remain in order to minimize cost, data loss, and recovery time in cloud based DR services.

CiteSeer^x: A Cloud Perspective
Information retrieval applications are good candidates for hosting in a cloud infrastructure. CiteSeer^x, a digital library and search engine, was built with the goal of efficiently disseminating scientific information and literature over the web. The framework for CiteSeer^x as an application of the SeerSuite software is a design built with extensibility and scalability as fundamental features. This loosely coupled architecture with service oriented interfaces allows the whole or parts of SeerSuite to readily be placed in the cloud. We discuss in brief the architecture, approaches, and advantages of hosting CiteSeer^x in the cloud. We present initial results on costs of migrating whole or parts of CiteSeer^x to two popular cloud offerings as well as discuss the effort involved.

1:30 p.m.–3:10 p.m.

Spark: Cluster Computing with Working Sets
MapReduce and its variants have been highly successful in implementing large-scale data-intensive applications on commodity clusters. However, most of these systems are built around an acyclic data flow model that is not suitable for other popular applications. This paper focuses on one such class of applications: those that reuse a working set of data across multiple parallel operations. This includes many iterative machine learning algorithms, as well as interactive data analysis tools. We propose a new framework called Spark that supports these applications while retaining the scalability and fault tolerance of MapReduce. To achieve these goals, Spark introduces an abstraction called resilient distributed datasets (RDDs). An RDD is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
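
To make the RDD description in this abstract concrete, here is a toy, single-process Python sketch of a read-only partitioned collection whose lost partitions can be rebuilt by recomputing them from their parent. It is not Spark's API: the class `ToyRDD` and its methods `from_list`, `map`, `lose_partition`, and `collect` are hypothetical names chosen for illustration, and real partitions would live on separate machines.

```python
class ToyRDD:
    """Read-only collection split into partitions that can be recomputed."""

    def __init__(self, compute_partition, num_partitions):
        self._compute = compute_partition   # maps partition index -> list of items
        self.num_partitions = num_partitions
        self._cache = {}                    # materialized partitions (may be lost)

    @classmethod
    def from_list(cls, data, num_partitions):
        data = tuple(data)                  # immutable source data
        return cls(lambda i: list(data[i::num_partitions]), num_partitions)

    def map(self, fn):
        # The child remembers how to derive each partition from its parent.
        return ToyRDD(lambda i: [fn(x) for x in self.partition(i)],
                      self.num_partitions)

    def partition(self, i):
        if i not in self._cache:            # missing or lost: rebuild it
            self._cache[i] = self._compute(i)
        return list(self._cache[i])         # return a copy to keep it read-only

    def lose_partition(self, i):
        self._cache.pop(i, None)            # simulate a failed machine

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


if __name__ == "__main__":
    squares = ToyRDD.from_list(range(6), num_partitions=2).map(lambda x: x * x)
    before = squares.collect()
    squares.lose_partition(1)
    print(before == squares.collect())
```

The design choice illustrated is that each dataset stores the recipe for its partitions rather than relying only on the materialized data, so a lost partition is recomputed instead of restored from a replica.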

Turning Down the LAMP: Software Specialisation for the Cloud
The wide availability of cloud computing offers an unprecedented opportunity to rethink how we construct applications. The cloud is currently mostly used to package up existing software stacks and operating systems (e.g. LAMP) for scaling out websites. We instead view the cloud as a stable hardware platform, and present a programming framework which permits applications to be constructed to run directly on top of it without intervening software layers. Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen. Mirage applications exhibit significant performance speedups for I/O and memory handling versus the same code running under Linux/Xen. Our results can be generalised to offer insight into improving more commonly used languages such as PHP, Python and Ruby, and we discuss lessons learnt and future directions.

Scripting the Cloud with Skywriting
Recent distributed computing frameworks—such as MapReduce, Hadoop and Dryad—have made it simple to exploit multiple machines in a compute cloud. However, these frameworks use coordination languages that are insufficiently expressive for many classes of computation, including iterative and recursive algorithms. To address this problem, and generalise previous approaches, we introduce Skywriting: a Turing-powerful, purely-functional script language for describing distributed computations. In this paper, we introduce the main features of Skywriting, and outline our novel cooperative task farming execution engine.

Toward Risk Assessment as a Service in Cloud Environments
Security and privacy assessments are considered a best practice for evaluating a system or application for potential risks and exposures. Cloud computing introduces several characteristics that challenge the effectiveness of current assessment approaches. In particular, the on-demand, automated, multi-tenant nature of cloud computing is at odds with the static, human process-oriented nature of the systems for which typical assessments were designed. This paper describes these challenges and recommends addressing them by introducing risk assessment as a service.

Information-Acquisition-as-a-Service for Cyber-Physical Cloud Computing
Data center cloud computing distinguishes computational services such as database transactions and data storage from computational resources such as server farms and disk arrays. Cloud computing enables a software-as-a-service business model where clients may only pay for the service they really need and providers may fully utilize the resources they actually have. The key enabling technology for cloud computing is virtualization. Recent developments, including our own work on virtualization technology for embedded systems, show that service-oriented computing through virtualization may also have tremendous potential on mobile sensor networks where the emphasis is on information acquisition rather than computation and storage. We propose to study the notion of information-acquisition-as-a-service of mobile sensor networks, instead of server farms, for cyber-physical cloud computing. In particular, we discuss the potential capabilities and design challenges of software abstractions and systems infrastructure for performing information acquisition missions using virtualized versions of aerial vehicles deployed on a fleet of high-performance model helicopters.

3:30 p.m.–4:50 p.m.

A First Look at Problems in the Cloud
Cloud computing provides a revolutionary model for the deployment of enterprise applications and Web services alike. In this new model, cloud users save on the cost of purchasing and managing base infrastructure, while the cloud providers save on the cost of maintaining underutilized CPU, memory, and network resources. In migrating to this new model, users face a variety of issues. Commercial clouds provide several support models to aid users in resolving the reported issues. This paper arises from our quest to understand how to design IaaS support models for more efficient user troubleshooting. Using a data-driven approach, we start our exploration into this issue with an investigation into the problems encountered by users and the methods utilized by the cloud support staff to resolve these problems. We examine message threads appearing in the forum of a large IaaS provider over a 3-year period. We argue that the lessons derived from this study point to a set of principles that future IaaS offerings can implement to provide users with a more efficient support model. This data-driven approach enables us to propose a set of principles that are pertinent to the experiences of users and that we believe could vastly improve the SLA observed by the users.

Secure Cloud Computing with a Virtualized Network Infrastructure
Despite the rapid development in the field of cloud computing, security is still one of the major hurdles to cloud computing adoption. Most cloud services (e.g. Amazon EC2) are offered at low cost without much protection to users. At the other end of the spectrum, highly secured cloud services (e.g. Google "government cloud") are offered at much higher cost by using isolated hardware, facility, and administrators with security clearance. In this paper, we explore the "middle ground", where users can still share physical hardware resources, but user networks are isolated and accesses are controlled in a way similar to that in enterprise networks. We believe this covers the need for most enterprise and individual users. We propose an architecture that takes advantage of network virtualization and a centralized controller. This architecture overcomes scalability limitations of prior solutions based on VLANs, and enables users to customize security policy settings the same way they control their on-site network.

Look Who's Talking: Discovering Dependencies between Virtual Machines Using CPU Utilization
A common problem experienced in datacenters and utility clouds is the lack of knowledge about the mappings of the services being offered to or run by external users to the sets of virtual machines (VMs) realizing them. This makes it difficult to manage VM ensembles to attain provider goals like minimizing the resources consumed by certain services or reducing the power drawn by datacenter machines. This paper presents the 'Look Who's Talking' (LWT) set of methods and framework for identifying inter-VM dependencies. LWT does not require services to be modified, or middleware or operating systems to be instrumented, but instead, operates in management VMs with privileged access to hypervisor-level information about current machine use. The current implementation of LWT has been integrated into the Xen hypervisor running across a small-scale prototype datacenter, for which experimental measurements show that it can effectively identify dependencies between VMs with an average of 97.15% overall accuracy rate, with zero knowledge of or modifications to applications or workloads and with minimal effect on system performance.

A Collaborative Monitoring Mechanism for Making a Multitenant Platform Accountable
Multitenancy is becoming common as an increasing number of applications run in clouds; however, the certainty of running applications in a fully controlled administrative domain is lost in the move. How to ensure that the data and business logic are handled faithfully becomes an issue. In this paper, we propose to maintain a state machine outside of a multitenant platform to make the platform accountable. We give a mechanism to support accountability for a multitenant database with a centralized external service. We also describe how to implement a decentralized virtual accountability service via collaborative monitoring. Finally, we discuss the characteristics of the mechanism through experiments in Amazon EC2.


Last changed: 25 June 2010 jp