NSDI '13 Technical Sessions

Full papers are available to symposium registrants immediately and to everyone beginning Wednesday, April 3, 2013. Everyone can view the abstracts immediately.

Proceedings Front Matter:
Cover Page | Title Page and List of Organizers | Table of Contents | Message from the Program Co-Chairs

Wednesday, April 3, 2013

8:45 a.m.–9:00 a.m.	Wednesday
Opening Remarks and Awards
Program Co-Chairs: Nick Feamster, Georgia Tech, and Jeff Mogul, HP Labs
9:00 a.m.–10:15 a.m.	Wednesday
Software Defined Networking
Session Chair: Dejan Kostić, Institute IMDEA Networks

Composing Software Defined Networks
Christopher Monsanto and Joshua Reich, Princeton University; Nate Foster, Cornell University; Jennifer Rexford and David Walker, Princeton University
Awarded Community Award!
Managing a network requires support for multiple concurrent tasks, from routing and traffic monitoring, to access control and server load balancing. Software-Defined Networking (SDN) allows applications to realize these tasks directly, by installing packet-processing rules on switches. However, today’s SDN platforms provide limited support for creating modular applications. This paper introduces new abstractions for building applications out of multiple, independent modules that jointly manage network traffic. First, we define composition operators and a library of policies for forwarding and querying traffic. Our parallel composition operator allows multiple policies to operate on the same set of packets, while a novel sequential composition operator allows one policy to process packets after another. Second, we enable each policy to operate on an abstract topology that implicitly constrains what the module can see and do. Finally, we define a new abstract packet model that allows programmers to extend packets with virtual fields that may be used to associate packets with high-level meta-data. We realize these abstractions in Pyretic, an imperative, domain-specific language embedded in Python.

VeriFlow: Verifying Network-Wide Invariants in Real Time
Ahmed Khurshid, Xuan Zou, Wenxuan Zhou, Matthew Caesar, and P. Brighten Godfrey, University of Illinois at Urbana-Champaign
Networks are complex and prone to bugs. Existing tools that check network configuration files and the data-plane state operate offline at timescales of seconds to hours, and cannot detect or prevent bugs as they arise. Is it possible to check network-wide invariants in real time, as the network state evolves? The key challenge here is to achieve extremely low latency during the checks so that network performance is not affected. In this paper, we present a design, VeriFlow, which achieves this goal. VeriFlow is a layer between a software-defined networking controller and network devices that checks for network-wide invariant violations dynamically as each forwarding rule is inserted, modified or deleted. VeriFlow supports analysis over multiple header fields, and an API for checking custom invariants. Based on a prototype implementation integrated with the NOX OpenFlow controller, and driven by a Mininet OpenFlow network and Route Views trace data, we find that VeriFlow can perform rigorous checking within hundreds of microseconds per rule insertion or deletion.

Software Defined Traffic Measurement with OpenSketch
Minlan Yu, University of Southern California; Lavanya Jose, Princeton University; Rui Miao, University of Southern California
Most network management tasks in software-defined networks (SDN) involve two stages: measurement and control. While many efforts have been focused on network control APIs for SDN, little attention goes into measurement. The key challenge of designing a new measurement API is to strike a careful balance between generality (supporting a wide variety of measurement tasks) and efficiency (enabling high link speed and low cost). We propose a software defined traffic measurement architecture OpenSketch, which separates the measurement data plane from the control plane. In the data plane, OpenSketch provides a simple three-stage pipeline (hashing, filtering, and counting), which can be implemented with commodity switch components and support many measurement tasks. In the control plane, OpenSketch provides a measurement library that automatically configures the pipeline and allocates resources for different measurement tasks. Our evaluations of real world packet traces, our prototype on NetFPGA, and the implementation of five measurement tasks on top of OpenSketch, demonstrate that OpenSketch is general, efficient and easily programmable.
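
The hashing, filtering, and counting stages named in the OpenSketch abstract can be illustrated with a Count-Min sketch, a standard counting structure. This is a minimal sketch under stated assumptions, not OpenSketch's API or data plane: the class `CountMinSketch`, the function `measure`, the packet tuples, and the filter predicate are all hypothetical names chosen for illustration, and the filter is applied before hashing purely for simplicity.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Hashing stage: a per-row salt yields a different hash function per row.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, amount=1):
        # Counting stage: bump one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(key, row)] += amount

    def estimate(self, key):
        # Collisions can only inflate counters, so the minimum is the best guess.
        return min(
            self.rows[row][self._index(key, row)] for row in range(self.depth)
        )


def measure(packets, wanted, sketch):
    """Count bytes per source address for packets accepted by `wanted`."""
    for src, dst, size in packets:
        if wanted(src, dst):       # filtering stage (hypothetical predicate)
            sketch.add(src, size)  # hashing + counting stages
    return sketch


# Hypothetical packets: (source, destination, size in bytes).
packets = [
    ("10.0.0.1", "10.0.1.9", 1500),
    ("10.0.0.2", "10.0.1.9", 40),
    ("10.0.0.1", "10.0.2.7", 1500),
    ("10.0.0.1", "10.0.1.9", 500),
]
sketch = measure(
    packets, lambda src, dst: dst.startswith("10.0.1."), CountMinSketch()
)
print(sketch.estimate("10.0.0.1"))
```

A Count-Min estimate never falls below the true count, so the printed value is at least the 2000 bytes that `10.0.0.1` sent to the filtered prefix.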
10:15 a.m.–10:45 a.m.	Wednesday
Break
10:45 a.m.–12:20 p.m.	Wednesday
Pervasive Computing
Session Chair: Philip Levis, Stanford University

V-edge: Fast Self-constructive Power Modeling of Smartphones Based on Battery Voltage Dynamics
Fengyuan Xu, College of William and Mary; Yunxin Liu, Microsoft Research Asia; Qun Li, College of William and Mary; Yongguang Zhang, Microsoft Research Asia
System power models are important for power management and optimization on smartphones. However, existing approaches for power modeling have several limitations. Some require external power meters, which is not convenient for people to use. Other approaches either rely on the battery current sensing capability, which is not available on many smartphones, or take a long time to generate the power model. To overcome these limitations, we propose a new way of generating power models from battery voltage dynamics, called V-edge. V-edge is self-constructive and does not require current-sensing. Most importantly, it is fast in model building. Our implementation supports both component level power models and per-application energy accounting. Evaluation results using various benchmarks and applications show that the V-edge approach achieves high power modeling accuracy, and is two orders of magnitude faster than existing self-modeling approaches requiring no current-sensing.

eDoctor: Automatically Diagnosing Abnormal Battery Drain Issues on Smartphones
Xiao Ma, University of Illinois at Urbana-Champaign and University of California, San Diego; Peng Huang and Xinxin Jin, University of California, San Diego; Pei Wang, Peking University; Soyeon Park, Dongcai Shen, Yuanyuan Zhou, Lawrence K. Saul, and Geoffrey M. Voelker, University of California, San Diego
The past few years have witnessed an evolutionary change in the smartphone ecosystem. Smartphones have gone from closed platforms containing only pre-installed applications to open platforms hosting a variety of third-party applications. Unfortunately, this change has also led to a rapid increase in Abnormal Battery Drain (ABD) problems that can be caused by software defects or misconfiguration. Such issues can drain a fully-charged battery within a couple of hours, and can potentially affect a significant number of users. This paper presents eDoctor, a practical tool that helps regular users troubleshoot abnormal battery drain issues on smartphones. eDoctor leverages the concept of execution phases to capture an app’s time-varying behavior, which can then be used to identify an abnormal app. Based on the result of a diagnosis, eDoctor suggests the most appropriate repair solution to users. To evaluate eDoctor’s effectiveness, we conducted both in-lab experiments and a controlled user study with 31 participants and 17 real-world ABD issues together with 4 injected issues in 19 apps. The experimental results show that eDoctor can successfully diagnose 47 out of the 50 use cases while imposing no more than 1.5% of power overhead.

ArrayTrack: A Fine-Grained Indoor Location System
Jie Xiong and Kyle Jamieson, University College London
With myriad augmented reality, social networking, and retail shopping applications all on the horizon for the mobile handheld, a fast and accurate location technology will become key to a rich user experience. When roaming outdoors, users can usually count on a clear GPS signal for accurate location, but indoors, GPS often fades, and so up until recently, mobiles have had to rely mainly on rather coarse-grained signal strength readings. What has changed this status quo is the recent trend of dramatically increasing numbers of antennas at the indoor access point, mainly to bolster capacity and coverage with multiple-input, multiple-output (MIMO) techniques. We thus observe an opportunity to revisit the important problem of localization with a fresh perspective. This paper presents the design and experimental evaluation of ArrayTrack, an indoor location system that uses MIMO-based techniques to track wireless clients at a very fine granularity in real time, as they roam about a building. With a combination of FPGA and general purpose computing, we have built a prototype of the ArrayTrack system. Our results show that the techniques we propose can pinpoint 41 clients spread out over an indoor office environment to within 23 centimeters median accuracy, with the system incurring just 100 milliseconds latency, making for the first time ubiquitous real-time, fine-grained location available on the mobile handset.

Walkie-Markie: Indoor Pathway Mapping Made Easy
Guobin Shen, Zhuo Chen, Peichao Zhang, Thomas Moscibroda, and Yongguang Zhang, Microsoft Research Asia
We present Walkie-Markie, an indoor pathway mapping system that can automatically reconstruct internal pathway maps of buildings without any a-priori knowledge about the building, such as the floor plan or access point locations. Central to Walkie-Markie is a novel exploitation of the WiFi infrastructure to define landmarks (WiFi-Marks) to fuse crowdsourced user trajectories obtained from inertial sensors on users’ mobile phones. WiFi-Marks are special pathway locations at which the trend of the received WiFi signal strength changes from increasing to decreasing when moving along the pathway. By embedding these WiFi-Marks in a 2D plane using a newly devised algorithm and connecting them with calibrated user trajectories, Walkie-Markie is able to infer pathway maps with high accuracy. Our experiments demonstrate that Walkie-Markie is able to reconstruct a high-quality pathway map for a real office-building floor after only 5-6 rounds of walks, with accuracy gradually improving as more user data becomes available. The maximum discrepancy between the inferred pathway map and the real one is within 3m and 2.8m for the anchor nodes and path segments, respectively.
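
The Walkie-Markie abstract defines a WiFi-Mark as a pathway location where the received signal strength trend changes from increasing to decreasing. A minimal sketch of that one idea follows; it assumes a single access point, evenly spaced samples along a walk, and a simple moving average for noise suppression. The function names, the window size, and the RSSI readings are hypothetical and are not taken from the paper.

```python
def moving_average(values, window=3):
    """Smooth a sequence with a centered window, shrinking it at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def trend_reversals(rssi, window=3):
    """Return sample indices where smoothed RSSI stops rising and starts falling."""
    smooth = moving_average(rssi, window)
    marks = []
    for i in range(1, len(smooth) - 1):
        if smooth[i - 1] < smooth[i] and smooth[i] > smooth[i + 1]:
            marks.append(i)
    return marks


# Hypothetical RSSI readings (dBm) from one access point along a corridor walk.
walk = [-80, -75, -70, -62, -55, -60, -68, -74, -79]
marks = trend_reversals(walk)
print(marks)  # [4]
```

The strict inequalities mean a flat plateau produces no mark; a real system would need a policy for that case and for combining readings from several access points.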
12:20 p.m.–2:00 p.m.	Wednesday
Symposium Luncheon
2:00 p.m.–3:15 p.m.	Wednesday
Network Integrity
Session Chair: Ethan Katz-Bassett, University of Southern California

Real Time Network Policy Checking Using Header Space Analysis
Peyman Kazemian, Michael Chang, and Hongyi Zeng, Stanford University; George Varghese, University of California, San Diego and Microsoft Research; Nick McKeown, Stanford University; Scott Whyte, Google Inc.
Network state may change rapidly in response to customer demands, load conditions or configuration changes. But the network must also ensure correctness conditions such as isolating tenants from each other and from critical services. Existing policy checkers cannot verify compliance in real time because of the need to collect “state” from the entire network and the time it takes to analyze this state. SDNs provide an opportunity in this respect as they provide a logically centralized view from which every proposed change can be checked for compliance with policy. But there remains the need for a fast compliance checker. Our paper introduces a real time policy checking tool called NetPlumber based on Header Space Analysis (HSA). Unlike HSA, however, NetPlumber incrementally checks for compliance of state changes, using a novel set of conceptual tools that maintain a dependency graph between rules. While NetPlumber is a natural fit for SDNs, its abstract intermediate form is conceptually applicable to conventional networks as well. We have tested NetPlumber on Google’s SDN, the Stanford backbone and Internet 2. With NetPlumber, checking the compliance of a typical rule update against a single policy on these networks takes 50-500μs on average.

Ensuring Connectivity via Data Plane Mechanisms
Junda Liu, Google Inc.; Aurojit Panda, University of California, Berkeley; Ankit Singla and Brighten Godfrey, University of Illinois at Urbana-Champaign; Michael Schapira, Hebrew University; Scott Shenker, University of California, Berkeley and International Computer Science Institute
We typically think of network architectures as having two basic components: a data plane responsible for forwarding packets at line-speed, and a control plane that instantiates the forwarding state the data plane needs. With this separation of concerns, ensuring connectivity is the responsibility of the control plane. However, the control plane typically operates at timescales several orders of magnitude slower than the data plane, which means that failure recovery will always be slow compared to data plane forwarding rates. In this paper we propose moving the responsibility for connectivity to the data plane. Our design, called Data-Driven Connectivity (DDC), ensures routing connectivity via data plane mechanisms. We believe this new separation of concerns (basic connectivity on the data plane, optimal paths on the control plane) will allow networks to provide a much higher degree of availability, while still providing flexible routing control.

Juggling the Jigsaw: Towards Automated Problem Inference from Network Trouble Tickets
Rahul Potharaju, Purdue University; Navendu Jain, Microsoft Research; Cristina Nita-Rotaru, Purdue University
This paper presents NetSieve, a system that aims to do automated problem inference from network trouble tickets. Network trouble tickets are diaries comprising fixed fields and free-form text written by operators to document the steps while troubleshooting a problem. Unfortunately, while tickets carry valuable information for network management, analyzing them to do problem inference is extremely difficult: fixed fields are often inaccurate or incomplete, and the free-form text is mostly written in natural language. This paper takes a practical step towards automatically analyzing natural language text in network tickets to infer the problem symptoms, troubleshooting activities and resolution actions. Our system, NetSieve, combines statistical natural language processing (NLP), knowledge representation, and ontology modeling to achieve these goals. To cope with ambiguity in free-form text, NetSieve leverages learning from human guidance to improve its inference accuracy. We evaluate NetSieve on 10K+ tickets from a large cloud provider, and compare its accuracy using (a) an expert review, (b) a study with operators, and (c) vendor data that tracks device replacement and repairs. Our results show that NetSieve achieves 89%-100% accuracy and its inference output is useful to learn global problem trends. We have used NetSieve in several key network operations: analyzing device failure trends, understanding why network redundancy fails, and identifying device problem symptoms.
3:15 p.m.–4:15 p.m.	Wednesday
Poster Session with Refreshments
4:15 p.m.–5:50 p.m.	Wednesday
Data Centers
Session Chair: Michael Piatek, Google

Yank: Enabling Green Data Centers to Pull the Plug
Rahul Singh, David Irwin, and Prashant Shenoy, University of Massachusetts Amherst; K.K. Ramakrishnan, AT&T Labs - Research
Balancing a data center’s reliability, cost, and carbon emissions is challenging. For instance, data centers designed for high availability require a continuous flow of power to keep servers powered on, and must limit their use of clean, but intermittent, renewable energy sources. In this paper, we present Yank, which uses a transient server abstraction to maintain server availability, while allowing data centers to “pull the plug” if power becomes unavailable. A transient server’s defining characteristic is that it may terminate anytime after a brief advance warning period. Yank exploits the advance warning (on the order of a few seconds) to provide high availability cheaply and efficiently at large scales by enabling each backup server to maintain “live” memory and disk snapshots for many transient VMs. We implement Yank inside of Xen. Our experiments show that a backup server can concurrently support up to 15 transient VMs with minimal performance degradation with advance warnings as small as 10 seconds, even when VMs run memory-intensive interactive web applications.

Scalable Rule Management for Data Centers
Masoud Moshref and Minlan Yu, University of Southern California; Abhishek Sharma, University of Southern California and NEC Labs America; Ramesh Govindan, University of Southern California
Cloud operators increasingly need more and more fine-grained rules to better control individual network flows for various traffic management policies. In this paper, we explore automated rule management in the context of a system called vCRIB (a virtual Cloud Rule Information Base), which provides the abstraction of a centralized rule repository. The challenge in our approach is the design of algorithms that automatically off-load rule processing to overcome resource constraints on hypervisors and/or switches, while minimizing redirection traffic overhead and responding to system dynamics. vCRIB contains novel algorithms for finding feasible rule placements and adapting traffic overhead induced by rule placement in the face of traffic changes and VM migration. We demonstrate that vCRIB can find feasible rule placements with less than 10% traffic overhead even in cases where the traffic-optimal rule placement may be infeasible with respect to hypervisor CPU or memory constraints.

Chatty Tenants and the Cloud Network Sharing Problem
Hitesh Ballani, Keon Jang, and Thomas Karagiannis, Microsoft Research, Cambridge; Changhoon Kim, Windows Azure; Dinan Gunawardena and Greg O’Shea, Microsoft Research, Cambridge
The emerging ecosystem of cloud applications leads to significant inter-tenant communication across a datacenter’s internal network. This poses new challenges for cloud network sharing. Richer inter-tenant traffic patterns make it hard to offer minimum bandwidth guarantees to tenants. Further, for communication between economically distinct entities, it is not clear whose payment should dictate the network allocation. Motivated by this, we study how a cloud network that carries both intra- and inter-tenant traffic should be shared. We argue for network allocations to be dictated by the least-paying of communication partners. This, when combined with careful VM placement, achieves the complementary goals of providing tenants with minimum bandwidth guarantees while bounding their maximum network impact. Through a prototype deployment and large-scale simulations, we show that minimum bandwidth guarantees, apart from helping tenants achieve predictable performance, also improve overall datacenter throughput. Further, bounding a tenant’s maximum impact mitigates malicious behavior.

Effective Straggler Mitigation: Attack of the Clones
Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica, University of California, Berkeley
Small jobs, that are typically run for interactive data analyses in datacenters, continue to be plagued by disproportionately long-running tasks called stragglers. In the production clusters at Facebook and Microsoft Bing, even after applying state-of-the-art straggler mitigation techniques, these latency sensitive jobs have stragglers that are on average 8 times slower than the median task in that job. Such stragglers increase the average job duration by 47%. This is because current mitigation techniques all involve an element of waiting and speculation. We instead propose full cloning of small jobs, avoiding waiting and speculation altogether. Cloning of small jobs only marginally increases utilization because workloads show that while the majority of jobs are small, they only consume a small fraction of the resources. The main challenge of cloning is, however, that extra clones can cause contention for intermediate data. We use a technique, delay assignment, which efficiently avoids such contention. Evaluation of our system, Dolly, using production workloads shows that the small jobs speed up by 34% to 46% after state-of-the-art mitigation techniques have been applied, using just 5% extra resources for cloning.
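
The arithmetic behind cloning in the Dolly abstract can be sketched briefly: a job ends when its slowest task ends, and a cloned task ends when its fastest clone ends. The example below assumes independent clones and ignores the intermediate-data contention that the abstract says delay assignment addresses. The function name and all durations are made up for illustration and are not results from the paper.

```python
def job_duration(task_runs):
    """Job completion time when each task uses its fastest clone.

    task_runs: one list per task, holding the duration of each clone of it.
    """
    return max(min(clone_durations) for clone_durations in task_runs)


# Hypothetical three-task job, durations in seconds; the third task straggles.
without_clones = [[10], [11], [80]]
with_two_clones = [[10, 12], [11, 10], [80, 13]]

print(job_duration(without_clones))   # 80
print(job_duration(with_two_clones))  # 13
```

With one copy per task the straggler sets the job duration; with two copies the job waits only for the better clone of each task.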

Thursday, April 4, 2013

9:00 a.m.–10:15 a.m.	Thursday
Substrate
Session Chair: Michael J. Freedman, Princeton University

Wire Speed Name Lookup: A GPU-based Approach
Yi Wang, Tsinghua University; Yuan Zu, University of Science and Technology of China; Ting Zhang, Tsinghua University; Kunyang Peng and Qunfeng Dong, University of Science and Technology of China; Bin Liu, Wei Meng, and Huicheng Dai, Tsinghua University; Xin Tian and Zhonghu Xu, University of Science and Technology of China; Hao Wu, Tsinghua University; Di Yang, University of Science and Technology of China
This paper studies the name lookup issue with longest prefix matching, which is widely used in URL filtering, content routing/switching, etc. Recently Content-Centric Networking (CCN) has been proposed as a clean slate future Internet architecture to naturally fit the content centric property of today’s Internet usage: instead of addressing end hosts, the Internet should operate based on the identity/name of contents. A core challenge and enabling technique in implementing CCN is exactly to perform name lookup for packet forwarding at wire speed. In CCN, routing tables can be orders of magnitude larger than current IP routing tables, and content names are much longer and more complex than IP addresses. In pursuit of conquering this challenge, we conduct an implementation-based case study on wire speed name lookup, exploiting GPU’s massive parallel processing power. Extensive experiments demonstrate that our GPU-based name lookup engine can achieve 63.52M searches per second lookup throughput on large-scale name tables containing millions of name entries with a strict constraint of no more than the telecommunication level 100μs per-packet lookup latency. Our solution can be applied to contexts beyond CCN, such as search engines, content filtering, and intrusion prevention/detection.

SoNIC: Precise Realtime Software Access and Control of Wired Networks
Ki Suh Lee, Han Wang, and Hakim Weatherspoon, Cornell University
The physical and data link layers of the network stack contain valuable information. Unfortunately, a systems programmer would never know. These two layers are often inaccessible in software and much of their potential goes untapped. In this paper we introduce SoNIC, Software-defined Network Interface Card, which provides access to the physical and data link layers in software by implementing them in software. In other words, by implementing the creation of the physical layer bitstream in software and the transmission of this bitstream in hardware, SoNIC provides complete control over the entire network stack in realtime. SoNIC utilizes commodity off-the-shelf multi-core processors to implement parts of the physical layer in software, and employs an FPGA board to transmit optical signal over the wire. Our evaluations demonstrate that SoNIC can communicate with other network components while providing realtime access to the entire network stack in software. As an example of SoNIC’s fine-granularity control, it can perform precise network measurements, accurately characterizing network components such as routers, switches, and network interface cards. Further, SoNIC enables timing channels with nanosecond modulations that are undetectable in software.

Split/Merge: System Support for Elastic Execution in Virtual Middleboxes
Shriram Rajagopalan, IBM T. J. Watson Research Center and University of British Columbia; Dan Williams and Hani Jamjoom, IBM T. J. Watson Research Center; Andrew Warfield, University of British Columbia
Developing elastic applications should be easy. This paper takes a step toward the goal of generalizing elasticity by observing that a broadly deployed class of software, the network middlebox, is particularly well suited to dynamic scale. Middleboxes tend to achieve a clean separation between a small amount of per-flow network state and a large amount of complex application logic. We present a state-centric, systems-level abstraction for elastic middleboxes called Split/Merge. A virtual middlebox that has appropriately classified its state (e.g., per-flow state) can be dynamically scaled out (or in) by a Split/Merge system, but remains ignorant of the number of replicas in the system. Per-flow state may be transparently split between many replicas or merged back into one, while the network ensures flows are routed to the correct replica. As a result, Split/Merge enables load balanced elasticity. We have implemented a Split/Merge system, called FreeFlow, and ported Bro, an open-source intrusion detection system, to run on it. In controlled experiments, FreeFlow enables a 25% reduction in maximum latency while eliminating hotspots during scale-out and a 50% quicker scale-in than standard approaches.
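
The Split/Merge abstract describes per-flow state being split between replicas or merged back into one while the network routes each flow to the replica that holds its state. A toy sketch of that bookkeeping follows. It is not FreeFlow's interface: the `Replica` class, the `split` and `merge` functions, the `routes` dictionary standing in for network routing, and the flow identifiers are all hypothetical.

```python
class Replica:
    """A middlebox replica holding per-flow state keyed by flow id."""

    def __init__(self, name):
        self.name = name
        self.flow_state = {}


def split(source, target, should_move, routes):
    """Move state for selected flows to `target` and reroute those flows."""
    moved = [flow for flow in source.flow_state if should_move(flow)]
    for flow in moved:
        target.flow_state[flow] = source.flow_state.pop(flow)
        routes[flow] = target.name
    return moved


def merge(source, target, routes):
    """Fold all of `source`'s per-flow state back into `target`."""
    for flow in list(source.flow_state):
        target.flow_state[flow] = source.flow_state.pop(flow)
        routes[flow] = target.name


# Hypothetical per-flow packet counters on a single replica.
a, b = Replica("a"), Replica("b")
a.flow_state = {1: 10, 2: 25, 3: 7, 4: 40}
routes = {flow: "a" for flow in a.flow_state}

split(a, b, lambda flow: flow % 2 == 0, routes)  # scale out
after_split = (sorted(a.flow_state), sorted(b.flow_state))

merge(b, a, routes)                              # scale in
after_merge = (sorted(a.flow_state), sorted(b.flow_state))
print(after_split, after_merge)
```

Keeping the routing update next to each state move mirrors the abstract's requirement that flows reach the correct replica; a real system would also have to handle packets that arrive while state is in transit.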
10:15 a.m.–10:45 a.m.	Thursday
Break
10:45 a.m.–12:20 p.m.	Thursday
Wireless
Session Chair: Brad Karp, University College London

PinPoint: Localizing Interfering Radios
Kiran Joshi, Steven Hong, and Sachin Katti, Stanford University
This paper presents PinPoint, a technique for localizing rogue interfering radios that adhere to standard protocols in the inhospitable ISM band without any cooperation from the interfering radio. PinPoint is designed to be incrementally deployed on top of existing 802.11 WLAN infrastructure, and used by network administrators to identify and troubleshoot sources of interference which may be disrupting the network. PinPoint’s key contribution is a novel algorithm that accurately computes the line of sight angle of arrival (AoA) and cyclic signal strength indicator (CSSI) of the target interfering signal at all APs, even when the line of sight (LoS) component is buried by stronger multipath components, interference and noise. PinPoint leverages this algorithm to design an optimization technique, which can localize interfering radios and simultaneously identify the type of interference. Unlike several localization techniques which require extensive pre-deployment calibration (e.g., RF fingerprinting), PinPoint requires very little calibration by the network administrator, and uses a novel algorithm to self-initialize its bearings, even if the locations of some APs are initially unknown and are oriented randomly. We implement PinPoint on WARP software radios and deploy in an indoor testbed spanning an entire floor of our department. We compare PinPoint with the best known prior RSSI and MUSIC-AoA based approaches and show that PinPoint achieves a median localization error of 0.97 meters, which is around three times lower compared to the RSSI and MUSIC-AoA based approaches.

SloMo: Downclocking WiFi Communication
Feng Lu, Geoffrey M. Voelker, and Alex C. Snoeren, University of California, San Diego
As manufacturers continue to improve the energy efficiency of battery-powered wireless devices, WiFi has become one of the most significant power draws, if not the most significant. Hence, modern devices fastidiously manage their radios, shifting into low-power listening or sleep states whenever possible. The fundamental limitation with this approach, however, is that the radio is incapable of transmitting or receiving unless it is fully powered. Unfortunately, applications found on today’s wireless devices often require frequent access to the channel. We observe, however, that many of these same applications have relatively low bandwidth requirements. Leveraging the inherent sparsity in Direct Sequence Spread Spectrum (DSSS) modulation, we propose a transceiver design based on compressive sensing that allows WiFi devices to operate their radios at lower clock rates when receiving and transmitting at low bit rates, thus consuming less power. We have implemented our 802.11b-based design in a software radio platform, and show that it seamlessly interacts with existing WiFi deployments. Our prototype remains fully functional when the clock rate is reduced by a factor of five, potentially reducing power consumption by over 30%.

Splash: Fast Data Dissemination with Constructive Interference in Wireless Sensor Networks
Manjunath Doddavenkatappa, Mun Choon Chan, and Ben Leong, National University of Singapore
It is well-known that the time taken for disseminating a large data object over a wireless sensor network is dominated by the overhead of resolving the contention for the underlying wireless channel. In this paper, we propose a new dissemination protocol called Splash, that eliminates the need for contention resolution by exploiting constructive interference and channel diversity to effectively create fast and parallel pipelines over multiple paths that cover all the nodes in a network. We call this tree pipelining. In order to ensure high reliability, Splash also incorporates several techniques, including exploiting transmission density diversity, opportunistic overhearing, channel-cycling and XOR coding. Our evaluation results on two large-scale testbeds show that Splash is more than an order of magnitude faster than state-of-the-art dissemination protocols and achieves a reduction in data dissemination time by a factor of more than 20 compared to DelugeT2.

Expanding Rural Cellular Networks with Virtual Coverage
Kurtis Heimerl and Kashif Ali, University of California, Berkeley; Joshua Blumenstock, University of Washington; Brian Gawalt and Eric Brewer, University of California, Berkeley
Awarded Community Award!
The cellular system is the world’s largest network, providing service to over five billion people. Operators of these networks face fundamental trade-offs in coverage, capacity and operating power. These trade-offs, when coupled with the reality of infrastructure in poorer areas, mean that upwards of a billion people lack access to this fundamental service. Limited power infrastructure, in particular, hampers the economic viability of wide-area rural coverage. In this work, we present an alternative system for implementing large-scale rural cellular networks. Rather than providing constant coverage, we instead provide virtual coverage: coverage that is only present when requested. Virtual coverage powers the network on demand, which reduces overall power draw, lowers the cost of rural connectivity, and enables new markets. We built a prototype cellular system utilizing virtual coverage by modifying a GSM base station and a set of Motorola phones to support making and receiving calls under virtual coverage. To support the billions of already-deployed devices, we also implemented a small radio capable of adding backwards-compatible support for virtual coverage to existing GSM handsets. We demonstrate a maximum of 84% power and cost savings from using virtual coverage. We also evaluated virtual coverage by simulating the potential power savings on real-world cellular networks in two representative developing countries: one in sub-Saharan Africa and one in South Asia. Simulating power use based on real-world call records obtained from local mobile operators, we find our system saves 21-34% of power draw at night, and 7-21% during the day. We expect even more savings in areas currently off the grid. These results demonstrate the feasibility of implementing such a system, particularly in areas with solar or otherwise intermittent power sources.
12:20 p.m.–2:00 p.m.	Thursday
Lunch On Your Own
2:00 p.m.–3:15 p.m.	Thursday
Performance Session Chair: James Mickens, Microsoft Research EyeQ: Practical Network Performance Isolation at the Edge Vimalkumar Jeyakumar, Stanford University; Mohammad Alizadeh, Stanford University and Insieme Networks; David Mazières and Balaji Prabhakar, Stanford University; Changhoon Kim and Albert Greenberg, Windows Azure The datacenter network is shared among untrusted tenants in a public cloud, and hundreds of services in a private cloud. Today we lack fine-grained control over network bandwidth partitioning across tenants. In this paper we present EyeQ, a simple and practical system that provides tenants with bandwidth guarantees as if their endpoints were connected to a dedicated switch. To realize this goal, EyeQ leverages the high bisection bandwidth in a datacenter fabric and enforces admission control on traffic, regardless of the tenant transport protocol. We show that this pushes bandwidth contention to the network’s edge, enabling EyeQ to support end-to-end minimum bandwidth guarantees to tenant endpoints in a simple and scalable manner at the servers. EyeQ requires no changes to applications and is deployable with support from the network available today. We evaluate EyeQ with an efficient software implementation at 10Gb/s speeds using unmodified applications and adversarial traffic patterns. Our evaluation demonstrates EyeQ’s promise of predictable network performance isolation. For instance, even with an adversarial tenant with bursty UDP traffic, EyeQ is able to maintain the 99.9th percentile latency for a collocated memcached application close to that of a dedicated deployment. Available Media Stronger Semantics for Low-Latency Geo-Replicated Storage Wyatt Lloyd and Michael J. Freedman, Princeton University; Michael Kaminsky, Intel Labs; David G. Andersen, Carnegie Mellon University We present the first scalable, geo-replicated storage system that guarantees low latency, offers a rich data model, and provides “stronger” semantics. 
Namely, all client requests are satisfied in the local datacenter in which they arise; the system efficiently supports useful data model abstractions such as column families and counter columns; and clients can access data in a causally consistent fashion with read-only and write-only transactional support, even for keys spread across many servers. The primary contributions of this work are enabling scalable causal consistency for the complex column family data model, as well as novel, non-blocking algorithms for both read-only and write-only transactions. Our evaluation shows that our system, Eiger, achieves low latency (single-ms), has throughput competitive with eventually-consistent and non-transactional Cassandra (less than 7% overhead for one of Facebook’s real-world workloads), and scales out to large clusters almost linearly (averaging 96% increases up to 128 server clusters). Available Media Bobtail: Avoiding Long Tails in the Cloud Yunjing Xu, Zachary Musgrave, Brian Noble, and Michael Bailey, University of Michigan Highly modular data center applications such as Bing, Facebook, and Amazon’s retail platform are known to be susceptible to long tails in response times. Services such as Amazon’s EC2 have proven attractive platforms for building similar applications. Unfortunately, virtualization used in such platforms exacerbates the long tail problem by factors of two to four. Surprisingly, we find that poor response times in EC2 are a property of nodes rather than the network, and that this property of nodes is both pervasive throughout EC2 and persistent over time. The root cause of this problem is co-scheduling of CPU-bound and latency-sensitive tasks. We leverage these observations in Bobtail, a system that proactively detects and avoids these bad neighboring VMs without significantly penalizing node instantiation. With Bobtail, common communication patterns benefit from reductions of up to 40% in 99.9th percentile response times. Available Media
3:15 p.m.–3:45 p.m.	Thursday
Break
3:45 p.m.–5:20 p.m.	Thursday
Big Data Session Chair: George Porter, University of California, San Diego Rhea: Automatic Filtering for Unstructured Cloud Storage Christos Gkantsidis, Dimitrios Vytiniotis, Orion Hodson, Dushyanth Narayanan, Florin Dinu, and Antony Rowstron, Microsoft Research, Cambridge Unstructured storage and data processing using platforms such as MapReduce are increasingly popular for their simplicity, scalability, and flexibility. Using elastic cloud storage and computation makes them even more attractive. However, cloud providers such as Amazon and Windows Azure separate their storage and compute resources even within the same data center. Transferring data from storage to compute thus uses core data center network bandwidth, which is scarce and oversubscribed. As the data is unstructured, the infrastructure cannot automatically apply selection, projection, or other filtering predicates at the storage layer. The problem is even worse if customers want to use compute resources on one provider but use data stored with other provider(s). The bottleneck is now the WAN link, which impacts performance but also incurs egress bandwidth charges. This paper presents Rhea, a system to automatically generate and run storage-side data filters for unstructured and semi-structured data. It uses static analysis of application code to generate filters that are safe, stateless, side-effect free, best effort, and transparent to both storage and compute layers. Filters never remove data that is used by the computation. Our evaluation shows that Rhea filters achieve a reduction in data transfer of 2x–20,000x, which reduces job run times by up to 5x and dollar costs for cross-cloud computations by up to 13x. 
Available Media Robustness in the Salus Scalable Block Store Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike Dahlin, The University of Texas at Austin This paper describes Salus, a block store that seeks to maximize simultaneously both scalability and robustness. Salus provides strong end-to-end correctness guarantees for read operations, strict ordering guarantees for write operations, and strong durability and availability guarantees despite a wide range of server failures (including memory corruptions, disk corruptions, firmware bugs, etc.). Such increased protection does not come at the cost of scalability or performance: indeed, Salus often actually outperforms HBase (the codebase from which Salus descends). For example, Salus’ active replication allows it to halve network bandwidth while increasing aggregate write throughput by a factor of 1.74 compared to HBase in a well-provisioned system. Available Media MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing Bin Fan and David G. Andersen, Carnegie Mellon University; Michael Kaminsky, Intel Labs This paper presents a set of architecturally and workload inspired algorithmic and engineering improvements to the popular Memcached system that substantially improve both its memory efficiency and throughput. These techniques (optimistic cuckoo hashing, a compact LRU-approximating eviction algorithm based upon CLOCK, and comprehensive implementation of optimistic locking) enable the resulting system to use 30% less memory for small key-value pairs, and serve up to 3x as many queries per second over the network. We have implemented these modifications in a system we call MemC3 (Memcached with CLOCK and Concurrent Cuckoo hashing), but believe that they also apply more generally to many of today’s read-intensive, highly concurrent networked storage and caching systems. 
Available Media Scaling Memcache at Facebook Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani, Facebook Inc. Memcached is a well-known, simple, in-memory caching solution. This paper describes how Facebook leverages memcached as a building block to construct and scale a distributed key-value store that supports the world’s largest social network. Our system handles billions of requests per second and holds trillions of items to deliver a rich experience for over a billion users around the world. Available Media
6:00 p.m.–7:30 p.m.	Thursday
Poster and Demo Session and Reception Matthew Caesar, University of Illinois at Urbana-Champaign

Friday, April 5, 2013

9:00 a.m.–10:15 a.m.	Friday
Reliability Session Chair: Dave Levin, University of Maryland F10: A Fault-Tolerant Engineered Network Vincent Liu, Daniel Halperin, Arvind Krishnamurthy, and Thomas Anderson, University of Washington Awarded Best Paper! The data center network is increasingly a cost, reliability and performance bottleneck for cloud computing. Although multi-tree topologies can provide scalable bandwidth and traditional routing algorithms can provide eventual fault tolerance, we argue that recovery speed can be dramatically improved through the co-design of the network topology, routing algorithm and failure detector. We create an engineered network and routing protocol that directly address the failure characteristics observed in data centers. At the core of our proposal is a novel network topology that has many of the same desirable properties as FatTrees, but with much better fault recovery properties. We then create a series of failover protocols that benefit from this topology and are designed to cascade and complement each other. The resulting system, F10, can almost instantaneously reestablish connectivity and load balance, even in the presence of multiple failures. Our results show that following network link and switch failures, F10 has less than 1/7th the packet loss of current schemes. A trace-driven evaluation of MapReduce performance shows that F10’s lower packet loss yields a median application-level 30% speedup. Available Media LOUP: The Principles and Practice of Intra-Domain Route Dissemination Nikola Gvozdiev, Brad Karp, and Mark Handley, University College London Under misconfiguration or topology changes, iBGP with route reflectors exhibits a variety of ills, including routing instability, transient loops, and routing failures. In this paper, we consider the intra-domain route dissemination problem from first principles, and show that these pathologies are not fundamental; rather, they are artifacts of iBGP. 
We propose the Simple Ordered Update Protocol (SOUP) and Link-Ordered Update Protocol (LOUP), clean-slate dissemination protocols for external routes that do not create transient loops, make stable route choices in the presence of failures, and achieve policy-compliant routing without any configuration. We prove SOUP cannot loop, and demonstrate both protocols’ scalability and correctness in simulation and through measurements of a Quagga-based implementation. Available Media Improving Availability in Distributed Systems with Failure Informers Joshua B. Leners and Trinabh Gupta, The University of Texas at Austin; Marcos K. Aguilera, Microsoft Research Silicon Valley; Michael Walfish, The University of Texas at Austin This paper addresses a core question in distributed systems: how should applications be notified of failures? When a distributed system acts on failure reports, the system’s correctness and availability depend on the granularity and semantics of those reports. The system’s availability also depends on coverage (failures are reported), accuracy (reports are justified), and timeliness (reports come quickly). This paper describes Pigeon, a failure reporting service designed to enable high availability in the applications that use it. Pigeon exposes a new abstraction, called a failure informer, which allows applications to take informed, application-specific recovery actions, and which encapsulates uncertainty, allowing applications to proceed safely in the presence of doubt. Pigeon also significantly improves over the previous state of the art in the three-way trade-off among coverage, accuracy, and timeliness. Available Media
10:15 a.m.–10:45 a.m.	Friday
Break
10:45 a.m.–12:20 p.m.	Friday
Applications Session Chair: Rebecca Isaacs, Microsoft Research BOSS: Building Operating System Services Stephen Dawson-Haggerty, Andrew Krioukov, Jay Taneja, Sagar Karandikar, Gabe Fierro, Nikita Kitaev, and David Culler, University of California, Berkeley Commercial buildings are attractive targets for introducing innovative cyber-physical control systems, because they are already highly instrumented distributed systems which consume large quantities of energy. However, they are not currently programmable in a meaningful sense because each building is constructed with vertically integrated, closed subsystems and without uniform abstractions to write applications against. We develop a set of operating system services called BOSS, which supports multiple portable, fault-tolerant applications on top of the distributed physical resources present in large commercial buildings. We evaluate our system based on lessons learned from deployments of many novel applications in our test building, a four-year-old, 140,000-square-foot building with modern digital controls, as well as partial deployments at other sites. Available Media Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks Keith Winstein, Anirudh Sivaraman, and Hari Balakrishnan, M.I.T. Computer Science and Artificial Intelligence Laboratory Sprout is an end-to-end transport protocol for interactive applications that desire high throughput and low delay. Sprout works well over cellular wireless networks, where link speeds change dramatically with time, and current protocols build up multi-second queues in network gateways. Sprout does not use TCP-style reactive congestion control; instead the receiver observes the packet arrival times to infer the uncertain dynamics of the network path. This inference is used to forecast how many bytes may be sent by the sender, while bounding the risk that packets will be delayed inside the network for too long. 
In evaluations on traces from four commercial LTE and 3G networks, Sprout, compared with Skype, reduced self-inflicted end-to-end delay by a factor of 7.9 and achieved 2.2× the transmitted bit rate on average. Compared with Google’s Hangout, Sprout reduced delay by a factor of 7.2 while achieving 4.4× the bit rate, and compared with Apple’s Facetime, Sprout reduced delay by a factor of 8.7 with 1.9× the bit rate. Although it is end-to-end, Sprout matched or outperformed TCP Cubic running over the CoDel active queue management algorithm, which requires changes to cellular carrier equipment to deploy. We also tested Sprout as a tunnel to carry competing interactive and bulk traffic (Skype and TCP Cubic), and found that Sprout was able to isolate client application flows from one another. Available Media Demystifying Page Load Performance with WProf Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David Wetherall, University of Washington Web page load time is a key performance metric that many techniques aim to reduce. Unfortunately, the complexity of modern Web pages makes it difficult to identify performance bottlenecks. We present WProf, a lightweight in-browser profiler that produces a detailed dependency graph of the activities that make up a page load. WProf is based on a model we developed to capture the constraints between network load, page parsing, JavaScript/CSS evaluation, and rendering activity in popular browsers. We combine WProf reports with critical path analysis to study the page load time of 350 Web pages under a variety of settings including the use of end-host caching, SPDY instead of HTTP, and the mod_pagespeed server extension. We find that computation is a significant factor that makes up as much as 35% of the critical path, and that synchronous JavaScript plays a significant role in page load time by blocking HTML parsing. 
Caching reduces page load time, but the reduction is not proportional to the number of cached objects, because most object loads are not on the critical path. SPDY reduces page load time only for networks with high RTTs and mod_pagespeed helps little on an average page. Available Media Dasu: Pushing Experiments to the Internet’s Edge Mario A. Sánchez, John S. Otto, and Zachary S. Bischof, Northwestern University; David R. Choffnes, University of Washington; Fabián E. Bustamante, Northwestern University; Balachander Krishnamurthy and Walter Willinger, AT&T Labs—Research We present Dasu, a measurement experimentation platform for the Internet’s edge. Dasu supports both controlled network experimentation and broadband characterization, building on public interest on the latter to gain the adoption necessary for the former. We discuss some of the challenges we faced building a platform for the Internet’s edge, describe our current design and implementation, and illustrate the unique perspective it brings to Internet measurement. Dasu has been publicly available since July 2010 and has been installed by over 90,000 users with a heterogeneous set of connections spreading across 1,802 networks and 147 countries. Available Media
12:20 p.m.–2:00 p.m.	Friday
Lunch On Your Own
2:00 p.m.–3:15 p.m.	Friday
Security and Privacy Session Chair: Krishna Gummadi, Max Planck Institute for Software Systems (MPI-SWS) πBox: A Platform for Privacy-Preserving Apps Sangmin Lee, Edmund L. Wong, Deepak Goel, Mike Dahlin, and Vitaly Shmatikov, The University of Texas at Austin We present πBox, a new application platform that prevents apps from misusing information about their users. To strike a useful balance between users’ privacy and apps’ functional needs, πBox shifts much of the responsibility for protecting privacy from the app and its users to the platform itself. To achieve this, πBox deploys (1) a sandbox that spans the user’s device and the cloud, (2) specialized storage and communication channels that enable common app functionalities, and (3) an adaptation of recent theoretical algorithms for differential privacy under continual observation. We describe a prototype implementation of πBox and show how it enables a wide range of useful apps with minimal performance overhead and without sacrificing user privacy. Available Media P3: Toward Privacy-Preserving Photo Sharing Moo-Ryong Ra, Ramesh Govindan, and Antonio Ortega, University of Southern California With increasing use of mobile devices, photo sharing services are experiencing greater popularity. Aside from providing storage, photo sharing services enable bandwidth-efficient downloads to mobile devices by performing server-side image transformations (resizing, cropping). On the flip side, photo sharing services have raised privacy concerns such as leakage of photos to unauthorized viewers and the use of algorithmic recognition technologies by providers. To address these concerns, we propose a privacy-preserving photo encoding algorithm that extracts and encrypts a small, but significant, component of the photo, while preserving the remainder in a public, standards-compatible, part. These two components can be separately stored. 
This technique significantly reduces the accuracy of automated detection and recognition on the public part, while preserving the ability of the provider to perform server-side transformations to conserve download bandwidth usage. Our prototype privacy-preserving photo sharing system, P3, works with Facebook, and can be extended to other services as well. P3 requires no changes to existing services or mobile application software, and adds minimal photo storage overhead. Available Media Embassies: Radically Refactoring the Web Jon Howell, Bryan Parno, and John R. Douceur, Microsoft Research Awarded Best Paper! Web browsers ostensibly provide strong isolation for the client-side components of web applications. Unfortunately, this isolation is weak in practice; as browsers add increasingly rich APIs to please developers, these complex interfaces bloat the trusted computing base and erode cross-app isolation boundaries. We reenvision the web interface based on the notion of a pico-datacenter, the client-side version of a shared server datacenter. Mutually untrusting vendors run their code on the user’s computer in low-level native code containers that communicate with the outside world only via IP. Just as in the cloud datacenter, the simple semantics makes isolation tractable, yet native code gives vendors the freedom to run any software stack. Since the datacenter model is designed to be robust to malicious tenants, it is never dangerous for the user to click a link and invite a possibly-hostile party onto the client. Available Media