Technical Sessions

To access a presentation's content, please click on its title below.

The conference papers are available to registered attendees immediately and to everyone beginning Wednesday, April 2, 2014. Everyone can view the abstracts and the proceedings front matter immediately.

Proceedings Front Matter:
Covers |  Title Page and List of Organizers |  Table of Contents |  Message from the Program Co-Chairs

Full Proceedings PDFs
 NSDI '14 Full Proceedings (PDF)
 NSDI '14 Proceedings Interior (PDF, best for mobile devices)
 NSDI '14 Proceedings Errata Slip
 NSDI '14 Proceedings Errata Slip (revised 4/1/14)

Full Proceedings ePub (for iPad and most eReaders)
 NSDI '14 Full Proceedings (ePub)
 NSDI '14 Full Proceedings Errata Slip (ePub)

Full Proceedings Mobi (for Kindle)
 NSDI '14 Full Proceedings (Mobi)
 NSDI '14 Full Proceedings Errata Slip (Mobi)

Download Proceedings Archive (Conference Attendees Only)

Attendee Files 

(Registered attendees: Sign in to your USENIX account to download this file.)

NSDI '14 Proceedings Archive

 

All sessions will take place in the Seattle Ballroom, Fourth Floor, unless otherwise noted.

Wednesday, April 2, 2014

8:15 a.m.–8:45 a.m. Wednesday

Continental Breakfast

Belltown Foyer, Fourth Floor

8:45 a.m.–9:00 a.m. Wednesday

Opening Remarks and Awards

Session Chairs: Ratul Mahajan, Microsoft Research Redmond, and Ion Stoica, University of California, Berkeley

NSDI '14 Opening Remarks and Awards

Available Media
9:00 a.m.–10:15 a.m. Wednesday

Datacenter Networks

Session Chair: Changhoon Kim, Microsoft

Circuit Switching Under the Radar with REACToR

He Liu, Feng Lu, Alex Forencich, Rishi Kapoor, Malveeka Tewari, Geoffrey M. Voelker, George Papen, Alex C. Snoeren, and George Porter, University of California, San Diego

The potential advantages of optics at high link speeds have led to significant interest in deploying optical switching technology in data-center networks. Initial efforts have focused on hybrid approaches that rely on millisecond-scale circuit switching in the core of the network, while maintaining the flexibility of electrical packet switching at the edge. Recent demonstrations of microsecond-scale optical circuit switches motivate considering circuit switching for more dynamic traffic such as that generated from a top-of-rack (ToR) switch. Based on these technology trends, we propose a prototype hybrid ToR, called REACToR, which utilizes a combination of packet switching and circuit switching to appear to end-hosts as a packet-switched ToR. In this paper, we describe a prototype REACToR control plane which synchronizes end host transmissions with end-to-end circuit assignments. This control plane can react to rapid, bursty changes in the traffic from end hosts on a time scale of 100s of microseconds, several orders of magnitude faster than previous hybrid approaches. Using the experimental data from a system of eight end hosts, we calibrate a hybrid network simulator and use this simulator to predict the performance of larger-scale hybrid networks.

Available Media

Catch the Whole Lot in an Action: Rapid Precise Packet Loss Notification in Data Center

Peng Cheng, Fengyuan Ren, Ran Shu, and Chuang Lin, Tsinghua University

An increasing number of TCP performance issues including TCP Incast, TCP Outcast, and long query completion times are common in large-scale data centers. We demonstrate that the root cause of these problems is that existing techniques are unable to maintain self-clocking or to achieve accurate and rapid packet loss notification. We present cutting payload (CP), a mechanism that simply drops a packet’s payload at an overloaded switch, and a SACK-like precise ACK (PACK) mechanism to accurately inform senders about lost packets. Experiments demonstrate that CP successfully addresses the root cause of TCP performance issues. Furthermore, CP works well with other TCP variants used in data center networks.

Available Media
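The following is a small, hypothetical Python sketch (not the authors' code) of the cutting-payload and PACK ideas summarized in the abstract above: an overloaded switch truncates a packet to its header instead of dropping it, so the receiver can report exactly which segments were lost. The packet format, queue size, and toy simulation are illustrative assumptions.

```python
# Hypothetical sketch of the Cutting Payload (CP) idea: congested switches forward
# header-only stubs, and the receiver builds a SACK-like PACK listing the exact losses.

from dataclasses import dataclass, field

@dataclass
class Packet:
    seq: int
    payload: bytes = b""
    cut: bool = False          # header-only marker set by the switch

@dataclass
class Switch:
    capacity: int              # queue slots for full packets
    queue: list = field(default_factory=list)

    def enqueue(self, pkt: Packet) -> Packet:
        if len(self.queue) < self.capacity:
            self.queue.append(pkt)
            return pkt
        # Congested: cut the payload instead of dropping the whole packet.
        stub = Packet(seq=pkt.seq, payload=b"", cut=True)
        self.queue.append(stub)
        return stub

def build_pack(received: list) -> dict:
    """Receiver side: report exactly which segments arrived header-only."""
    return {
        "acked": [p.seq for p in received if not p.cut],
        "lost":  [p.seq for p in received if p.cut],   # precise loss notification
    }

if __name__ == "__main__":
    sw = Switch(capacity=2)
    out = [sw.enqueue(Packet(seq=i, payload=b"x" * 1000)) for i in range(4)]
    print(build_pack(out))     # e.g. {'acked': [0, 1], 'lost': [2, 3]}
```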

High Throughput Data Center Topology Design

Ankit Singla, P. Brighten Godfrey, and Alexandra Kolla, University of Illinois at Urbana–Champaign

With high throughput networks acquiring a crucial role in supporting data-intensive applications, a variety of data center network topologies have been proposed to achieve high capacity at low cost. While this work explores a large number of design points, even in the limited case of a network of identical switches, no proposal has been able to claim any notion of optimality. The case of heterogeneous networks, incorporating multiple line-speeds and port-counts as data centers grow over time, introduces even greater complexity.

In this paper, we present the first non-trivial upper bound on network throughput under uniform traffic patterns for any topology with identical switches. We then show that random graphs achieve throughput surprisingly close to this bound, within a few percent at the scale of a few thousand servers. Apart from demonstrating that homogeneous topology design may be reaching its limits, this result also motivates our use of random graphs as building blocks for design of heterogeneous networks. Given a heterogeneous pool of network switches, we explore, through experiments and analysis, how the distribution of servers across switches and the interconnection of switches affect network throughput. We apply these insights to a real-world heterogeneous data center topology, VL2, demonstrating as much as 43% higher throughput with the same equipment.

Available Media

10:15 a.m.–10:45 a.m. Wednesday

Break with Refreshments

Belltown Foyer, Fourth Floor

10:45 a.m.–12:20 p.m. Wednesday

Debugging Complex Systems

Session Chair: KyoungSoo Park, KAIST

Adtributor: Revenue Debugging in Advertising Systems

Ranjita Bhagwan, Rahul Kumar, Ramachandran Ramjee, George Varghese, Surjyakanta Mohapatra, Hemanth Manoharan, and Piyush Shah, Microsoft

Advertising (ad) revenue plays a vital role in supporting free websites. When the revenue dips or increases sharply, ad system operators must find and fix the root cause if actionable, for example, by optimizing infrastructure performance. Such revenue debugging is analogous to diagnosis and root-cause analysis in the systems literature but is more general. Failure of infrastructure elements is only one potential cause; a host of other dimensions (e.g., advertiser, device type) can be sources of potential causes. Further, the problem is complicated by derived measures such as costs-per-click that are also tracked along with revenue.

Our paper takes the first systematic look at revenue debugging. Using the concepts of explanatory power, succinctness, and surprise, we propose a new multidimensional root-cause algorithm for fundamental and derived measures of ad systems to identify the dimension most likely to blame. Further, we implement the attribution algorithm and a visualization interface in a tool called the Adtributor to help troubleshooters quickly identify potential causes. Based on several case studies on a very large ad system and extensive evaluation, we show that the Adtributor has an accuracy of over 95% and helps cut down troubleshooting time by an order of magnitude.

Available Media
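Below is one plausible, hedged realization in Python (not the Adtributor implementation) of the scoring idea named in the abstract above: rank each value of a dimension by its explanatory power (share of the total revenue change it accounts for) and its surprise (divergence between forecast and actual distributions). The dimension names, revenue numbers, and exact scoring details are invented for illustration.

```python
# Toy revenue-attribution scoring: explanatory power + surprise (per-element JS divergence).

import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two probabilities of a single element."""
    def kl(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def attribute(forecast: dict, actual: dict):
    total_change = sum(actual.values()) - sum(forecast.values())
    f_total, a_total = sum(forecast.values()), sum(actual.values())
    rows = []
    for value in forecast:
        ep = (actual[value] - forecast[value]) / total_change      # explanatory power
        surprise = js_divergence(forecast[value] / f_total, actual[value] / a_total)
        rows.append((value, ep, surprise))
    return sorted(rows, key=lambda r: r[2], reverse=True)          # most surprising first

if __name__ == "__main__":
    forecast = {"desktop": 100.0, "tablet": 50.0, "phone": 50.0}   # expected revenue
    actual   = {"desktop": 100.0, "tablet": 10.0, "phone": 50.0}   # observed revenue
    for value, ep, surprise in attribute(forecast, actual):
        print(f"{value:8s} explanatory_power={ep:+.2f} surprise={surprise:.4f}")
```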

DECAF: Detecting and Characterizing Ad Fraud in Mobile Apps

Bin Liu, University of Southern California; Suman Nath, Microsoft Research; Ramesh Govindan, University of Southern California; Jie Liu, Microsoft Research

Ad networks for mobile apps require inspection of the visual layout of their ads to detect certain types of placement frauds. Doing this manually is error prone, and does not scale to the sizes of today’s app stores. In this paper, we design a system called DECAF to automatically discover various placement frauds scalably and effectively. DECAF uses automated app navigation, together with optimizations to scan through a large number of visual elements within a limited time. It also includes a framework for efficiently detecting whether ads within an app violate an extensible set of rules that govern ad placement and display. We have implemented DECAF for Windows-based mobile platforms, and applied it to 1,150 tablet apps and 50,000 phone apps in order to characterize the prevalence of ad frauds. DECAF has been used by the ad fraud team in Microsoft and has helped find many instances of ad frauds.

Available Media

I Know What Your Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks

Nikhil Handigol, Brandon Heller, Vimalkumar Jeyakumar, David Mazières, and Nick McKeown, Stanford University

The complexity of networks has outpaced our tools to debug them; today, administrators use manual tools to diagnose problems. In this paper, we show how packet histories—the full stories of every packet's journey through the network—can simplify network diagnosis. To demonstrate the usefulness of packet histories and the practical feasibility of constructing them, we built NetSight, an extensible platform that captures packet histories and enables applications to concisely and flexibly retrieve packet histories of interest. Atop NetSight, we built four applications that illustrate its flexibility: an interactive network debugger, a live invariant monitor, a path-aware history logger, and a hierarchical network profiler. On a single modern multi-core server, NetSight can process packet histories for the traffic of multiple 10 Gb/s links. For larger networks, NetSight scales linearly with additional servers and scales even further with straightforward additions to hardware- and hypervisor-based switches.

Available Media
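As a rough illustration of the packet-history abstraction described above, the Python sketch below (not NetSight itself) groups per-switch "postcards" by packet ID into histories and lets an application query them with a predicate, in the spirit of the interactive debugger. All switch names and records are invented.

```python
# Toy packet-history assembly and querying over invented postcards.

from collections import defaultdict

postcards = [                       # (packet_id, switch, in_port, matched_rule)
    ("pkt1", "s1", 1, "fwd:port2"),
    ("pkt1", "s2", 3, "fwd:port1"),
    ("pkt2", "s1", 1, "fwd:port2"),
    ("pkt2", "s3", 2, "drop"),
]

def assemble_histories(cards):
    """Group postcards by packet ID; the ordered list is that packet's history."""
    histories = defaultdict(list)
    for pid, switch, port, rule in cards:
        histories[pid].append((switch, port, rule))
    return histories

def query(histories, predicate):
    """Return the full story of every packet whose history satisfies the predicate."""
    return {pid: h for pid, h in histories.items() if predicate(h)}

if __name__ == "__main__":
    hist = assemble_histories(postcards)
    # Example debugger-style query: which packets were dropped, and where had they been?
    dropped = query(hist, lambda h: any(rule == "drop" for _, _, rule in h))
    print(dropped)
```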

Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks

Hongyi Zeng, Stanford University; Shidong Zhang and Fei Ye, Google; Vimalkumar Jeyakumar, Stanford University; Mickey Ju and Junda Liu, Google; Nick McKeown, Stanford University; Amin Vahdat, Google and University of California, San Diego

Data center networks often have errors in the forwarding tables, causing packets to loop indefinitely, fall into black-holes or simply get dropped before they reach the correct destination. Finding forwarding errors is possible using static analysis, but none of the existing tools scale to a large data center network with thousands of switches and millions of forwarding entries. Worse still, in a large data center network the forwarding state is constantly in flux, which makes it hard to take an accurate snapshot of the state for static analysis.

We solve these problems with Libra, a new tool for verifying forwarding tables in very large networks. Libra runs fast because it can exploit the scaling properties of MapReduce. We show how Libra can take an accurate snapshot of the forwarding state 99.9% of the time, and how it knows when the snapshot cannot be trusted. We show results for Libra analyzing a 10,000-switch network.

Available Media
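The toy Python sketch below (not Libra) illustrates the divide-and-conquer idea from the abstract above: shard forwarding rules by destination prefix (the "map"), then independently check each prefix's forwarding graph for loops and black holes (the "reduce"). The switch names and rules are invented.

```python
# Map rules per destination prefix, then verify each per-prefix forwarding graph.

from collections import defaultdict

RULES = [            # (switch, dest_prefix, next_hop); None = deliver locally
    ("s1", "10.0.1.0/24", "s2"),
    ("s2", "10.0.1.0/24", "s3"),
    ("s3", "10.0.1.0/24", None),
    ("s1", "10.0.2.0/24", "s2"),
    ("s2", "10.0.2.0/24", "s1"),      # forwarding loop for this prefix
]

def map_phase(rules):
    shards = defaultdict(dict)
    for switch, prefix, nxt in rules:
        shards[prefix][switch] = nxt            # one forwarding graph per prefix
    return shards

def reduce_phase(prefix, graph):
    """Walk the per-prefix graph from every switch, flagging loops and dead ends."""
    errors = []
    for start in graph:
        seen, node = set(), start
        while node is not None:
            if node in seen:
                errors.append(f"{prefix}: loop via {node}")
                break
            seen.add(node)
            if node not in graph:
                errors.append(f"{prefix}: black hole at {node}")
                break
            node = graph[node]
    return errors

if __name__ == "__main__":
    for prefix, graph in map_phase(RULES).items():
        for err in reduce_phase(prefix, graph):
            print(err)
```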

12:20 p.m.–2:00 p.m. Wednesday

Symposium Luncheon

Emerald Ballroom, Third Floor

2:00 p.m.–3:15 p.m. Wednesday

Software Verification and Testing

Session Chair: Minlan Yu, University of Southern California

Software Dataplane Verification

Mihai Dobrescu and Katerina Argyraki, École Polytechnique Fédérale de Lausanne
Awarded Best Paper!

Software dataplanes are emerging as an alternative to traditional hardware switches and routers, promising programmability and short time to market. These advantages are set against the risk of disrupting the network with bugs, unpredictable performance, or security vulnerabilities. We explore the feasibility of verifying software dataplanes to ensure smooth network operation. For general programs, verifiability and performance are competing goals; we argue that software dataplanes are different—we can write them in a way that enables verification and preserves performance. We present a verification tool that takes as input a software dataplane, written in a way that meets a given set of conditions, and (dis)proves that the dataplane satisfies crash-freedom, bounded-execution, and filtering properties. We evaluate our tool on stateless and simple stateful Click pipelines; we perform complete and sound verification of these pipelines within tens of minutes, whereas a state-of-the-art general-purpose tool fails to complete the same task within several hours.

Available Media

NetCheck: Network Diagnoses from Blackbox Traces

Yanyan Zhuang, Polytechnic Institute of New York University and University of British Columbia; Eleni Gessiou, Polytechnic Institute of New York University; Steven Portzer, University of Washington; Fraida Fund and Monzur Muhammad, Polytechnic Institute of New York University; Ivan Beschastnikh, University of British Columbia; Justin Cappos, Polytechnic Institute of New York University

This paper introduces NetCheck, a tool designed to diagnose network problems in large and complex applications. NetCheck relies on blackbox tracing mechanisms, such as strace, to automatically collect sequences of network system call invocations generated by the application hosts. NetCheck performs its diagnosis by (1) totally ordering the distributed set of input traces, and by (2) utilizing a network model to identify points in the totally ordered execution where the traces deviated from expected network semantics.

Our evaluation demonstrates that NetCheck is able to diagnose failures in popular and complex applications without relying on any application- or network-specific information. For instance, NetCheck correctly identified the existence of NAT devices, simultaneous network disconnection/reconnection, and platform portability issues. In a more targeted evaluation, NetCheck correctly detects over 95% of the network problems we found from bug trackers of projects like Python, Apache, and Ruby. When applied to traces of faults reproduced in a live network, NetCheck identified the primary cause of the fault in 90% of the cases. Additionally, NetCheck is efficient and can process a GB-long trace in about 2 minutes.

Available Media
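As a simplified, hedged sketch (not NetCheck) of the diagnosis idea in the abstract above, the Python code below greedily builds a total order over per-host system-call traces in which every recv is preceded by a matching send; if no call can be ordered next, the traces deviate from the expected network semantics at that point. The traces and the byte-stream model are invented simplifications.

```python
# Greedy total ordering of per-host send/recv traces against a toy network model.

from collections import deque

def diagnose(traces):
    """traces: {host: deque of ('send'|'recv', connection_id, data)}."""
    in_flight = {}                                   # connection_id -> bytes sent, not yet received
    order = []
    while any(traces.values()):
        progressed = False
        for host, trace in traces.items():
            if not trace:
                continue
            op, conn, data = trace[0]
            if op == "send":
                in_flight[conn] = in_flight.get(conn, b"") + data
            elif op == "recv":
                if not in_flight.get(conn, b"").startswith(data):
                    continue                         # cannot be ordered yet; try another host
                in_flight[conn] = in_flight[conn][len(data):]
            order.append((host, op, conn, data))
            trace.popleft()
            progressed = True
        if not progressed:
            return order, "deviation: no valid next call under the network model"
    return order, "traces consistent with the network model"

if __name__ == "__main__":
    traces = {
        "client": deque([("send", "c1", b"GET /"), ("recv", "c1", b"200 OK")]),
        "server": deque([("recv", "c1", b"GET /"), ("send", "c1", b"200 OK")]),
    }
    print(diagnose(traces)[1])
```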

Exalt: Empowering Researchers to Evaluate Large-Scale Storage Systems

Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and Mike Dahlin, The University of Texas at Austin

This paper presents Exalt, a library that gives back to researchers the ability to test the scalability of today’s large storage systems. To that end, we introduce Tardis, a data representation scheme that allows data to be identified and efficiently compressed even at low-level storage layers that are not aware of the semantics and formatting used by higher levels of the system. This compression enables a high degree of node colocation, which makes it possible to run large-scale experiments on as few as a hundred machines. Our experience with HDFS and HBase shows that, by allowing us to run the real system code at an unprecedented scale, Exalt can help identify scalability problems that are not observable at lower scales: in particular, Exalt helped us pinpoint and resolve issues in HDFS that improved its aggregate throughput by an order of magnitude.

Available Media

3:15 p.m.–3:45 p.m. Wednesday

Break with Refreshments

Belltown Foyer, Fourth Floor

3:45 p.m.–5:20 p.m. Wednesday

Security and Privacy

Session Chair: Miguel Castro, Microsoft Research Cambridge

ipShield: A Framework For Enforcing Context-Aware Privacy

Supriyo Chakraborty, Chenguang Shen, Kasturi Rangan Raghavan, Yasser Shoukry, Matt Millar, and Mani Srivastava, University of California, Los Angeles

Smart phones are used to collect and share personal data with untrustworthy third-party apps, often leading to data misuse and privacy violations. Unfortunately, state-of-the-art privacy mechanisms on Android provide inadequate access control and do not address the vulnerabilities that arise due to unmediated access to so-called innocuous sensors on these phones. We present ipShield, a framework that provides users with greater control over their resources at runtime. ipShield performs monitoring of every sensor accessed by an app and uses this information to perform privacy risk assessment. The risks are conveyed to the user as a list of possible inferences that can be drawn using the shared sensor data. Based on user-configured lists of allowed and private inferences, a recommendation consisting of binary privacy actions on individual sensors is generated. Finally, users are provided with options to override the recommended actions and manually configure context-aware fine-grained privacy rules. We implemented ipShield by modifying the AOSP on a Nexus 4 phone. Our evaluation indicates that running ipShield incurs negligible CPU and memory overhead and only a small reduction in battery life.

Available Media

Building Web Applications on Top of Encrypted Data Using Mylar

Raluca Ada Popa, MIT/CSAIL; Emily Stark, Meteor, Inc.; Steven Valdez, Jonas Helfer, Nickolai Zeldovich, and Hari Balakrishnan, MIT/CSAIL

Web applications rely on servers to store and process confidential information. However, anyone who gains access to the server (e.g., an attacker, a curious administrator, or a government) can obtain all of the data stored there. This paper presents Mylar, a platform for building web applications, which protects data confidentiality against attackers with full access to servers. Mylar stores sensitive data encrypted on the server, and decrypts that data only in users’ browsers. Mylar addresses three challenges in making this approach work. First, Mylar allows the server to perform keyword search over encrypted documents, even if the documents are encrypted with different keys. Second, Mylar allows users to share keys and encrypted data securely in the presence of an active adversary. Finally, Mylar ensures that client-side application code is authentic, even if the server is malicious. Results with a prototype of Mylar built on top of the Meteor framework are promising: porting 6 applications required changing just 36 lines of code on average, and the performance overheads are modest, amounting to a 17% throughput loss and a 50 ms latency increase for sending a message in a chat application.

Available Media

PHY Covert Channels: Can you see the Idles?

Ki Suh Lee, Han Wang, and Hakim Weatherspoon, Cornell University

Network covert timing channels embed secret messages in legitimate packets by modulating interpacket delays. Unfortunately, such channels are normally implemented in higher network layers (layer 3 or above) and easily detected or prevented. However, access to the physical layer of a network stack allows for timing channels that are virtually invisible: sub-microsecond modulations that are undetectable by software endhosts. Therefore, covert timing channels implemented in the physical layer can be a serious threat to the security of a system or a network. In fact, we empirically demonstrate an effective covert timing channel over nine routing hops and thousands of miles over the Internet (the National Lambda Rail). Our covert timing channel works in the presence of cross traffic, with a bit error rate of less than 10%, which can be masked by forward error correction, and a covert rate of 81 kilobits per second. Key to our approach is access and control over every bit in the physical layer of a 10 Gigabit network stack (a bit is 100 picoseconds wide at 10 gigabits per second), which allows us to modulate and interpret interpacket spacings at sub-microsecond scale. We discuss when and how a timing channel in the physical layer works, how hard it is to detect such a channel, and what is required to do so.

Available Media
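The toy Python illustration below (not the authors' system) shows the basic modulation idea from the abstract above: encode covert bits by choosing between two interpacket gaps that differ by less than a microsecond, and decode by thresholding the observed gaps. The gap values are invented; the real channel manipulates physical-layer idle characters at 100 ps granularity and copes with noise via forward error correction.

```python
# Encode/decode covert bits as small interpacket-gap differences (noiseless toy model).

BASE_GAP_NS = 10_000        # nominal interpacket gap (nanoseconds)
DELTA_NS = 400              # sub-microsecond modulation, invisible to software endhosts

def encode(bits):
    """Return one interpacket gap per covert bit."""
    return [BASE_GAP_NS + (DELTA_NS if b else 0) for b in bits]

def decode(gaps):
    """Threshold the received gaps to recover the covert bits."""
    threshold = BASE_GAP_NS + DELTA_NS // 2
    return [1 if g >= threshold else 0 for g in gaps]

if __name__ == "__main__":
    secret = [1, 0, 1, 1, 0, 0, 1]
    gaps = encode(secret)
    assert decode(gaps) == secret
    print("recovered:", decode(gaps))
```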

cTPM: A Cloud TPM for Cross-Device Trusted Applications

Chen Chen, Carnegie Mellon University; Himanshu Raj, Stefan Saroiu, and Alec Wolman, Microsoft Research

Current Trusted Platform Modules (TPMs) are ill-suited for cross-device scenarios in trusted mobile applications because they hinder the seamless sharing of data across multiple devices. This paper presents cTPM, an extension of the TPM’s design that adds an additional root key to the TPM and shares that root key with the cloud. As a result, the cloud can create and share TPM-protected keys and data across multiple devices owned by one user. Further, the additional key lets the cTPM allocate cloud-backed remote storage so that each TPM can benefit from a trusted real-time clock and high-performance, non-volatile storage.

This paper shows that cTPM is practical, versatile, and easily applicable to trusted mobile applications. Our simple change to the TPM specification is viable because its fundamental concepts—a primary root key and off-chip, NV storage—are already found in the current specification, TPM 2.0. By avoiding a clean-slate redesign, we sidestep the difficult challenge of re-verifying the security properties of a new TPM design. We demonstrate cTPM’s versatility with two case studies: extending Pasture with additional functionality, and re-implementing TrInc without the need for extra hardware.

Available Media

6:00 p.m.–8:00 p.m. Wednesday

Poster/Demo Session and Reception

Emerald Ballroom, Third Floor

Check out the cool new ideas and the latest preliminary research on display at the Poster/Demo Session and Reception. Take part in discussions with your colleagues over complimentary food and drinks. The list of accepted posters is available here.

 

All sessions will take place in the Seattle Ballroom, Fourth Floor, unless otherwise noted.

Thursday, April 3, 2014

8:30 a.m.–9:00 a.m. Thursday

Continental Breakfast

Belltown Foyer, Fourth Floor

9:00 a.m.–10:15 a.m. Thursday

Operational Systems Track

Session Chair: Katerina Argyraki, EPFL

Network Virtualization in Multi-tenant Datacenters

Teemu Koponen, Keith Amidon, Peter Balland, Martín Casado, Anupam Chanda, Bryan Fulton, Igor Ganichev, Jesse Gross, Natasha Gude, Paul Ingram, Ethan Jackson, Andrew Lambeth, Romain Lenglet, Shih-Hao Li, Amar Padmanabhan, Justin Pettit, Ben Pfaff, and Rajiv Ramanathan, VMware; Scott Shenker, International Computer Science Institute and the University of California, Berkeley; Alan Shieh, Jeremy Stribling, Pankaj Thakkar, Dan Wendlandt, Alexander Yip, and Ronghua Zhang, VMware

Multi-tenant datacenters represent an extremely challenging networking environment. Tenants want the ability to migrate unmodified workloads from their enterprise networks to service provider datacenters, retaining the same networking configurations of their home network. The service providers must meet these needs without operator intervention while preserving their own operational flexibility and efficiency. Traditional networking approaches have failed to meet these tenant and provider requirements. Responding to this need, we present the design and implementation of a network virtualization solution for multi-tenant datacenters.

Available Media

Operational Experiences with Disk Imaging in a Multi-Tenant Datacenter

Kevin Atkinson, Gary Wong, and Robert Ricci, University of Utah

Disk images play a critical role in multi-tenant datacenters. In this paper, the first study of its kind, we analyze operational data from the disk imaging system that forms part of the infrastructure of the Emulab facility. This dataset spans four years and more than a quarter-million disk image loads requested by Emulab’s users. From our analysis, we draw observations about the nature of the images themselves (for example: how similar are they to each other?) and about usage patterns (what is the statistical distribution of image popularity?). Many of these observations have implications for the design and operation of disk imaging systems, including how images are stored, how caching is employed, the effectiveness of pre-loading, and strategies for network distribution.

Available Media

VPN Gate: A Volunteer-Organized Public VPN Relay System with Blocking Resistance for Bypassing Government Censorship Firewalls

Daiyuu Nobori and Yasushi Shinjo, University of Tsukuba

VPN Gate is a public VPN relay service designed to achieve blocking resistance to censorship firewalls such as the Great Firewall (GFW) of China. To achieve such resistance, we organize many volunteers to provide a VPN relay service, with many changing IP addresses. To block VPN Gate with their firewalls, censorship authorities must find the IP addresses of all the volunteers. To prevent this, we adopted two techniques to improve blocking resistance. The first technique is to mix a number of innocent IP addresses into the relay server list provided to the public. The second technique is collaborative spy detection. The volunteer servers work together to create a list of spies, meaning the computers used by censorship authorities to probe the volunteer servers. Using this list, each volunteer server ignores packets from spies. We launched VPN Gate on March 8, 2013. By the end of August it had about 3,000 daily volunteers using 6,300 unique IP addresses to facilitate 464,000 VPN connections from users worldwide, including 45,000 connections and 9,000 unique IP addresses from China. At the time VPN Gate maintained about 70% of volunteer VPN servers as unblocked by the GFW.

Available Media

10:15 a.m.–10:45 a.m. Thursday

Break with Refreshments

Belltown Foyer, Fourth Floor

10:45 a.m.–12:20 p.m. Thursday

Data Storage and Analytics

Session Chair: George Porter, University of California, San Diego

Bolt: Data Management for Connected Homes

Trinabh Gupta, The University of Texas at Austin; Rayman Preet Singh, University of Waterloo; Amar Phanishayee, Jaeyeon Jung, and Ratul Mahajan, Microsoft Research

We present Bolt, a data management system for an emerging class of applications—those that manipulate data from connected devices in the home. It abstracts this data as a stream of time-tag-value records, with arbitrary, application-defined tags. For reliable sharing among applications, some of which may be running outside the home, Bolt uses untrusted cloud storage as a seamless extension of local storage. It organizes data into chunks that contain multiple records and are individually compressed and encrypted. While chunking enables efficient transfer and storage, it also implies that data is retrieved at the granularity of chunks, instead of records. We show that the resulting overhead, however, is small because applications in this domain frequently query for multiple proximate records. We develop three diverse applications on top of Bolt and find that the performance needs of each are easily met. We also find that, compared to OpenTSDB, a popular time-series database system, Bolt is up to 40 times faster while requiring 3–5 times less storage space.

Available Media
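A minimal Python sketch of the data model described in the abstract above follows (not Bolt's implementation): time-tag-value records are appended to a stream, sealed into multi-record chunks, and each chunk is compressed before upload; retrieval happens at chunk granularity. Encryption is elided, and the chunk size, tag names, and storage layout are assumptions.

```python
# Toy time-tag-value stream with chunked, compressed storage and chunk-granularity reads.

import json
import time
import zlib

CHUNK_SIZE = 4                      # records per chunk (real systems use larger chunks)

class Stream:
    def __init__(self):
        self.pending = []           # records not yet sealed into a chunk
        self.chunks = []            # compressed chunks ("uploaded" blobs)

    def append(self, tag, value, ts=None):
        self.pending.append({"time": ts or time.time(), "tag": tag, "value": value})
        if len(self.pending) >= CHUNK_SIZE:
            self._seal()

    def _seal(self):
        blob = zlib.compress(json.dumps(self.pending).encode())
        self.chunks.append(blob)    # in Bolt this chunk would also be encrypted
        self.pending = []

    def query(self, tag):
        """Fetch whole chunks, decompress, and filter -- retrieval is per chunk."""
        out = []
        for blob in self.chunks:
            for rec in json.loads(zlib.decompress(blob)):
                if rec["tag"] == tag:
                    out.append(rec)
        return out + [r for r in self.pending if r["tag"] == tag]

if __name__ == "__main__":
    s = Stream()
    for i in range(10):
        s.append("temperature", 20 + i % 3)
    print(len(s.chunks), "chunks,", len(s.query("temperature")), "matching records")
```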

Blizzard: Fast, Cloud-scale Block Storage for Cloud-oblivious Applications

James Mickens, Edmund B. Nightingale, Jeremy Elson, Krishna Nareddy, and Darren Gehring, Microsoft Research; Bin Fan, Carnegie Mellon University; Asim Kadav and Vijay Chidambaram, University of Wisconsin—Madison; Osama Khan, Johns Hopkins University

Blizzard is a high-performance block store that exposes cloud storage to cloud-oblivious POSIX and Win32 applications. Blizzard connects clients and servers using a network with full-bisection bandwidth, allowing clients to access any remote disk as fast as if it were local. Using a novel striping scheme, Blizzard exposes high disk parallelism to both sequential and random workloads; also, by decoupling the durability and ordering requirements expressed by flush requests, Blizzard can commit writes out-of-order, providing high performance and crash consistency to applications that issue many small, random IOs. Blizzard’s virtual disk drive, which clients mount like a normal physical one, provides maximum throughputs of 1200 MB/s, and can improve the performance of unmodified, cloud-oblivious applications by 2x–10x. Compared to EBS, a commercially available, state-of-the-art virtual drive for cloud applications, Blizzard can improve SQL server IOp rates by seven-fold while still providing crash consistency.

Available Media

Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area

Ariel Rabkin, Matvey Arye, Siddhartha Sen, Vivek S. Pai, and Michael J. Freedman, Princeton University

We present JetStream, a system that allows real-time analysis of large, widely-distributed changing data sets. Traditional approaches to distributed analytics require users to specify in advance which data is to be backhauled to a central location for analysis. This is a poor match for domains where available bandwidth is scarce and it is infeasible to collect all potentially useful data.

JetStream addresses bandwidth limits in two ways, both of which are explicit in the programming model. The system incorporates structured storage in the form of OLAP data cubes, so data can be stored for analysis near where it is generated. Using cubes, queries can aggregate data in ways and locations of their choosing. The system also includes adaptive filtering and other transformations that adjust data quality to match available bandwidth. Many bandwidth-saving transformations are possible; we discuss which are appropriate for which data and how they can best be combined.

We implemented a range of analytic queries on web request logs and image data. Queries could be expressed in a few lines of code. Using structured storage on source nodes conserved network bandwidth by allowing data to be collected only when needed to fulfill queries. Our adaptive control mechanisms are responsive enough to keep end-to-end latency within a few seconds, even when available bandwidth drops by a factor of two, and are flexible enough to express practical policies.

Available Media

GRASS: Trimming Stragglers in Approximation Analytics

Ganesh Ananthanarayanan, University of California, Berkeley; Michael Chien-Chun Hung, University of Southern California; Xiaoqi Ren, California Institute of Technology; Ion Stoica, University of California, Berkeley; Adam Wierman, California Institute of Technology; Minlan Yu, University of Southern California

In big data analytics, timely results, even if based on only part of the data, are often good enough. For this reason, approximation jobs, which have deadline or error bounds and require only a subset of their tasks to complete, are projected to dominate big data workloads. Straggler tasks are an important hurdle when designing approximate data analytic frameworks, and the widely adopted approach to deal with them is speculative execution. In this paper, we present GRASS, which carefully uses speculation to mitigate the impact of stragglers in approximation jobs. GRASS’s design is based on first-principles analysis of the impact of speculation. GRASS delicately balances immediacy of improving the approximation goal with the long-term implications of using extra resources for speculation. Evaluations with production workloads from Facebook and Microsoft Bing in an EC2 cluster of 200 nodes show that GRASS increases accuracy of deadline-bound jobs by 47% and speeds up error-bound jobs by 38%. GRASS’s design also speeds up exact computations (zero error-bound), making it a unified solution for straggler mitigation.

Available Media

12:20 p.m.–2:00 p.m. Thursday

Lunch, on your own

2:00 p.m.–3:15 p.m. Thursday

Interpreting Signals

Session Chair: David Wetherall, University of Washington

Bringing Gesture Recognition to All Devices

Bryce Kellogg, Vamsi Talla, and Shyamnath Gollakota, University of Washington

Existing gesture-recognition systems consume significant power and computational resources that limit how they may be used in low-end devices. We introduce AllSee, the first gesture-recognition system that can operate on a range of computing devices including those with no batteries. AllSee consumes three to four orders of magnitude lower power than state-of-the-art systems and can enable always-on gesture recognition for smartphones and tablets. It extracts gesture information from existing wireless signals (e.g., TV transmissions), but does not incur the power and computational overheads of prior wireless approaches. We build AllSee prototypes that can recognize gestures on RFID tags and power-harvesting sensors. We also integrate our hardware with an off-the-shelf Nexus S phone and demonstrate gesture recognition in through-the-pocket scenarios. Our results show that AllSee achieves classification accuracies as high as 97% over a set of eight gestures.

Available Media

3D Tracking via Body Radio Reflections

Fadel Adib, Zach Kabelac, Dina Katabi, and Robert C. Miller, Massachusetts Institute of Technology

This paper introduces WiTrack, a system that tracks the 3D motion of a user from the radio signals reflected off her body. It works even if the person is occluded from the WiTrack device or in a different room. WiTrack does not require the user to carry any wireless device, yet its accuracy exceeds current RF localization systems, which require the user to hold a transceiver. Empirical measurements with a WiTrack prototype show that, on average, it localizes the center of a human body to within a median of 10 to 13 cm in the x and y dimensions, and 21 cm in the z dimension. It also provides coarse tracking of body parts, identifying the direction of a pointing hand with a median of 11.2°. WiTrack bridges a gap between RF-based localization systems which locate a user through walls and occlusions, and human-computer interaction systems like Kinect, which can track a user without instrumenting her body, but require the user to stay within the direct line of sight of the device.

Available Media

Epsilon: A Visible Light Based Positioning System

Liqun Li, Microsoft Research, Beijing; Pan Hu, University of Massachusetts Amherst; Chunyi Peng, The Ohio State University; Guobin Shen and Feng Zhao, Microsoft Research, Beijing

Exploiting the increasingly wide use of Light-emitting Diode (LED) lighting, in this paper, we study the problem of using visible LED lights for accurate localization. The basic idea is to leverage the existing lighting infrastructure and apply trilateration to localize any devices with light sensing capability (e.g., a smartphone), using LED lamps as anchors. Through the design of Epsilon, we identify and tackle several technical challenges. In particular, we establish and experimentally verify the optical channel model for localization. We adopt BFSK and channel hopping to enable reliable location beaconing from multiple, uncoordinated light sources over the shared optical medium. We handle realistic situations towards robust localization; for example, we exploit user involvement to resolve the ambiguity in case of insufficient LED anchors. We have implemented the Epsilon system and evaluated it with a small-scale hardware testbed as well as moderate-size simulations. Experimental results confirmed the effectiveness of Epsilon: the 90th percentile accuracies are 0.4m, 0.7m and 0.8m for three typical office environments. Even in the extreme situation with a single light, the 90th percentile accuracy is 1.1m. We believe that visible light based localization is promising to significantly improve the positioning accuracy, despite a few open problems in practice.

Available Media
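The following hedged Python sketch (not Epsilon's code) illustrates the trilateration step named in the abstract above, simplified to 2D: given known LED anchor positions and distances estimated from received light intensity, solve the linearized least-squares system for the receiver position. The anchor coordinates and ranges are invented.

```python
# Least-squares trilateration from LED anchor positions and estimated ranges (2D toy).

import numpy as np

def trilaterate(anchors, distances):
    """anchors: (n, 2) array of LED positions; distances: length-n array of ranges."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    x1, y1 = anchors[0]
    # Linearize by subtracting the first anchor's range equation from the others.
    A = 2 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + anchors[1:, 0] ** 2 - x1 ** 2
         + anchors[1:, 1] ** 2 - y1 ** 2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

if __name__ == "__main__":
    leds = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]     # ceiling LED anchors (meters)
    true_pos = np.array([1.0, 2.0])
    ranges = [np.hypot(*(true_pos - np.array(a))) for a in leds]
    print(trilaterate(leds, ranges))                             # ~ [1.0, 2.0]
```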

3:15 p.m.–3:45 p.m. Thursday

Break with Refreshments

Belltown Foyer, Fourth Floor

3:45 p.m.–5:20 p.m. Thursday

Improving Throughput and Latency (at Different Layers)

Session Chair: Shyamnath Gollakota, University of Washington

Enabling Bit-by-Bit Backscatter Communication in Severe Energy Harvesting Environments

Pengyu Zhang and Deepak Ganesan, University of Massachusetts Amherst

Micro-powered wireless sensors present new challenges due to the severe harvesting conditions under which they need to operate and their tiny energy reservoirs. However, existing low-power network stacks make a slew of design choices that limit the ability to scale down to such environments. We address these issues with QuarkNet, a backscatter-based network stack that is designed to enable continuous communication even if there is only enough harvested energy to transmit a few bits at a time, while simultaneously optimizing throughput across a network of micro-powered devices. We design and implement QuarkNet on a software radio based RFID reader and the UMass Moo platform, and show that QuarkNet increases the communication distance by 3.5x over Dewdrop, 9x over Buzz, and is within 96% of the upper bound of achievable range. QuarkNet also improves the communication throughput by 10.5x over EPC Gen 2, 5.8x over Dewdrop, and 3.3x over Flit for tag-to-reader communication, and by 1.5x over EPC Gen 2 for reader-to-tag communication.

Available Media

Full Duplex MIMO Radios

Dinesh Bharadia and Sachin Katti, Stanford University

This paper presents the design and implementation of the first in-band full duplex WiFi-PHY based MIMO radios that practically achieve the theoretical doubling of throughput. Our design solves two fundamental challenges associated with MIMO full duplex: complexity and performance. Our design achieves full duplex with a cancellation design whose complexity scales almost linearly with the number of antennas; this complexity is close to optimal. Further, we design novel digital estimation and cancellation algorithms that eliminate almost all interference and achieve the same performance as a single antenna full duplex SISO system, which is again the best possible performance. We prototype our design by building our own analog circuit boards and integrating them with a WiFi-PHY compatible standard WARP software radio implementation. We show experimentally that our design works robustly in noisy indoor environments, and provides close to the expected theoretical doubling of throughput in practice.

Available Media

Recursively Cautious Congestion Control

Radhika Mittal, Justine Sherry, and Sylvia Ratnasamy, University of California, Berkeley; Scott Shenker, University of California, Berkeley, and International Computer Science Institute

TCP’s congestion control is deliberately cautious, avoiding network overloads by starting with a small initial window and then iteratively ramping up. As a result, it often takes flows several round-trip times to fully utilize the available bandwidth. In this paper we propose RC3, a technique to quickly take advantage of available capacity from the very first RTT. RC3 uses several levels of lower priority service and a modified TCP behavior to achieve near-optimal throughputs while preserving TCP-friendliness and fairness. We implement RC3 in the Linux kernel and in NS-3. In common wide-area scenarios, RC3 results in over 40% reduction in average flow completion times, with strongest improvements—more than 70% reduction in flow completion time—seen in medium to large sized (100KB-3MB) flows.

Available Media
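Below is a hedged Python sketch (not the RC3 implementation) of the idea in the abstract above: while regular TCP ramps up from the front of the flow at normal priority, the remaining packets are also sent from the tail in exponentially larger groups at successively lower priorities, so the first RTT can already use spare capacity. The group sizes and growth factor are illustrative assumptions, not the paper's constants.

```python
# Assign tail packets of a flow to successively lower priority levels (toy plan).

def rc3_priority_plan(total_packets, base_group=40, growth=10, max_levels=4):
    """Return (priority_level, first_packet, last_packet) groups, counted from the tail."""
    plan, covered, group = [], 0, base_group
    for level in range(1, max_levels + 1):
        if covered >= total_packets:
            break
        take = min(group, total_packets - covered)
        last = total_packets - covered - 1            # packet indices from the front
        first = last - take + 1
        plan.append((level, first, last))
        covered += take
        group *= growth
    return plan

if __name__ == "__main__":
    # Priority 0 (normal TCP) starts from packet 0; the low-priority levels below cover
    # the rest of a 2,000-packet flow from the back.
    for level, first, last in rc3_priority_plan(2000):
        print(f"priority {level}: packets {first}..{last}")
```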

How Speedy is SPDY?

Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David Wetherall, University of Washington

SPDY is increasingly being used as an enhancement to HTTP/1.1. To understand its impact on performance, we conduct a systematic study of Web page load time (PLT) under SPDY and compare it to HTTP. To identify the factors that affect PLT, we proceed from simple, synthetic pages to complete page loads based on the top 200 Alexa sites. We find that SPDY provides a significant improvement over HTTP when we ignore dependencies in the page load process and the effects of browser computation. Most SPDY benefits stem from the use of a single TCP connection, but the same feature is also detrimental under high packet loss. Unfortunately, the benefits can be easily overwhelmed by dependencies and computation, reducing the improvements with SPDY to 7% for our lower bandwidth and higher RTT scenarios. We also find that request prioritization is of little help, while server push has good potential; we present a push policy based on dependencies that gives comparable performance to mod_spdy while sending much less data.

Available Media

 

All sessions will take place in the Seattle Ballroom, Fourth Floor, unless otherwise noted.

Friday, April 4, 2014

8:30 a.m.–9:00 a.m. Friday

Continental Breakfast

Belltown Foyer, Fourth Floor

9:00 a.m.–10:15 a.m. Friday

In-Memory Computing and Caching

Session Chair: Timothy Roscoe, ETH Zürich

FaRM: Fast Remote Memory

Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro; Microsoft Research

We describe the design and implementation of FaRM, a new main memory distributed computing platform that exploits RDMA to improve both latency and throughput by an order of magnitude relative to state of the art main memory systems that use TCP/IP. FaRM exposes the memory of machines in the cluster as a shared address space. Applications can use transactions to allocate, read, write, and free objects in the address space with location transparency. We expect this simple programming model to be sufficient for most application code. FaRM provides two mechanisms to improve performance where required: lock-free reads over RDMA, and support for collocating objects and function shipping to enable the use of efficient single machine transactions. FaRM uses RDMA both to directly access data in the shared address space and for fast messaging and is carefully tuned for the best RDMA performance. We used FaRM to build a key-value store and a graph store similar to Facebook’s. They both perform well, for example, a 20-machine cluster can perform 167 million key-value lookups per second with a latency of 31μs.

Available Media

Easy Freshness with Pequod Cache Joins

Bryan Kate, Eddie Kohler, and Michael S. Kester, Harvard University; Neha Narula, Yandong Mao, and Robert Morris, MIT/CSAIL

Pequod is a distributed application-level key-value cache that supports declaratively defined, incrementally maintained, dynamic, partially-materialized views. These views, which we call cache joins, can simplify application development by shifting the burden of view maintenance onto the cache. Cache joins define relationships among key ranges; using cache joins, Pequod calculates views on demand, incrementally updates them as required, and in many cases improves performance by reducing client communication. To build Pequod, we had to design a view abstraction for volatile, relationless key-value caches and make it work across servers in a distributed system. Pequod performs as well as other in-memory key-value caches and, like those caches, outperforms databases with view support.

Available Media

MICA: A Holistic Approach to Fast In-Memory Key-Value Storage

Hyeontaek Lim, Carnegie Mellon University; Dongsu Han, Korea Advanced Institute of Science and Technology (KAIST); David G. Andersen, Carnegie Mellon University; Michael Kaminsky, Intel Labs

MICA is a scalable in-memory key-value store that handles 65.6 to 76.9 million key-value operations per second using a single general-purpose multi-core system. MICA is over 4–13.5x faster than current state-of-the-art systems, while providing consistently high throughput over a variety of mixed read and write workloads.

MICA takes a holistic approach that encompasses all aspects of request handling, including parallel data access, network request handling, and data structure design, but makes unconventional choices in each of the three domains. First, MICA optimizes for multi-core architectures by enabling parallel access to partitioned data. Second, for efficient parallel data access, MICA maps client requests directly to specific CPU cores at the server NIC level by using client-supplied information and adopts a light-weight networking stack that bypasses the kernel. Finally, MICA’s new data structures—circular logs, lossy concurrent hash indexes, and bulk chaining—handle both read- and write-intensive workloads at low overhead.

Available Media
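The toy Python sketch below (not MICA's code) illustrates the request-steering idea from the abstract above: the client hashes the key, derives the owning partition, and addresses the request so it lands on the server core that exclusively serves that partition, avoiding cross-core synchronization. The partition count, core mapping, and hash choice are assumptions.

```python
# Client-side key-to-partition-to-core routing with per-core exclusive stores (toy model).

import hashlib

NUM_PARTITIONS = 16
NUM_CORES = 4
CORE_OF_PARTITION = {p: p % NUM_CORES for p in range(NUM_PARTITIONS)}

def key_hash(key: bytes) -> int:
    return int.from_bytes(hashlib.sha1(key).digest()[:8], "little")

def route(key: bytes):
    """Client side: compute the partition and target core before sending the request."""
    partition = key_hash(key) % NUM_PARTITIONS
    return partition, CORE_OF_PARTITION[partition]

class Core:
    """Server side: one store per core; no locks needed because keys never cross cores."""
    def __init__(self):
        self.store = {}
    def handle(self, op, key, value=None):
        if op == "put":
            self.store[key] = value
        return self.store.get(key)

if __name__ == "__main__":
    cores = [Core() for _ in range(NUM_CORES)]
    for k, v in [(b"user:1", b"alice"), (b"user:2", b"bob")]:
        _, core = route(k)
        cores[core].handle("put", k, v)
    for k in (b"user:1", b"user:2"):
        _, core = route(k)
        print(k, "->", cores[core].handle("get", k), "on core", core)
```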

10:15 a.m.–10:45 a.m. Friday

Break with Refreshments

Belltown Foyer, Fourth Floor

10:45 a.m.–12:20 p.m. Friday

Scalable Networking

Session Chair: Vyas Sekar, Stony Brook University

NetVM: High Performance and Flexible Networking Using Virtualization on Commodity Platforms

Jinho Hwang, The George Washington University; K. K. Ramakrishnan, Rutgers University; Timothy Wood, The George Washington University

NetVM brings virtualization to the network by enabling high bandwidth network functions to operate at near line speed, while taking advantage of the flexibility and customization of low cost commodity servers. NetVM allows customizable data plane processing capabilities such as firewalls, proxies, and routers to be embedded within virtual machines, complementing the control plane capabilities of Software Defined Networking. NetVM makes it easy to dynamically scale, deploy, and reprogram network functions. This provides far greater flexibility than existing purpose-built, sometimes proprietary hardware, while still allowing complex policies and full packet inspection to determine subsequent processing. It does so with dramatically higher throughput than existing software router platforms.

NetVM is built on top of the KVM platform and Intel DPDK library. We detail many of the challenges we have solved such as adding support for high-speed inter-VM communication through shared huge pages and enhancing the CPU scheduler to prevent overheads caused by inter-core communication and context switching. NetVM allows true zero-copy delivery of data to VMs both for packet processing and messaging among VMs within a trust boundary. Our evaluation shows how NetVM can compose complex network functionality from multiple pipelined VMs and still obtain throughputs up to 10 Gbps, an improvement of more than 250% compared to existing techniques that use SR-IOV for virtualized networking.

Available Media

ClickOS and the Art of Network Function Virtualization

Joao Martins and Mohamed Ahmed, NEC Europe Ltd.; Costin Raiciu and Vladimir Olteanu, University Politehnica of Bucharest; Michio Honda, Roberto Bifulco, and Felipe Huici, NEC Europe Ltd.

Over the years middleboxes have become a fundamental part of today’s networks. Despite their usefulness, they come with a number of problems, many of which arise from the fact that they are hardware-based: they are costly, difficult to manage, and their functionality is hard or impossible to change, to name a few.

To address these issues, there is a recent trend towards network function virtualization (NFV), in essence proposing to turn these middleboxes into software-based, virtualized entities. Towards this goal we introduce ClickOS, a high-performance, virtualized software middlebox platform. ClickOS virtual machines are small (5MB), boot quickly (about 30 milliseconds), add little delay (45 microseconds) and over one hundred of them can be concurrently run while saturating a 10Gb pipe on a commodity server. We further implement a wide range of middleboxes including a firewall, a carrier-grade NAT and a load balancer and show that ClickOS can handle packets in the millions per second.

Available Media

SENIC: Scalable NIC for End-Host Rate Limiting

Sivasankar Radhakrishnan, University of California, San Diego; Yilong Geng and Vimalkumar Jeyakumar, Stanford University; Abdul Kabbani, Google Inc.; George Porter, University of California, San Diego; Amin Vahdat, Google Inc. and University of California, San Diego

Rate limiting is an important primitive for managing server network resources. Unfortunately, software-based rate limiting suffers from limited accuracy and high CPU overhead, and modern NICs only support a handful of rate limiters. We present SENIC, a NIC design that can natively support 10s of thousands of rate limiters—100x to 1000x the number available in NICs today. The key idea is that the host CPU only classifies packets, enqueues them in per-class queues in host memory, and specifies rate limits for each traffic class. On the NIC, SENIC maintains class metadata, computes the transmit schedule, and only pulls packets from host memory when they are ready to be transmitted (on a real time basis). We implemented SENIC on NetFPGA, with 1000 rate limiters requiring just 30KB SRAM, and it was able to accurately pace packets. Further, in a memcached benchmark against software rate limiters, SENIC is able to sustain up to 250% higher load, while simultaneously keeping tail latency under 4ms at 90% network utilization.

Available Media
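The simplified Python sketch below (not the SENIC hardware design) illustrates the division of labor described in the abstract above: the host CPU only classifies packets into per-class queues in host memory and sets each class's rate, while the "NIC" keeps per-class pacing state and pulls a packet from host memory only when that class is eligible to transmit. The rates, packet sizes, and two traffic classes are invented.

```python
# Host enqueues per class; a NIC-side scheduler pulls packets at each class's rate.

import heapq

class ClassState:
    def __init__(self, rate_bps):
        self.rate_bps = rate_bps
        self.queue = []                    # per-class queue in "host memory"
        self.next_tx_time = 0.0            # when this class may transmit again

def host_enqueue(classes, cls, pkt_bytes):
    classes[cls].queue.append(pkt_bytes)   # host CPU work ends here

def nic_schedule(classes, duration):
    """Pull packets in earliest-eligible-time order, pacing each class at its rate."""
    timeline = []
    heap = [(c.next_tx_time, name) for name, c in classes.items() if c.queue]
    heapq.heapify(heap)
    while heap:
        t, name = heapq.heappop(heap)
        if t > duration:
            break
        c = classes[name]
        pkt = c.queue.pop(0)               # the only point where packet data is fetched
        timeline.append((round(t, 6), name, pkt))
        c.next_tx_time = t + pkt * 8 / c.rate_bps
        if c.queue:
            heapq.heappush(heap, (c.next_tx_time, name))
    return timeline

if __name__ == "__main__":
    classes = {"tenantA": ClassState(rate_bps=8_000_000),
               "tenantB": ClassState(rate_bps=4_000_000)}
    for _ in range(3):
        host_enqueue(classes, "tenantA", 1500)
        host_enqueue(classes, "tenantB", 1500)
    for event in nic_schedule(classes, duration=0.01):
        print(event)
```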

mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems

EunYoung Jeong, Shinae Woo, Muhammad Jamshed, and Haewon Jeong, Korea Advanced Institute of Science and Technology (KAIST); Sunghwan Ihm, Princeton University; Dongsu Han and KyoungSoo Park, Korea Advanced Institute of Science and Technology (KAIST)
Community Award

Scaling the performance of short TCP connections on multicore systems is fundamentally challenging. Although many proposals have attempted to address various shortcomings, inefficiency of the kernel implementation still persists. For example, even state-of-the-art designs spend 70% to 80% of CPU cycles in handling TCP connections in the kernel, leaving little room for innovation in the user-level program.

This work presents mTCP, a high-performance user-level TCP stack for multicore systems. mTCP addresses the inefficiencies from the ground up—from packet I/O and TCP connection management to the application interface. In addition to adopting well-known techniques, our design (1) translates multiple expensive system calls into a single shared memory reference, (2) allows efficient flow-level event aggregation, and (3) performs batched packet I/O for high I/O efficiency. Our evaluations on an 8-core machine showed that mTCP improves the performance of small message transactions by a factor of 25 compared to the latest Linux TCP stack and a factor of 3 compared to the best-performing research system known so far. It also improves the performance of various popular applications by 33% to 320% compared to those on the Linux stack.

Available Media

12:20 p.m.–2:00 p.m. Friday

Lunch, on your own

2:00 p.m.–3:15 p.m. Friday

New Programming Abstractions

Session Chair: Aditya Akella, University of Wisconsin–Madison

Warranties for Faster Strong Consistency

Jed Liu, Tom Magrino, Owen Arden, Michael D. George, and Andrew C. Myers, Cornell University

We present a new mechanism, warranties, to enable building distributed systems with linearizable transactions. A warranty is a time-limited assertion about one or more distributed objects. These assertions generalize optimistic concurrency control, improving throughput because clients holding warranties need not communicate to verify the warranty’s assertion. Updates that might cause an active warranty to become false are delayed until the warranty expires, trading write latency for read latency. For workloads biased toward reads, warranties improve scalability and system throughput. Warranties can be expressed using language-level computations, and they integrate harmoniously into the programming model as a form of memoization. Experiments with some nontrivial programs demonstrate that warranties enable high performance despite the simple programming model.

Available Media
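The toy Python sketch below (not the authors' system) illustrates the warranty mechanism described in the abstract above: the store hands out a time-limited assertion about an object's value, readers holding an unexpired warranty reuse the value without contacting the store, and a conflicting write is delayed until the warranty expires, trading write latency for read latency. The durations and object names are invented.

```python
# Time-limited read warranties: reads skip validation, conflicting writes wait.

import time

class Store:
    def __init__(self):
        self.data = {}
        self.warranty_until = {}           # key -> wall-clock expiry of the live warranty

    def read_with_warranty(self, key, duration=0.2):
        """Return (value, expiry); until expiry the client may reuse value locally."""
        expiry = time.time() + duration
        self.warranty_until[key] = max(self.warranty_until.get(key, 0), expiry)
        return self.data.get(key), expiry

    def write(self, key, value):
        """Delay updates that would falsify an active warranty."""
        wait = self.warranty_until.get(key, 0) - time.time()
        if wait > 0:
            time.sleep(wait)
        self.data[key] = value

if __name__ == "__main__":
    store = Store()
    store.write("price", 10)
    value, expiry = store.read_with_warranty("price", duration=0.1)
    # Reads before `expiry` can use `value` with no round trip to the store.
    t0 = time.time()
    store.write("price", 12)               # blocked until the warranty expires
    print(f"write delayed {time.time() - t0:.2f}s; new value:", store.data["price"])
```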

Tierless Programming and Reasoning for Software-Defined Networks

Tim Nelson, Andrew D. Ferguson, Michael J.G. Scheer, and Shriram Krishnamurthi, Brown University

We present Flowlog, a tierless language for programming SDN controllers. In contrast to languages with different abstractions for each program tier—the control-plane, data-plane, and controller state—Flowlog provides a unified abstraction for all three tiers. Flowlog is reminiscent of both SQL and rule-based languages such as Cisco IOS and JunOS; unlike these network configuration languages, Flowlog supports programming with mutable state. We intentionally limit Flowlog’s expressivity to enable built-in verification and proactive compilation despite the integration of controller state. To compensate for its limited expressive power, Flowlog enables the reuse of external libraries through callouts.

Flowlog proactively compiles essentially all forwarding behavior to switch tables. For rules that maintain controller state or generate fresh packets, the compiler instructs switches to send the minimum amount of necessary traffic to the controller. Given that Flowlog programs can be stateful, this process is non-trivial. We have successfully used Flowlog to implement real network applications. We also compile Flowlog programs to Alloy, a popular verification tool. With this we have verified several properties, including program-correctness properties that are topology-independent, and have found bugs in our own programs.

Available Media

Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions using FlowTags

Seyed Kaveh Fayazbakhsh, Carnegie Mellon University; Luis Chiang, Deutsche Telekom Labs; Vyas Sekar, Carnegie Mellon University; Minlan Yu, University of Southern California; Jeffrey C. Mogul, Google

Middleboxes provide key security and performance guarantees in networks. Unfortunately, the dynamic traffic modifications they induce make it difficult to reason about network management tasks such as access control, accounting, and diagnostics. This also makes it difficult to integrate middleboxes into SDN-capable networks and leverage the benefits that SDN can offer.

In response, we develop the FlowTags architecture. FlowTags-enhanced middleboxes export tags to provide the necessary causal context (e.g., source hosts or internal cache/miss state). SDN controllers can configure the tag generation and tag consumption operations using new FlowTags APIs. These operations help restore two key SDN tenets: (i) bindings between packets and their “origins,” and (ii) ensuring that packets follow policy-mandated paths.

We develop new controller mechanisms that leverage FlowTags. We show the feasibility of minimally extending middleboxes to support FlowTags. We also show that FlowTags imposes low overhead over traditional SDN mechanisms. Finally, we demonstrate the early promise of FlowTags in enabling new verification and diagnosis capabilities.

Available Media

3:15 p.m.–3:30 p.m. Friday

Closing Remarks
