OSDI '12 Program

The full USENIX OSDI ’12 Proceedings are now available.

 

Monday, October 8, 2012

8:45 a.m.–10:30 a.m. Monday

Welcome, Jay Lepreau Best Paper Awards, and Keynote Address

Ray Dolby Ballroom 123

Program Co-Chairs: Chandu Thekkath, Microsoft Research Silicon Valley, and Amin Vahdat, Google and University of California, San Diego

Keynote Address: The UCSC Cancer Genomics Hub

David Haussler, Investigator, Howard Hughes Medical Institute; Director, Center for Biomolecular Science and Engineering, University of California, Santa Cruz

Cancer is a complex condition—patients present with thousands of subtypes involving different combinations of DNA mutations. Understanding cancer will require aggregating DNA data from many thousands of cancer genomes, facilitating the statistical power to distinguish patterns in the mutations. The rapidly plummeting cost of DNA sequencing will soon make cancer genome sequencing a widespread clinical practice. To anticipate this, UCSC has built a 5-petabyte database for tumor genomes that will be sequenced through National Cancer Institute projects—the Cancer Genomics Hub—and is tackling the significant computational challenges posed by storing, serving, and interpreting cancer genomics data.

David Haussler is an Investigator at Howard Hughes Medical Institute; Distinguished Professor of Biomolecular Engineering and Director of the Center for Biomolecular Science and Engineering at the University of California, Santa Cruz; Scientific Co-Director, California Institute for Quantitative Biosciences (QB3); and Cofounder of the Genome 10K Project. His research integrates mathematics, computer science, and molecular biology. He develops new computer-based algorithms to interpret comparative and high-throughput genomics data to understand gene structure, function, and regulation. As a collaborator on the international Human Genome Project, his team assembled the first human genome sequence and produced the UCSC Genome Browser, which is used extensively in biomedical research and serves as the platform for many large-scale genomics projects. Haussler received his Ph.D. in computer science from the University of Colorado at Boulder and belongs to the National Academy of Sciences and the American Academy of Arts and Sciences.

10:30 a.m.–11:00 a.m. Monday

Break

Ray Dolby Ballroom Pre-Function Terrace


11:00 a.m.–12:30 p.m. Monday

Big Data

Ray Dolby Ballroom 123

Session Chair: John Wilkes, Google

Flat Datacenter Storage

Edmund B. Nightingale, Jeremy Elson, and Jinliang Fan, Microsoft Research; Owen Hofmann, University of Texas at Austin; Jon Howell and Yutaka Suzue, Microsoft Research

Flat Datacenter Storage (FDS) is a high-performance, fault-tolerant, large-scale, locality-oblivious blob store. Using a novel combination of full bisection bandwidth networks, data and metadata striping, and flow control, FDS multiplexes an application’s large-scale I/O across the available throughput and latency budget of every disk in a cluster. FDS therefore makes many optimizations around data locality unnecessary. Disks also communicate with each other at their full bandwidth, making recovery from disk failures extremely fast. FDS is designed for datacenter scale, fully distributing metadata operations that might otherwise become a bottleneck. 

FDS applications achieve single-process read and write performance of more than 2GB/s. We measure recovery of 92GB data lost to disk failure in 6.2 s and recovery from a total machine failure with 655GB of data in 33.7 s. Application performance is also high: we describe our FDS-based sort application which set the 2012 world record for disk-to-disk sorting.
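
The striping scheme described above can be pictured as a deterministic mapping from (blob, tract) pairs to disks, so that any client can locate data without asking a central metadata server. The sketch below is a minimal illustration of that idea; the tract size, hash choice, and helper names are assumptions for illustration, not FDS's actual data layout or API.

    # Illustrative sketch of locality-oblivious striping in the spirit of the
    # abstract above. The tract size, hash, and helper names are assumptions.
    import hashlib

    TRACT_SIZE = 8 * 1024 * 1024          # assumed fixed-size striping unit

    def tract_locator(blob_guid: str, tract_index: int, table_len: int) -> int:
        """Map (blob, tract) deterministically onto a tract locator table entry."""
        digest = hashlib.sha1(blob_guid.encode()).digest()
        return (int.from_bytes(digest[:8], "big") + tract_index) % table_len

    def disks_for_range(blob_guid, offset, length, locator_table):
        """Disks responsible for a byte range, with no locality preference."""
        first = offset // TRACT_SIZE
        last = (offset + length - 1) // TRACT_SIZE
        return [locator_table[tract_locator(blob_guid, t, len(locator_table))]
                for t in range(first, last + 1)]

    # Every client computes the same mapping, so reads and writes go directly
    # to the responsible disks rather than through a central metadata server.
    table = [f"disk{i}" for i in range(64)]       # toy cluster of 64 disks
    print(disks_for_range("blob-42", 0, 3 * TRACT_SIZE, table))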

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

Joseph E. Gonzalez, Yucheng Low, Haijie Gu, and Danny Bickson, Carnegie Mellon University; Carlos Guestrin, University of Washington

Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the real-world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability.

In this paper, we characterize the challenges of computation on natural graphs in the context of existing graph-parallel abstractions. We then introduce the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits with empirical evaluations on large-scale real-world problems demonstrating order of magnitude gains.
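
As a rough illustration of the kind of vertex program such graph-parallel abstractions execute, the toy PageRank below is written in a gather/apply style whose per-vertex work can be split and combined across machines. The function structure, the synchronous driver loop, and the single-process setting are assumptions for illustration, not PowerGraph's API.

    # Toy gather/apply-style PageRank; single-process and purely illustrative.
    from collections import defaultdict

    DAMPING = 0.85

    def pagerank(edges, iterations=20):
        out_deg = defaultdict(int)
        in_nbrs = defaultdict(list)
        vertices = set()
        for src, dst in edges:
            out_deg[src] += 1
            in_nbrs[dst].append(src)
            vertices.update((src, dst))

        rank = {v: 1.0 for v in vertices}
        for _ in range(iterations):
            new_rank = {}
            for v in vertices:
                # gather: sum contributions from in-neighbors; for a very
                # high-degree vertex this associative sum is the part a
                # graph-parallel system can split across machines.
                acc = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
                # apply: update the vertex value from the combined accumulator.
                new_rank[v] = (1 - DAMPING) + DAMPING * acc
            rank = new_rank
        return rank

    print(pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]))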

GraphChi: Large-Scale Graph Computation on Just a PC

Aapo Kyrola and Guy Blelloch, Carnegie Mellon University; Carlos Guestrin, University of Washington

Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts.

In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives.

By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

12:30 p.m.–2:00 p.m. Monday

2:00 p.m.–3:30 p.m. Monday

Privacy

Ray Dolby Ballroom 123

Session Chair: Timothy Roscoe, ETH Zurich

Hails: Protecting Data Privacy in Untrusted Web Applications

Daniel B. Giffin, Amit Levy, Deian Stefan, David Terei, David Mazières, and John C. Mitchell, Stanford University; Alejandro Russo, Chalmers University 

Modern extensible web platforms like Facebook and Yammer depend on third-party software to offer a rich experience to their users. Unfortunately, users running a third-party “app” have little control over what it does with their private data. Today’s platforms offer only ad-hoc constraints on app behavior, leaving users an unfortunate trade-off between convenience and privacy. A principled approach to code confinement could allow the integration of untrusted code while enforcing flexible, end-to-end policies on data access. This paper presents a new web framework, Hails, that adds mandatory access control and a declarative policy language to the familiar MVC architecture. We demonstrate the flexibility of Hails through GitStar.com, a code-hosting website that enforces robust privacy policies on user data even while allowing untrusted apps to deliver extended features to users.

Eternal Sunshine of the Spotless Machine: Protecting Privacy with Ephemeral Channels

Alan M. Dunn, Michael Z. Lee, Suman Jana, Sangman Kim, Mark Silberstein, Yuanzhong Xu, Vitaly Shmatikov, and Emmett Witchel, The University of Texas at Austin

Modern systems keep long memories. As we show in this paper, an adversary who gains access to a Linux system, even one that implements secure deallocation, can recover the contents of applications’ windows, audio buffers, and data remaining in device drivers—long after the applications have terminated.
We design and implement Lacuna, a system that allows users to run programs in “private sessions.” After the session is over, all memories of its execution are erased. The key abstraction in Lacuna is an ephemeral channel, which allows the protected program to talk to peripheral devices while making it possible to delete the memories of this communication from the host. Lacuna can run unmodified applications that use graphics, sound, USB input devices, and the network, with only 20 percentage points of additional CPU utilization.

CleanOS: Limiting Mobile Data Exposure with Idle Eviction

Yang Tang, Phillip Ames, Sravan Bhamidipati, Ashish Bijlani, Roxana Geambasu, and Nikhil Sarda, Columbia University

Mobile-device theft and loss have reached gigantic proportions. Despite these threats, today’s mobile devices are saturated with sensitive information due to operating systems that never securely erase data and applications that hoard it on the vulnerable device for performance or convenience. This paper presents CleanOS, a new Android-based operating system that manages sensitive data rigorously and maintains a clean environment at all times. To do so, CleanOS leverages a key property of today’s mobile applications—the use of trusted, cloud-based services. Specifically, CleanOS identifies and tracks sensitive data in RAM and on stable storage, encrypts it with a key, and evicts that key to the cloud when the data is not in active use on the device. We call this process idle eviction of sensitive data. To implement CleanOS, we used the TaintDroid mobile taint-tracking system to identify sensitive data locations and instrumented Android’s Dalvik interpreter to securely evict that data after a specified period of non-use. Our experimental results show that CleanOS limits sensitive-data exposure drastically while incurring acceptable overheads on mobile networks.
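
A minimal sketch of the idle-eviction idea, assuming a stand-in cipher and an in-memory stub for the cloud key store; none of the names below are CleanOS interfaces. Sensitive data stays encrypted on the device, and once it has been idle long enough the key is handed to the cloud, so a stolen device cannot decrypt it without going online.

    # Minimal sketch of idle eviction; the XOR "cipher", the timing constant,
    # and the cloud stub are illustrative assumptions, not CleanOS internals.
    import os, time

    IDLE_LIMIT_SECONDS = 60

    class CloudKeyStore:
        """Stand-in for the trusted cloud service that holds evicted keys."""
        def __init__(self):
            self._keys = {}

        def store_key(self, obj_id, key):
            self._keys[obj_id] = key

        def fetch_key(self, obj_id):
            return self._keys.pop(obj_id)

    class SensitiveObject:
        def __init__(self, plaintext: bytes):
            self._key = os.urandom(32)
            self._ciphertext = self._xor(plaintext, self._key)   # stand-in cipher
            self._last_used = time.time()

        @staticmethod
        def _xor(data, key):
            return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

        def read(self, cloud):
            if self._key is None:                        # key was evicted
                self._key = cloud.fetch_key(id(self))    # must reach the cloud
            self._last_used = time.time()
            return self._xor(self._ciphertext, self._key)

        def maybe_evict(self, cloud):
            idle = time.time() - self._last_used
            if self._key is not None and idle > IDLE_LIMIT_SECONDS:
                cloud.store_key(id(self), self._key)     # only the cloud keeps the key
                self._key = None                         # device alone can no longer decrypt

    cloud = CloudKeyStore()
    token = SensitiveObject(b"oauth-token-123")
    token.maybe_evict(cloud)          # no-op here: the data was just used
    print(token.read(cloud))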

3:30 p.m.–4:00 p.m. Monday

Break

Ray Dolby Ballroom Pre-Function Terrace


4:00 p.m.–5:00 p.m. Monday

Mobility

Ray Dolby Ballroom 123

Session Chair: Junfeng Yang, Columbia University

COMET: Code Offload by Migrating Execution Transparently

Mark S. Gordon, D. Anoushe Jamshidi, Scott Mahlke, and Z. Morley Mao, University of Michigan; Xu Chen, AT&T Labs—Research

In this paper we introduce a runtime system to allow unmodified multi-threaded applications to use multiple machines. The system allows threads to migrate freely between machines depending on the workload. Our prototype, COMET (Code Offload by Migrating Execution Transparently), is a realization of this design built on top of the Dalvik Virtual Machine. COMET leverages the underlying memory model of our runtime to implement distributed shared memory (DSM) with as few interactions between machines as possible. Making use of a new VM-synchronization primitive, COMET imposes little restriction on when migration can occur. Additionally, enough information is maintained so one machine may resume computation after a network failure.

We target our efforts towards augmenting smartphones or tablets with machines available in the network. We demonstrate the effectiveness of COMET on several real applications available on Google Play. These applications include image editors, turn-based games, a trip planner, and math tools. Utilizing a server-class machine, COMET can offer significant speed-ups on these real applications when run on a modern smartphone. With WiFi and 3G networks, we observe geometric mean speed-ups of 2.88X and 1.27X relative to the Dalvik interpreter across the set of applications with speed-ups as high as 15X on some applications.

AppInsight: Mobile App Performance Monitoring in the Wild

Lenin Ravindranath, Jitendra Padhye, Sharad Agarwal, Ratul Mahajan, Ian Obermiller, and Shahin Shayandeh, Microsoft Research

The mobile-app marketplace is highly competitive. To maintain and improve the quality of their apps, developers need data about how their app is performing in the wild. The asynchronous, multi-threaded nature of mobile apps makes tracing difficult. The difficulties are compounded by the resource limitations inherent in the mobile platform. To address this challenge, we develop AppInsight, a system that instruments mobile-app binaries to automatically identify the critical path in user transactions, across asynchronous-call boundaries. AppInsight is lightweight: it does not require any input from the developer, and it does not require any changes to the OS. We used AppInsight to instrument 30 marketplace apps, and carried out a field trial with 30 users for over 4 months. We report on the characteristics of the critical paths that AppInsight found in this data. We also give real-world examples of how AppInsight helped developers improve the quality of their app.

6:00 p.m.–7:30 p.m. Monday

Poster Session and Reception

Ray Dolby Ballroom 456

Check out the cool new ideas and the latest preliminary research on display at the Poster Sessions & Receptions. Take part in discussions with your colleagues over complimentary food and drinks.

The list of accepted posters is available here.


Tuesday, October 9, 2012

9:00 a.m.–10:30 a.m. Tuesday

Distributed Systems and Networking

Ray Dolby Ballroom 123

Session Chair: Jason Flinn, University of Michigan

Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE

Zhenyu Guo, Microsoft Research Asia; Xuepeng Fan, Microsoft Research Asia and Huazhong University of Science and Technology; Rishan Chen, Microsoft Research Asia and Peking University; Jiaxing Zhang, Hucheng Zhou, and Sean McDirmid, Microsoft Research Asia; Chang Liu, Microsoft Research Asia and Shanghai Jiao Tong University; Wei Lin and Jingren Zhou, Microsoft Bing; Lidong Zhou, Microsoft Research Asia

To minimize the amount of data-shuffling I/O that occurs between the pipeline stages of a distributed data-parallel program, its procedural code must be optimized with full awareness of the pipeline that it executes in. Unfortunately, neither pipeline optimizers nor traditional compilers examine both the pipeline and procedural code of a data-parallel program, so programmers must either hand-optimize their program across pipeline stages or live with poor performance. To resolve this tension between performance and programmability, this paper describes PeriSCOPE, which automatically optimizes a data-parallel program’s procedural code in the context of data flow that is reconstructed from the program’s pipeline topology. Such optimizations eliminate unnecessary code and data, perform early data filtering, and calculate small derived values (e.g., predicates) earlier in the pipeline, so that less data—sometimes much less data—is transferred between pipeline stages. We describe how PeriSCOPE is implemented and evaluate its effectiveness on real production jobs.

MegaPipe: A New Programming Interface for Scalable Network I/O

Sangjin Han and Scott Marshall, University of California, Berkeley; Byung-Gon Chun, Yahoo! Research; Sylvia Ratnasamy, University of California, Berkeley

We present MegaPipe, a new API for efficient, scalable network I/O for message-oriented workloads. The design of MegaPipe centers around the abstraction of a channel—a per-core, bidirectional pipe between the kernel and user space, used to exchange both I/O requests and event notifications. On top of the channel abstraction, we introduce three key concepts of MegaPipe: partitioning, lightweight socket (lwsocket), and batching.

We implement MegaPipe in Linux and adapt memcached and nginx. Our results show that, by embracing a clean-slate design approach, MegaPipe is able to exploit new opportunities for improved performance and ease of programmability. In microbenchmarks on an 8-core server with 64 B messages, MegaPipe outperforms baseline Linux between 29% (for long connections) and 582% (for short connections). MegaPipe improves the performance of a modified version of memcached between 15% and 320%. For a workload based on real-world HTTP traces, MegaPipe boosts the throughput of nginx by 75%.

DJoin: Differentially Private Join Queries over Distributed Databases

Arjun Narayan and Andreas Haeberlen, University of Pennsylvania

In this paper, we study the problem of answering queries about private data that is spread across multiple different databases. For instance, a medical researcher may want to study a possible correlation between travel patterns and certain types of illnesses. The necessary information exists today – e.g., in airline reservation systems and hospital records—but it is maintained by two separate companies who are prevented by law from sharing this information with each other, or with a third party. This separation prevents the processing of such queries, even if the final answer, e.g., a correlation coefficient, would be safe to release.

We present DJoin, a system that can process such distributed queries and can give strong differential privacy guarantees on the result. DJoin can support many SQL-style queries, including joins of databases maintained by different entities, as long as they can be expressed using DJoin’s two novel primitives: BN-PSI-CA, a differentially private form of private set intersection cardinality, and DCR, a multi-party combination operator that can aggregate noised cardinalities without compounding the individual noise terms. Our experimental evaluation shows that DJoin can process realistic queries at practical timescales: simple queries on three databases with 15,000 rows each take between 1 and 7.5 hours.

10:30 a.m.–11:00 a.m. Tuesday

Break

Ray Dolby Ballroom Pre-Function Terrace


11:00 a.m.–12:30 p.m. Tuesday

Security

Ray Dolby Ballroom 123

Session Chair: Tim Harris, Oracle Labs

Improving Integer Security for Systems with KINT

Xi Wang and Haogang Chen, MIT CSAIL; Zhihao Jia, Tsinghua University IIIS; Nickolai Zeldovich and M. Frans Kaashoek, MIT CSAIL

Integer errors have emerged as an important threat to systems security, because they allow exploits such as buffer overflow and privilege escalation. This paper presents KINT, a tool that uses scalable static analysis to detect integer errors in C programs. KINT generates constraints from source code and user annotations, and feeds them into a constraint solver for deciding whether an integer error can occur. KINT introduces several techniques to reduce the number of false error reports. KINT identified more than 100 integer errors in the Linux kernel, the lighttpd web server, and OpenSSH, which were confirmed and fixed by the developers. Based on the experience with KINT, the paper further proposes a new integer family with NaN semantics to help developers avoid integer errors in C programs.

Dissent in Numbers: Making Strong Anonymity Scale

David Isaac Wolinsky, Henry Corrigan-Gibbs, and Bryan Ford, Yale University; Aaron Johnson, U.S. Naval Research Laboratory

Current anonymous communication systems make a trade-off between weak anonymity among many nodes, via onion routing, and strong anonymity among few nodes, via DC-nets. We develop novel techniques in Dissent, a practical group anonymity system, to increase by over two orders of magnitude the scalability of strong, traffic analysis resistant approaches. Dissent derives its scalability from a client/server architecture, in which many unreliable clients depend on a smaller and more robust, but administratively decentralized, set of servers. Clients trust only that at least one server in the set is honest, but need not know or choose which server to trust. Unlike the quadratic costs of prior peer-to-peer DC-nets schemes, Dissent’s client/server design makes communication and processing costs linear in the number of clients, and hence in anonymity set size. Further, Dissent’s servers can unilaterally ensure progress, even if clients respond slowly or disconnect at arbitrary times, ensuring robustness against client churn, tail latencies, and DoS attacks. On DeterLab, Dissent scales to 5,000 online participants with latencies as low as 600 milliseconds for 600-client groups. An anonymous Web browsing application also shows that Dissent’s performance suffices for interactive communication within smaller local-area groups.
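
The DC-net style of anonymity the abstract contrasts with onion routing can be illustrated with a toy round: every participant publishes the XOR of its pairwise shared pads, the sender additionally XORs in the message, and combining all the outputs reveals the message but not its origin. The pad setup and message framing below are simplified assumptions, not Dissent's protocol.

    # Toy DC-net round; pad distribution and framing are simplified assumptions.
    import os, secrets
    from functools import reduce

    def dc_net_round(n_clients: int, sender: int, message: bytes) -> bytes:
        length = len(message)
        # Every pair of clients shares one random pad of the message length.
        pads = {(i, j): os.urandom(length)
                for i in range(n_clients) for j in range(i + 1, n_clients)}

        def xor(a, b):
            return bytes(x ^ y for x, y in zip(a, b))

        outputs = []
        for i in range(n_clients):
            out = bytes(length)
            for j in range(n_clients):
                if j == i:
                    continue
                out = xor(out, pads[(min(i, j), max(i, j))])
            if i == sender:
                out = xor(out, message)     # only the sender folds in the message
            outputs.append(out)

        # Each pad appears in exactly two outputs, so XOR-combining everything
        # cancels the pads and leaves only the message, with no trace of the sender.
        return reduce(xor, outputs)

    recovered = dc_net_round(5, sender=secrets.randbelow(5), message=b"hello")
    print(recovered)                        # b'hello', regardless of which client sent it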

Efficient Patch-based Auditing for Web Application Vulnerabilities

Taesoo Kim, Ramesh Chandra, and Nickolai Zeldovich, MIT CSAIL

POIROT is a system that, given a patch for a newly discovered security vulnerability in a web application, helps administrators detect past intrusions that exploited the vulnerability. POIROT records all requests to the server during normal operation, and given a patch, re-executes requests using both patched and unpatched software, and reports to the administrator any request that executes differently in the two cases. A key challenge with this approach is the cost of re-executing all requests, and POIROT introduces several techniques to reduce the time required to audit past requests, including filtering requests based on their control flow and memoization of intermediate results across different requests.

A prototype of POIROT for PHP accurately detects attacks on older versions of MediaWiki and HotCRP, given subsequently released patches. POIROT’s techniques allow it to audit past requests 12–51× faster than the time it took to originally execute the same requests, for patches to code executed by every request, under a realistic MediaWiki workload.

12:30 p.m.–2:00 p.m. Tuesday

Symposium Luncheon

Annex

The presentation of the ACM SIGOPS Hall of Fame Awards and the Mark Weiser Award will take place during lunch at 1:15 p.m.

2:00 p.m.–3:30 p.m. Tuesday

Potpourri

Ray Dolby Ballroom 123

Session Chair: Florentina Popovici, Google

Experiences from a Decade of TinyOS Development

Philip Levis, Stanford University

When TinyOS was first written in 2000, its users were a handful of academic computer science researchers. A decade later, TinyOS averages 25,000 downloads a year, is in many commercial products, and remains a platform used for a great deal of sensor network, low-power systems, and wireless research.

We focus on how technical and social decisions influenced this success, sometimes in surprising ways. As TinyOS matured, it evolved language extensions to help experts write efficient, robust systems. These extensions revealed insights and novel programming abstractions for embedded software. Using these abstractions, experts could build increasingly complex systems more easily than with other operating systems, making TinyOS the dominant choice.

This success, however, came at a long-term cost. System design decisions that seem good at first can have unforeseen and undesirable implications that play out over the span of years. Today, TinyOS is a stable, self-contained ecosystem that is discouraging to new users. Other systems, such as Arduino and Contiki, by remaining more accessible, have emerged as better solutions for simpler embedded sensing applications.

Automated Concurrency-Bug Fixing

Guoliang Jin, Wei Zhang, Dongdong Deng, Ben Liblit, and Shan Lu, University of Wisconsin—Madison

Concurrency bugs are widespread in multithreaded programs. Fixing them is time-consuming and error-prone. We present CFix, a system that automates the repair of concurrency bugs. CFix works with a wide variety of concurrency-bug detectors. For each failure-inducing interleaving reported by a bug detector, CFix first determines a combination of mutual-exclusion and order relationships that, once enforced, can prevent the buggy interleaving. CFix then uses static analysis and testing to determine where to insert what synchronization operations to force the desired mutual-exclusion and order relationships, with a best effort to avoid deadlocks and excessive performance losses. CFix also simplifies its own patches by merging fixes for related bugs.

Evaluation using four different types of bug detectors and thirteen real-world concurrency-bug cases shows that CFix can successfully patch these cases without causing deadlocks or excessive performance degradation. Patches automatically generated by CFix are of similar quality to those manually written by developers.
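
The two kinds of constraints CFix enforces, mutual exclusion and ordering, can be pictured with a small threading example: an order relationship keeps a worker from reading shared state before it is initialized, and a lock makes a read-modify-write atomic. The bug and the synchronization choices below are invented for illustration; they are not output of CFix.

    # Toy illustration of an order fix (Event) and a mutual-exclusion fix (Lock).
    import threading

    shared = {"config": None, "counter": 0}
    config_ready = threading.Event()     # added to enforce init-before-use ordering
    counter_lock = threading.Lock()      # added to enforce mutual exclusion

    def initializer():
        shared["config"] = {"threads": 4}
        config_ready.set()               # publish: initialization is complete

    def worker():
        config_ready.wait()              # order fix: never read config before init
        _ = shared["config"]["threads"]
        with counter_lock:               # mutual-exclusion fix: atomic increment
            shared["counter"] += 1

    threads = [threading.Thread(target=f) for f in (initializer, worker, worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(shared["counter"])             # always 2 once the interleavings are constrained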

All about Eve: Execute-Verify Replication for Multi-Core Servers

Manos Kapritsos and Yang Wang, University of Texas at Austin; Vivien Quema, Grenoble INP; Allen Clement, MPI-SWS; Lorenzo Alvisi and Mike Dahlin, University of Texas at Austin

This paper presents Eve, a new Execute-Verify architecture that allows state machine replication to scale to multi-core servers. Eve departs from the traditional agree-execute architecture of state machine replication: replicas first execute groups of requests concurrently and then verify that they can reach agreement on a state and output produced by a correct replica; if they cannot, they roll back and execute the requests sequentially. Eve minimizes divergence using application-specific criteria to organize requests into groups of requests that are unlikely to interfere. Our evaluation suggests that Eve’s unique ability to combine execution independence with nondeterministic interleaving of requests enables high-performance replication for multi-core servers while tolerating a wide range of faults, including elusive concurrency bugs.
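
A rough sketch of the execute-verify control flow, assuming an in-memory state, hash-based state digests, and a trivial agreement check standing in for the real protocol; none of this reflects Eve's implementation details.

    # Sketch of execute-then-verify with rollback to sequential re-execution.
    import hashlib, json
    from concurrent.futures import ThreadPoolExecutor

    class Replica:
        def __init__(self):
            self.state = {}

        def _digest(self):
            blob = json.dumps(self.state, sort_keys=True).encode()
            return hashlib.sha256(blob).hexdigest()

        def execute_batch(self, requests):
            snapshot = dict(self.state)                  # kept for rollback
            with ThreadPoolExecutor() as pool:           # optimistic parallel execution
                list(pool.map(lambda req: req(self.state), requests))
            return snapshot, self._digest()

    def agree(digests):
        # Stand-in for the replicas' agreement step on state and output.
        return len(set(digests)) == 1

    def run_batch(replicas, requests):
        results = [r.execute_batch(requests) for r in replicas]
        if agree([digest for _, digest in results]):
            return "committed (parallel)"
        for replica, (snapshot, _) in zip(replicas, results):
            replica.state = snapshot                     # roll back divergent execution
            for req in requests:                         # deterministic sequential replay
                req(replica.state)
        return "committed (sequential re-execution)"

    replicas = [Replica(), Replica()]
    requests = [lambda s, k=i: s.__setitem__(f"key{k}", k) for i in range(4)]
    print(run_batch(replicas, requests))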

3:30 p.m.–4:00 p.m. Tuesday

Break

Ray Dolby Ballroom Pre-Function Terrace


4:00 p.m.–5:00 p.m. Tuesday

Replication

Ray Dolby Ballroom 123

Session Chair: Jon Howell, Microsoft Research Redmond

Spanner: Google’s Globally-Distributed Database

James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, Google, Inc.

Awarded Jay Lepreau Best Paper!

Spanner is Google’s scalable, multi-version, globally distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
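
The clock-uncertainty API the abstract mentions can be approximated in miniature: a now() call returns an interval rather than a point, and a transaction waits out the uncertainty before making its commit timestamp visible, so timestamp order matches real-time order. The uncertainty bound and waiting loop below are illustrative assumptions, not Spanner's TrueTime implementation, which relies on GPS and atomic clocks.

    # Sketch of an uncertainty-aware time API and "commit wait".
    import time
    from dataclasses import dataclass

    ASSUMED_UNCERTAINTY_MS = 7.0   # assumed bound on clock error, for illustration

    @dataclass
    class TTInterval:
        earliest: float
        latest: float

    def tt_now() -> TTInterval:
        t = time.time()
        eps = ASSUMED_UNCERTAINTY_MS / 1000.0
        return TTInterval(t - eps, t + eps)

    def commit(apply_writes):
        # Pick a commit timestamp no earlier than the latest possible current
        # time, then wait until that timestamp is guaranteed to be in the past
        # everywhere before making the commit visible.
        s = tt_now().latest
        apply_writes(s)
        while tt_now().earliest <= s:     # "commit wait"
            time.sleep(0.001)
        return s

    print(commit(lambda ts: None))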

Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary

Cheng Li, Max Planck Institute for Software Systems; Daniel Porto, CITI/Universidade Nova de Lisboa and Max Planck Institute for Software Systems; Allen Clement, Max Planck Institute for Software Systems; Johannes Gehrke, Cornell University; Nuno Preguiça and Rodrigo Rodrigues, CITI/Universidade Nova de Lisboa

Online services distribute and replicate state across geographically diverse data centers and direct user requests to the closest or least loaded site. While effectively ensuring low latency responses, this approach is at odds with maintaining cross-site consistency. We make three contributions to address this tension. First, we propose RedBlue consistency, which enables blue operations to be fast (and eventually consistent) while the remaining red operations are strongly consistent (and slow). Second, to make use of fast operations whenever possible and only resort to strong consistency when needed, we identify conditions delineating when operations can be blue and must be red. Third, we introduce a method that increases the space of potential blue operations by breaking them into separate generator and shadow phases. We built a coordination infrastructure called Gemini that offers RedBlue consistency, and we report on our experience modifying the TPC-W and RUBiS benchmarks and an online social network to use Gemini. Our experimental results show that RedBlue consistency provides substantial performance gains without sacrificing consistency.
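
A small, hypothetical example of the generator/shadow split and the blue/red classification, using a toy bank account: deposits commute and can be blue (fast, eventually consistent), while a withdrawal that must preserve a non-negative balance is red (strongly coordinated). The class and classification are illustrative assumptions rather than code from Gemini.

    # Illustrative generator/shadow decomposition for a toy account.
    class Account:
        def __init__(self, balance=0.0):
            self.balance = balance

        # Blue: commutes with other operations, so its shadow can be applied
        # in any order at every site without coordination.
        def deposit_generator(self, amount):
            return ("deposit_shadow", amount)          # side-effect-free decision

        def deposit_shadow(self, amount):
            self.balance += amount                     # replayed at every replica

        # Red: must observe an up-to-date balance to keep balance >= 0, so it
        # needs strong, serialized coordination across sites.
        def withdraw_generator(self, amount):
            if self.balance >= amount:
                return ("withdraw_shadow", amount)
            return ("noop", None)

        def withdraw_shadow(self, amount):
            self.balance -= amount

    acct = Account(100.0)
    op, arg = acct.withdraw_generator(30.0)   # decided once, at the originating site
    if op == "withdraw_shadow":
        acct.withdraw_shadow(arg)             # the shadow operation is what replicates
    print(acct.balance)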

6:00 p.m.–7:30 p.m. Tuesday

Poster Session and Reception

Ray Dolby Ballroom 456

Check out the cool new ideas and the latest preliminary research on display at the Poster Sessions & Receptions. Take part in discussions with your colleagues over complimentary food and drinks.

The list of accepted posters is available here.


Wednesday, October 10, 2012

9:00 a.m.–10:30 a.m. Wednesday

Testing & Debugging

Ray Dolby Ballroom 123

Session Chair: Jeff Mogul, HP Labs

SymDrive: Testing Drivers without Devices

Matthew J. Renzelmann, Asim Kadav, and Michael M. Swift, University of Wisconsin—Madison

Device-driver development and testing is a complex and error-prone undertaking. For example, testing error-handling code requires simulating faulty inputs from the device. A single driver may support dozens of devices, and a developer may not have access to any of them. Consequently, many Linux driver patches include the comment “compile tested only.”

SymDrive is a system for testing Linux and FreeBSD drivers without their devices present. The system uses symbolic execution to remove the need for hardware, and extends past tools with three new features. First, SymDrive uses static analysis and source-to-source transformation to greatly reduce the effort of testing a new driver. Second, SymDrive checkers are ordinary C code and execute in the kernel, where they have full access to kernel and driver state. Finally, SymDrive provides an execution-tracing tool to identify how a patch changes I/O to the device and to compare device-driver implementations. In applying SymDrive to 21 Linux drivers and 5 FreeBSD drivers, we found 39 bugs.

Be Conservative: Enhancing Failure Diagnosis with Proactive Logging

Ding Yuan, University of Illinois at Urbana-Champaign and University of California, San Diego; Soyeon Park, Peng Huang, Yang Liu, Michael M. Lee, Xiaoming Tang, Yuanyuan Zhou, and Stefan Savage, University of California, San Diego

When systems fail in the field, logged error or warning messages are frequently the only evidence available for assessing and diagnosing the underlying cause. Consequently, the efficacy of such logging—how often and how well error causes can be determined via postmortem log messages—is a matter of significant practical importance. However, there is little empirical data about how well existing logging practices work and how they can yet be improved. We describe a comprehensive study characterizing the efficacy of logging practices across five large and widely used software systems. Across 250 randomly sampled reported failures, we first identify that more than half of the failures could not be diagnosed well using existing log data. Surprisingly, we find that the majority of these unreported failures are manifested via a common set of generic error patterns (e.g., system call return errors) that, if logged, can significantly ease the diagnosis of these unreported failure cases. We further mechanize this knowledge in a tool called Errlog that proactively adds appropriate logging statements into source code while adding only 1.4% performance overhead. A controlled user study suggests that Errlog can reduce diagnosis time by 60.7%.
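
The generic error pattern the abstract mentions, an unlogged system-call failure, looks like the sketch below: the proactive fix is simply to record the failing call, its errno, and the relevant input at the error-return site. The logging format and wrapper are assumptions for illustration, not Errlog output.

    # Proactively log a failed system call with enough context to diagnose later.
    import errno, logging, os

    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger("errlog-style")

    def open_config(path):
        try:
            return os.open(path, os.O_RDONLY)
        except OSError as e:
            # Log at the error-return site: which call, which errno, which input.
            log.warning("open(%s) failed: errno=%d (%s)",
                        path, e.errno, errno.errorcode.get(e.errno, "?"))
            return None

    open_config("/nonexistent/app.conf")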

X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software

Mona Attariyan, University of Michigan and Google; Michael Chow and Jason Flinn, University of Michigan

Awarded Jay Lepreau Best Student Paper!

Troubleshooting the performance of production software is challenging. Most existing tools, such as profiling, tracing, and logging systems, reveal what events occurred during performance anomalies. However, users of such tools must infer why these events occurred; e.g., that their execution was due to a root cause such as a specific input request or configuration setting. Such inference often requires source code and detailed application knowledge that is beyond system administrators and end users.

This paper introduces performance summarization, a technique for automatically diagnosing the root causes of performance problems. Performance summarization instruments binaries as applications execute. It first attributes performance costs to each basic block. It then uses dynamic information flow tracking to estimate the likelihood that a block was executed due to each potential root cause. Finally, it summarizes the overall cost of each potential root cause by summing the per-block cost multiplied by the cause-specific likelihood over all basic blocks. Performance summarization can also be performed differentially to explain performance differences between two similar activities. X-ray is a tool that implements performance summarization. Our results show that X-ray accurately diagnoses 17 performance issues in Apache, lighttpd, Postfix, and PostgreSQL, while adding 2.3% average runtime overhead.
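
The summarization step itself is a short computation: attribute a cost to each basic block, estimate how likely each block is to have executed because of each candidate root cause, and rank causes by the likelihood-weighted cost. The sketch below transcribes that directly; the data values are made up.

    # Rank candidate root causes by likelihood-weighted per-block cost.
    def summarize(block_costs, block_cause_likelihood):
        """block_costs: {block: cost}; block_cause_likelihood: {block: {cause: p}}"""
        totals = {}
        for block, cost in block_costs.items():
            for cause, p in block_cause_likelihood.get(block, {}).items():
                totals[cause] = totals.get(cause, 0.0) + cost * p
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    costs = {"b1": 10.0, "b2": 40.0, "b3": 5.0}
    likelihood = {"b1": {"config:cache_off": 0.9},
                  "b2": {"config:cache_off": 0.2, "request:/search": 0.8},
                  "b3": {"request:/search": 1.0}}
    print(summarize(costs, likelihood))   # candidate root causes, highest weighted cost first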

10:30 a.m.–11:00 a.m. Wednesday

Break

Ray Dolby Ballroom Pre-Function Terrace


11:00 a.m.–12:30 p.m. Wednesday

Isolation

Ray Dolby Ballroom 123

Session Chair: Nickolai Zeldovich, Massachusetts Institute of Technology

Pasture: Secure Offline Data Access Using Commodity Trusted Hardware

Ramakrishna Kotla and Tom Rodeheffer, Microsoft Research; Indrajit Roy, HP Labs; Patrick Stuedi, IBM Research; Benjamin Wester, Facebook

This paper presents Pasture, a secure messaging and logging library that enables rich mobile experiences by providing secure offline data access. Without trusting users, applications, operating systems, or hypervisors, Pasture leverages commodity trusted hardware to provide two important safety properties: access-undeniability (a user cannot deny any offline data access obtained by his device without failing an audit) and verifiable-revocation (a user who generates a verifiable proof of revocation of unaccessed data can never access that data in the future).

For practical viability, Pasture moves costly trusted hardware operations from common data access actions to uncommon recovery and checkpoint actions. We used Pasture to augment three applications with secure offline data access to provide high availability, rich functionality, and improved consistency. Our evaluation suggests that Pasture overheads are acceptable for these applications.

Dune: Safe User-level Access to Privileged CPU Features

Adam Belay, Andrea Bittau, Ali Mashtizadeh, David Terei, David Mazières, and Christos Kozyrakis, Stanford University

Dune is a system that provides applications with direct but safe access to hardware features such as ring protection, page tables, and tagged TLBs, while preserving the existing OS interfaces for processes. Dune uses the virtualization hardware in modern processors to provide a process, rather than a machine, abstraction. It consists of a small kernel module that initializes virtualization hardware and mediates interactions with the kernel, and a user-level library that helps applications manage privileged hardware features. We present the implementation of Dune for 64-bit x86 Linux. We use Dune to implement three user-level applications that can benefit from access to privileged hardware: a sandbox for untrusted code, a privilege separation facility, and a garbage collector. The use of Dune greatly simplifies the implementation of these applications and provides significant performance advantages.

Performance Isolation and Fairness for Multi-Tenant Cloud Storage

David Shue and Michael J. Freedman, Princeton University; Anees Shaikh, IBM T.J. Watson Research Center

Shared storage services enjoy wide adoption in commercial clouds. But most systems today provide weak performance isolation and fairness between tenants, if at all. Misbehaving or high-demand tenants can overload the shared service and disrupt other well-behaved tenants, leading to unpredictable performance and violating SLAs.

This paper presents Pisces, a system for achieving datacenter-wide per-tenant performance isolation and fairness in shared key-value storage. Today’s approaches for multi-tenant resource allocation are based either on per-VM allocations or hard rate limits that assume uniform workloads to achieve high utilization. Pisces achieves per-tenant weighted fair shares (or minimal rates) of the aggregate resources of the shared service, even when different tenants’ partitions are co-located and when demand for different partitions is skewed, time-varying, or bottlenecked by different server resources. Pisces does so by decomposing the fair sharing problem into a combination of four complementary mechanisms—partition placement, weight allocation, replica selection, and weighted fair queuing—that operate on different time-scales and combine to provide system-wide max-min fairness.

In our evaluation, the Pisces storage prototype achieves nearly ideal (0.99 Min-Max Ratio) weighted fair sharing, strong performance isolation, and robustness to skew and shifts in tenant demand. These properties are achieved with minimal overhead (<3%), even when running at high utilization (more than 400,000 requests/second/server for 10B requests).
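
The per-tenant weighted max-min goal can be illustrated with a simple water-filling computation: capacity is divided in proportion to tenant weights, and anything a tenant cannot use is redistributed to the rest. This loop is only an illustration of the sharing objective; Pisces enforces it with the placement, weight-allocation, replica-selection, and queuing mechanisms described above.

    # Water-filling computation of weighted max-min fair allocations.
    def weighted_max_min(capacity, demands, weights):
        alloc = {t: 0.0 for t in demands}
        active = set(demands)
        remaining = capacity
        while active and remaining > 1e-9:
            total_w = sum(weights[t] for t in active)
            satisfied = set()
            for t in active:
                share = remaining * weights[t] / total_w
                if demands[t] - alloc[t] <= share:
                    alloc[t] = demands[t]        # fully satisfied; its unused share
                    satisfied.add(t)             # is redistributed on the next pass
            if not satisfied:
                for t in active:                 # everyone is bottlenecked: split
                    alloc[t] += remaining * weights[t] / total_w
                break
            remaining = capacity - sum(alloc.values())
            active -= satisfied
        return alloc

    print(weighted_max_min(100.0,
                           demands={"A": 20.0, "B": 80.0, "C": 80.0},
                           weights={"A": 1, "B": 1, "C": 2}))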
