HotOS XIII

WORKSHOP PROGRAM ABSTRACTS

Tech Sessions: Monday, May 9 | Tuesday, May 10 | Wednesday, May 11
Monday, May 9, 2011
9:15 a.m.–10:00 a.m.

Mind the Gap: Reconnecting Architecture and OS Research
The goal of a computer system is to run an application workload securely, reliably, efficiently, and fast. A computer's hardware architecture and operating system exist to support this goal, and it would be nice if they cooperated as effectively as possible. Yet there is a growing gap between architectural research and OS research, which seems to be the result of poor communication about what actually matters. In this paper, we discuss this gap and what to do about it. We illustrate the opportunities for closing the gap using examples from some recent OS research.

Operating System Implications of Fast, Cheap, Non-Volatile Memory
The existence of two basic levels of storage (fast/volatile and slow/non-volatile) has been a long-standing premise of most computer systems, influencing the design of OS components, including file systems, virtual memory, scheduling, execution models, and even their APIs. Emerging resistive memory technologies — such as phase-change memory (PCM) and memristors — have the potential to provide large, fast, non-volatile memory systems, changing the assumptions that motivated the design of current operating systems. This paper examines the implications of non-volatile memories on a number of OS mechanisms, functions, and properties.

Virtually Cool Ternary Content Addressable Memory
Fast content addressable data access mechanisms have compelling applications in today's systems. Many of these exploit the powerful wildcard matching capabilities provided by ternary content addressable memories (TCAMs). For example, TCAM-based implementations of important data mining algorithms have been developed in recent years; these achieve an order of magnitude speedup over prevalent techniques. However, large hardware TCAMs are still prohibitively expensive in terms of power consumption and cost per bit. This has been a barrier to extending their exploitation beyond niche and special-purpose systems. We propose an approach to overcome this barrier by extending the traditional virtual memory hierarchy to scale up the user-visible capacity of TCAMs while mitigating the power consumption overhead. By exploiting the notion of content locality (as opposed to spatial locality), we devise a novel combination of software and hardware techniques to provide an abstraction of a large virtual ternary content addressable space. In the long run, such abstractions enable applications to disassociate considerations of spatial locality and contiguity from the way data is referenced. If successful, ideas for making content addressability a first-class abstraction in computing systems can open up a radical shift in the way applications are optimized for memory locality, just as storage class memories are soon expected to shift applications away from the way they are typically optimized for disk access locality.

10:30 a.m.–11:15 a.m.

The Best of Both Worlds with On-Demand Virtualization
Virtualization offers many benefits such as live migration, resource consolidation, and checkpointing. However, there are many cases where the overhead of virtualization is too high to be justified by its benefits. Most desktop and laptop PCs and many large web properties run natively because the benefits of virtualization are too small compared to the overhead. We propose a new middle ground: on-demand virtualization, in which systems run natively when they need native performance or features but can be converted on-the-fly to run virtually when necessary. This enables the user or system administrator to leverage the most useful features of virtualization, such as checkpointing or consolidating workloads, during off-peak hours without paying the overhead during peak usage. We have developed a prototype of on-demand virtualization in Linux using the hibernate feature to transfer state between native execution and virtual execution, and we find that even networking applications can survive being virtualized.

Repair from a Chair: Computer Repair as an Untrusted Cloud Service
Today, computer repair resembles television repair: the customer brings the computer into the shop or calls a technician (or family member) to request a visit. Yet, as indicated by a survey that we conducted, a large majority of computer repair issues involve only software. Moreover, computers are increasingly virtual machines — which could be repaired anywhere. These observations lead us to the following vision, which we outline in this paper: let a customer ship a computer into the cloud and get software-based problems fixed asynchronously. We also outline the research needed to realize this vision. Broadly, we must protect the privacy and integrity of the customer's data from an untrusted repairer, and we must protect availability, allowing the customer to keep working during the repair.

Structuring the Unstructured Middle with Chunk Computing
Modern computing substrates like cloud computing clusters, massively multi-core processors, and general-purpose GPUs offer a wealth of computing power with the caveat that programmers must carefully structure their programs to fit within various hardware and communication limits in order to get high performance. Unfortunately, modern abstractions expressly hide machine structure from the program and program structure from the machine, hindering automatic optimization. We introduce an alternative computing abstraction—a linked graph of finite-sized memory chunks—that explicitly exposes the size and structure of programs to the operating system. The chunk graph enables the operating system to use size and structure to optimize how programs are mapped onto complex machine structure.
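As a rough illustration only (the field names and sizes below are our assumptions, not the paper's definition of a chunk), a chunk can be pictured as a fixed-size block of memory carrying a small number of explicit links to other chunks, which is what lets the OS see both the size of each unit and the shape of the program:

    /* Illustrative sketch; sizes and fields are assumptions, not the paper's format. */
    #include <stdint.h>

    #define CHUNK_BYTES 256   /* fixed, OS-visible size of every chunk */
    #define CHUNK_LINKS 4     /* small, fixed number of outgoing links */

    typedef uint64_t chunk_id; /* names a chunk; the OS may relocate it */

    struct chunk {
        chunk_id links[CHUNK_LINKS];  /* edges of the program graph */
        uint8_t  payload[CHUNK_BYTES - CHUNK_LINKS * sizeof(chunk_id)];
    };

    /* Because every chunk has the same size and its links are explicit, the
     * OS can traverse the graph, co-locate tightly linked chunks on one core
     * or machine, and spill loosely linked chunks elsewhere. */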

2:30 p.m.–3:00 p.m.

Macho: Programming with Man Pages
Despite years of work on programming languages, programming is still slow and error-prone. In this paper we describe Macho, a system which combines a natural language parser, a database of code, and an automated debugger to write simple programs from natural language and examples of their correct execution. Adding examples to natural language makes it easier for Macho to actually generate a correct program, because it can test its candidate solutions and fix simple errors. Macho is able to synthesize basic versions of six out of nine small core-utils from short natural language descriptions based on their man pages and sample runs.

Pursue Robust Indefinite Scalability
For research insights and development potential, we should explore computer architectures designed to scale indefinitely. Given physical limits, we argue that an indefinitely scalable computer must (1) reveal its component spatial relationships to programmers, (2) forgo unique addresses, and (3) operate asynchronously. Further, such a machine and its programming must be inherently robust against local failures and outages, and be operable during its own construction. We propose the indefinitely scalable Movable Feast Machine, which defers many architectural decisions to an execution model that associates processing, memory, and communications functions with movable bit patterns rather than fixed locations. We illustrate basic and novel computational elements such as self-healing wire, simple cell membranes for modularity, and robust stochastic sorting by movable self-replicating programs.

3:30 p.m.–4:15 p.m.

Benchmarking File System Benchmarking: It *IS* Rocket Science
The quality of file system benchmarking has not improved in over a decade of intense research spanning hundreds of publications. Researchers repeatedly use a wide range of poorly designed benchmarks, and in most cases, develop their own ad-hoc benchmarks. Our community lacks a definition of what we want to benchmark in a file system. We propose several dimensions of file system benchmarking and review the wide range of tools and techniques in widespread use. We experimentally show that even the simplest of benchmarks can be fragile, producing performance results spanning orders of magnitude. It is our hope that this paper will spur serious debate in our community, leading to action that can improve how we evaluate our file and storage systems.

Multicore OS Benchmarks: We Can Do Better
Current multicore OS benchmarks do not provide workloads that sufficiently reflect real-world use: they typically run a single application, whereas real workloads consist of multiple concurrent programs. In this paper we show that this lack of mixed workloads leads to benchmarks that do not fully exercise the OS and are therefore inadequate at predicting real-world behavior. This implies that effective multicore OS benchmarks must include mixed workloads, but the main design challenge is choosing an appropriate mix. We present a principled approach which treats benchmark design as an optimization problem. Our solution leads to a workload mix that uses as much of a system's resources as possible, while also selecting applications whose performance is most sensitive to the availability of those resources.
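One way to picture the optimization, sketched below under assumed inputs (a greedy stand-in for illustration, not the authors' formulation): given per-application measurements of resource usage and performance sensitivity, repeatedly add the application that most increases coverage of not-yet-saturated resources, weighted by how sensitive that application is to those resources.

    /* Sketch only: the usage/sensitivity numbers are made up for illustration. */
    #include <stdio.h>

    enum { NRES = 3, NAPPS = 5, MIXSZ = 3 };  /* e.g., CPU, memory b/w, disk */

    static const double usage[NAPPS][NRES] = {
        {0.9, 0.2, 0.1}, {0.3, 0.8, 0.1}, {0.2, 0.3, 0.9},
        {0.5, 0.5, 0.2}, {0.1, 0.1, 0.8},
    };
    static const double sens[NAPPS][NRES] = {  /* performance sensitivity */
        {0.8, 0.3, 0.1}, {0.4, 0.9, 0.1}, {0.2, 0.2, 0.9},
        {0.6, 0.6, 0.3}, {0.1, 0.2, 0.7},
    };

    int main(void) {
        double covered[NRES] = {0};
        int used[NAPPS] = {0};

        for (int k = 0; k < MIXSZ; k++) {
            int best = -1;
            double best_gain = -1.0;
            for (int a = 0; a < NAPPS; a++) {
                if (used[a]) continue;
                double gain = 0.0;
                for (int r = 0; r < NRES; r++) {
                    double add = usage[a][r] * sens[a][r];
                    if (covered[r] + add > 1.0)   /* resource already saturated */
                        add = 1.0 - covered[r];
                    gain += add;
                }
                if (gain > best_gain) { best_gain = gain; best = a; }
            }
            used[best] = 1;
            for (int r = 0; r < NRES; r++)
                covered[r] += usage[best][r] * sens[best][r];
            printf("pick app %d (gain %.2f)\n", best, best_gain);
        }
        return 0;
    }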

It's Time for Low Latency
The operating systems community has ignored network latency for too long. In the past, speed-of-light delays in wide area networks and unoptimized network hardware have made sub-100μs round-trip times impossible. However, in the next few years datacenters will be deployed with low-latency Ethernet. Without the burden of propagation delays in the datacenter campus and network delays in the Ethernet devices, it will be up to us to finish the job and see this benefit through to applications. We argue that OS researchers must lead the charge in rearchitecting systems to push the boundaries of low-latency datacenter communication. Remote procedure calls of 5-10μs are possible in the short term — two orders of magnitude better than today. In the long term, moving the network interface onto the CPU core will make 1μs round-trip times feasible.

Tuesday, May 10, 2011
9:30 a.m.–10:15 a.m.

Disk-Locality in Datacenter Computing Considered Irrelevant
Datacenter computing frameworks have been instrumental in driving modern Internet services. Disk locality for tasks has received considerable attention, premised on the assumptions that disk reads are (a) faster than the network and (b) dominate a task's lifetime. Recent trends in technology adoption as well as the operation of datacenters weaken both these assumptions. Faster networks continue to be deployed while disks, whose speeds have plateaued, remain the primary storage medium. The ever-increasing demand for storage not only makes SSDs economically infeasible to deploy but also leads to compressed data storage; compression leaves less data to read, shrinking read durations. Consequently, we take the position that disk locality is going to be moot for datacenter jobs. Instead, we advocate focusing on memory locality—running a task on the machine that has its input cached in memory. The three-orders-of-magnitude difference in capacity between disks and memory presents a new set of challenges: identifying data to keep in memory with appropriate replacement schemes, prefetching singly-accessed blocks, and reducing dependency on an aggregated memory pool. Our initial stab at the problem shows promising results, and we solicit further research on the above challenges.
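A minimal sketch of the placement idea (the machine structure below is an assumption for illustration, not the paper's design): prefer a machine that already has the task's input block cached in memory, and fall back to any machine with a free slot otherwise.

    /* Sketch only; data structures are assumptions for illustration. */
    #include <stdbool.h>
    #include <stddef.h>

    struct machine {
        bool has_free_slot;
        bool (*has_cached)(const struct machine *m, long block_id);
    };

    /* Return the index of a machine to run a task whose input is block_id,
     * preferring memory locality; -1 means no machine is free right now. */
    int place_task(struct machine *ms, size_t n, long block_id) {
        int fallback = -1;
        for (size_t i = 0; i < n; i++) {
            if (!ms[i].has_free_slot)
                continue;
            if (ms[i].has_cached(&ms[i], block_id))
                return (int)i;          /* memory-local placement */
            if (fallback < 0)
                fallback = (int)i;      /* remember a non-local option */
        }
        return fallback;                /* non-local, or -1: queue the task */
    }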

Optimizing Data Partitioning for Data-Parallel Computing
Performance of data-parallel computing (e.g., MapReduce, DryadLINQ) heavily depends on its data partitions. Solutions implemented by the current state of the art systems are far from optimal. Techniques proposed by the database community to find optimal data partitions are not directly applicable when complex user-defined functions and data models are involved. We outline our solution, which draws expertise from various fields such as programming languages and optimization, and present our preliminary results.

Disks Are Like Snowflakes: No Two Are Alike
Gone are the days of homogeneous sets of disks. Even disks of a given batch, of the same make and model, will have significantly different bandwidths. This paper describes the disk technology trends responsible for the now-inherent heterogeneity of multi-disk systems and disk-based clusters, provides measurements quantifying it, and discusses its implications for system designers.

10:45 a.m.–11:30 a.m.

The Case for Power-Agile Computing
Battery-powered devices are trapped by trends. Higher performance requires more power, and while battery technologies improve only slowly, users want more capable devices with longer battery lifetimes. One way to escape this trap is to leverage power-proportional hardware architectures that scale power consumption with demand, performing when needed and drawing little power when idle. Because most components are tuned to operate efficiently within a narrow power-performance range, we expect future power-proportional architectures to be heterogeneous, featuring multiple different processors, memory chips, storage devices, and radios, each with different power-performance tradeoffs. Heterogeneity produces devices with fluid characteristics: phones that sprint like desktops and sleep like sensor nodes. This paper outlines the principles of power-agile computing. To begin, we design a heterogeneous power-proportional device to illustrate the size and diversity of the state space inherent to these architectures. Next, we present a scenario demonstrating our device responding to changes in demand. Using this scenario, we develop a set of challenges inherent to power-agile operation and discuss approaches to overcoming them.
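To give a feel for the size of that state space (an illustrative calculation with assumed component counts, not figures from the paper): if the ensemble contains n components and component i offers k_i power-performance modes, the device exposes

    |S| = \prod_{i=1}^{n} k_i

ensemble configurations. Even a modest device with four processors (4 modes each), four memory banks (2 modes each), two storage devices and two radios (3 modes each) already has 4^4 · 2^4 · 3^2 · 3^2 = 331,776 configurations to choose among, which is why principled navigation of the space matters.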

Mobile Apps: It's Time to Move Up to CondOS
High-level context extracted from raw sensor data on the phone has significant, unrealized potential to improve the entire mobile experience, from apps to the OS. The idea that the OS ought to manage (even app-specific) context generation at first appears counter-intuitive. Yet we uncover several benefits that emerge naturally from this approach: OS services can themselves benefit from context consumption, OS-managed context is better for user privacy, and the OS is best positioned to coordinate energy-saving strategies for context generation. We propose an initial context dataflow operating system, CondOS, to realize these benefits.

Free Lunch: Exploiting Renewable Energy for Computing
This paper argues for "Free Lunch", a computation architecture that exploits otherwise wasted renewable energy by (i) colocating datacentres with these remote energy sources, (ii) connecting them over a dedicated network, and (iii) providing a software framework that supports the seamless execution and migration of virtual machines in the platform according to power availability. This work motivates and outlines the architecture and demonstrates its viability with a case study. Additionally, we discuss the major technical challenges facing the successful deployment of Free Lunch and the limiting factors inherent in its design.

1:00 p.m.–1:45 p.m.

Debug Determinism: The Sweet Spot for Replay-Based Debugging
Deterministic replay tools offer a compelling approach to debugging hard-to-reproduce bugs. Recent work on relaxed-deterministic replay techniques shows that replay debugging with low in-production overhead is possible. However, despite considerable progress, a replay-debugging system that offers not only low in-production runtime overhead but also high debugging utility, remains out of reach. To this end, we argue that the research community should strive for debug determinism—a new determinism model premised on the idea that effective debugging entails reproducing the same failure and the same root cause as the original execution. We present ideas on how to achieve and quantify debug determinism and give preliminary evidence that a debug-deterministic system has potential to provide both low in-production overhead and high debugging utility.

Non-Deterministic Parallelism Considered Useful
The development of distributed execution engines (such as MapReduce and Dryad) has greatly simplified parallel programming, by shielding developers from the gory details of programming in a distributed system, and allowing them to focus on writing sequential code. The "sacred cow" in these systems is transparent fault tolerance, which is achieved by dividing the computation into atomic tasks that execute deterministically, and hence may be re-executed if a participant fails or some intermediate data are lost. In this paper, we explore the possibility of relaxing this requirement, on the premise that non-determinism is useful and sometimes essential to support many programs.

Finding Concurrency Errors in Sequential Code—OS-level, In-vivo Model Checking of Process Races
While thread races have drawn huge attention from the research community, little has been done for process races, where multiple—possibly sequential—processes access a shared resource, such as a file, without proper synchronization. We present a preliminary study of real process races and show that they are numerous, dangerous, and difficult to detect. To address this problem, we present the design of RACEPRO, an in-vivo model checking system for automatically detecting process races in deployed systems, along with preliminary results from a RACEPRO prototype. To the best of our knowledge, we are the first to study real process races, and RACEPRO is the first system to detect them.

1:45 p.m.–2:30 p.m.

Privacy Revelations for Web and Mobile Apps
Users of Web and mobile apps must often decide whether to give the apps access to personal information without knowing what they will do with it. We argue that users could better manage their privacy and privacy standards would rise if the operating system simply revealed to users how their apps spread personal information. However, for this strategy to be effective, the research community must go well beyond today's low-level monitoring techniques to develop predictive, user-facing descriptions of information exposure that are grounded in measurement and analysis.

Do You Know Where Your Data Are? Secure Data Capsules for Deployable Data Protection
We argue that data protection will never reach the masses unless it is focused on legacy applications and open systems. We define practical data protection and substantiate gaps in current theory and practice. We formulate a conceptual architecture based on encapsulation. A sensitive data object is kept in a secure data capsule, cryptographically bundled with its data-use policy and provenance. Untrusted applications operate on capsules confined within a secure execution environment, which tracks sensitive information flows and enforces policy upon externalization from the application. Although this conceptual architecture reaches the practicality goal by definition, our exploration of the significant efficiency and assurance challenges leads to a number of research questions and opportunities.
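As a rough illustration of the encapsulation (the layout below is our assumption, not the paper's format), a capsule bundles the encrypted object with its policy and provenance and binds them under one integrity tag, so that neither can be stripped from the data it governs:

    /* Sketch only: an assumed layout for a secure data capsule. A real design
     * would pin down cipher choices, key handling, and the policy language. */
    #include <stdint.h>

    struct capsule_header {
        uint8_t  capsule_id[16];    /* unique identifier                       */
        uint64_t policy_len;        /* data-use policy, e.g. "no export"       */
        uint64_t provenance_len;    /* append-only history of transformations  */
        uint64_t payload_len;       /* ciphertext of the sensitive object      */
    };

    struct capsule {
        struct capsule_header hdr;
        const uint8_t *policy;      /* hdr.policy_len bytes                    */
        const uint8_t *provenance;  /* hdr.provenance_len bytes                */
        const uint8_t *payload;     /* hdr.payload_len bytes, encrypted        */
        uint8_t        auth_tag[32];/* MAC/signature over all fields above     */
    };

    /* The secure execution environment decrypts the payload only for confined
     * applications, tracks flows derived from it, and re-checks the policy
     * whenever data leaves the environment. */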

Making Programs Forget: Enforcing Lifetime for Sensitive Data
This paper introduces guaranteed data lifetime, a novel system property ensuring that sensitive data cannot be retrieved from a system beyond a specified time. The trivial way to achieve this is to "reboot"; however, this is disruptive from the user's perspective, and may not even eliminate disk copies. We discuss an alternate approach based on state re-incarnation where data expiry is completely transparent to the user, and can be used even if the system is not designed a priori to provide the property.

3:30 p.m.–4:15 p.m.

Exploiting MISD Performance Opportunities in Multi-core Systems
A number of important system services, particularly network system services, need strong scaling on multi-core systems, where a fixed workload is executed more quickly with increased core counts. Unfortunately, modern multiple-instruction/multiple-data (MIMD) approaches to multi-core OS design cannot exploit the fine-grained parallelism needed to provide such scaling. In this paper, we propose a replicated work approach to parallelizing network system services for multi-core systems based on a multiple-instruction/single-data (MISD) execution model. In particular, we discuss the advantages of this approach for scalable network system services and we compare past methods for addressing the challenges this approach presents. We also present preliminary results that examine the viability of the basic approach and the software abstractions needed to support it.
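One way to picture the replicated-work model (a simplified sketch, not the authors' system): instead of sharding requests across cores, each request is handed to every core, so multiple instruction streams operate on the same data item while each core keeps its own replica of service state.

    /* Sketch only: broadcast each request to one handler per core (MISD-style)
     * rather than sharding requests across cores (MIMD-style). A real system
     * would pin one long-lived handler per core instead of spawning threads. */
    #include <pthread.h>
    #include <stdio.h>

    #define NCORES 4

    struct request { int id; /* packet or connection data */ };

    static void *handler(void *arg) {
        const struct request *req = arg;   /* same data seen by every core */
        printf("core-local replica processed request %d\n", req->id);
        return NULL;
    }

    static void broadcast(struct request *req) {
        pthread_t t[NCORES];
        for (int i = 0; i < NCORES; i++)
            pthread_create(&t[i], NULL, handler, req);
        for (int i = 0; i < NCORES; i++)
            pthread_join(t[i], NULL);
    }

    int main(void) {
        struct request r = { .id = 1 };
        broadcast(&r);
        return 0;
    }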

More Intervention Now!
Techniques for characterizing performance and diagnosing problems typically endeavor to minimize perturbation by measurements and data collection. We are making a call to do exactly the opposite. In order to characterize the behavior of a system and to perform root-cause analysis and answer what-if questions, we need to conduct active and systematic experiments on our systems, perhaps at the same time these systems are running. We argue that in distributed computing frameworks such as MapReduce, Dryad and Hadoop, the conditions are right for automatically conducting these experiments. At each stage there is a large number of nodes doing the same computation, hence providing a sound statistical population. Furthermore, we have the infrastructure in such systems to isolate and recreate the conditions of a run. In this paper we propose the missing piece: a blueprint of the causal interactions that can be used to plan these experiments and perform inferences about the results. Machine learning and statistical analysis give us the tools and algorithms for inducing such a causal blueprint from a combination of passive observations and active intervention.

make world
Specialisation of software to its circumstances is seldom pursued by developers because it's difficult, time-consuming and error-prone. We propose an architecture centred around pervasive optimisation: automated specialisation of programs written in low-level systems languages like C using the techniques of partial evaluation. We also present initial results of a prototype implementation.

Wednesday, May 11, 2011
9:00 a.m.–10:00 a.m.

What If You Could Actually Trust Your Kernel?
The advent of formally verified OS kernels means that for the first time we have a truly trustworthy foundation for systems. In this paper we explore the design space this opens up. The obvious applications are in security, although not all of them are quite as obvious, for example as they relate to TPMs. We further find that the kernel's dependability guarantees can be used to improve performance, for example in database systems. We think that this just scratches the surface, and that trustworthy kernels will stimulate further research.

Provable Security: How Feasible Is It?
Strong, machine-checked security proofs of operating systems have been in the too hard basket long enough. They will still be too hard for large mainstream operating systems, but even for systems designed from the ground up for security they have been counted as infeasible. There are high-level formal models, nice security properties, ways of architecting and engineering secure systems, but no implementation level proofs yet, not even with the recent verification of the seL4 microkernel. This needs to change.

Toward Practical and Unconditional Verification of Remote Computations
This paper revisits a classic question: how can a machine specify a computation to another one and then, without executing the computation, check that the other machine carried it out correctly? We want a scheme that (a) has practical performance, (b) is simple to implement, and (c) provides unconditional guarantees. Of these goals, (a) and (b) contrast with the way this question is handled in the theory literature and (c) contrasts with current systems approaches. Our approach is to revisit the theory of interactive proofs and probabilistically checkable proofs (PCPs) and attempt to reduce it to practice. We do so by identifying a strand of theory that can lead to a practical solution and then applying several refinements that reduce the costs by over 12 orders of magnitude. Our resulting prototype meets goals (a)-(c) but only over a limited domain. In this partial success, the prototype provides a concrete foundation for our position: that PCP-based verifiable computation can be a systems problem, not just a theory problem. In its shortcomings, the prototype illustrates how much research remains, a gap that we hope to close with the help of the community's expertise.

MOMMIE Knows Best: Systematic Optimizations for Verifiable Distributed Algorithms
Distributed systems based on complex distributed algorithms are difficult to design, specify and prove correct, but also difficult to implement and deploy in a real environment. To match expected operating conditions, algorithm designers and system implementers must make tweaks to the original, conceptual algorithm. Those tweaks are typically optimizations for the targeted deployment environment and workload. The result of such complex tweaks is the addition of great algorithmic and implementational complexity to what may otherwise have been a simple distributed algorithm. In this work, we argue for an alternative approach that enables the independent development of algorithmic logic and implementational optimizations. Formal, rigorous proofs can be constructed in the former, which are maintained when the algorithm is composed with runtime optimizations into a performant executable. We present initial thoughts in the design of such a system.

10:30 a.m.–11:15 a.m.

The Case for VOS: The Vector Operating System
Operating systems research for many-core systems has recently focused its efforts on supporting the scalability of OS-intensive applications running on increasingly parallel hardware. Lost amidst the march towards this parallel future is efficiency: Perfectly parallel software may saturate the parallel capabilities of the host system, but in doing so can waste hardware resources. This paper describes our motivation for the Vector OS, a design inspired by vector processing systems that provides efficient parallelism. The Vector OS organizes and executes requests for operating system resources through "vector" interfaces that operate on vectors of objects. We argue that these interfaces allow the OS to capitalize on numerous chances to both eliminate redundant work found in OS-intensive systems and use the underlying parallel hardware to its full capability, opportunities that are missed by existing operating systems.
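To make the interface idea concrete (the signatures below are hypothetical, not the paper's API), a vector variant of a system call accepts and returns vectors of objects; the user-space emulation shown here only pins down the semantics, whereas the Vector OS would batch the shared work inside the kernel:

    /* Sketch only: hypothetical "vector" counterparts to scalar system calls,
     * emulated with a loop to show semantics. The point of the Vector OS is
     * that the kernel would instead batch the shared work (name lookup,
     * descriptor allocation, I/O submission) across the whole vector. */
    #include <fcntl.h>
    #include <stddef.h>
    #include <unistd.h>

    int vec_open(const char *const paths[], const int flags[],
                 int fds_out[], size_t n) {
        for (size_t i = 0; i < n; i++)
            fds_out[i] = open(paths[i], flags[i]);
        return 0;
    }

    ssize_t vec_read(const int fds[], void *const bufs[], const size_t lens[],
                     ssize_t bytes_out[], size_t n) {
        ssize_t total = 0;
        for (size_t i = 0; i < n; i++) {
            bytes_out[i] = read(fds[i], bufs[i], lens[i]);
            if (bytes_out[i] > 0)
                total += bytes_out[i];
        }
        return total;
    }

For example, a web server handling a batch of requests could open all requested files with a single vec_open instead of n separate open calls.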

Operating Systems Must Support GPU Abstractions
This paper argues that lack of OS support for GPU abstractions fundamentally limits the usability of GPUs in many application domains. OSes offer abstractions for most common resources such as CPUs, input devices, and file systems. In contrast, OSes currently hide GPUs behind an awkward ioctl interface, shifting the burden for abstractions onto user libraries and run-times. Consequently, OSes cannot provide system-wide guarantees such as fairness and isolation for GPUs, and developers must sacrifice modularity and performance when composing systems that integrate GPUs along with other OS-managed resources. We propose new kernel abstractions to support GPUs and other accelerator devices as first class computing resources.

Multicore OSes: Looking Forward from 1991, er, 2011
Upcoming multicore processors, with hundreds of cores or more on a single chip, require a degree of parallel scalability that is not currently available in today's system software. Based on prior experience in the supercomputing sector, the likely trend for multicore processors is away from shared memory and toward shared-nothing architectures based on message passing. In light of this, the lightweight messages and channels programming model, found among other places in Erlang, is likely the best way forward. This paper discusses what adopting this model entails, describes the architecture of an OS based on this model, and outlines a few likely implementation challenges.
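A minimal sketch of the kind of primitive such an OS might rest on (our illustration, not the paper's design): a bounded single-producer/single-consumer message channel that cores use in place of shared mutable memory.

    /* Sketch only: a bounded single-producer/single-consumer channel. A
     * shared-nothing OS would provide something like this between cores,
     * with no other shared mutable state. */
    #include <stdatomic.h>
    #include <stdbool.h>

    #define CHAN_SLOTS 64            /* power of two */

    struct message { long tag; void *body; };

    struct channel {
        struct message slots[CHAN_SLOTS];
        _Atomic unsigned head;       /* advanced by the receiver */
        _Atomic unsigned tail;       /* advanced by the sender   */
    };

    static bool chan_send(struct channel *c, struct message m) {
        unsigned t = atomic_load_explicit(&c->tail, memory_order_relaxed);
        unsigned h = atomic_load_explicit(&c->head, memory_order_acquire);
        if (t - h == CHAN_SLOTS)     /* full: caller retries or blocks */
            return false;
        c->slots[t % CHAN_SLOTS] = m;
        atomic_store_explicit(&c->tail, t + 1, memory_order_release);
        return true;
    }

    static bool chan_recv(struct channel *c, struct message *out) {
        unsigned h = atomic_load_explicit(&c->head, memory_order_relaxed);
        unsigned t = atomic_load_explicit(&c->tail, memory_order_acquire);
        if (t == h)                  /* empty */
            return false;
        *out = c->slots[h % CHAN_SLOTS];
        atomic_store_explicit(&c->head, h + 1, memory_order_release);
        return true;
    }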

