Check out the new USENIX Web site.

next up previous
Next: Acknowledgments Up: Adaptive and Reliable Previous: Related work

Conclusion

The widespread use of NOWs for parallel computation requires a software infrastructure that allows programmers to code in a high-level language that abstracts away the complexity of protocols, scheduling, and resource management. Cilk and Cilk-NOW are part of this developing software infrastructure. In this paper, we have shown how Cilk-NOW's end-to-end design leverages structure in the Cilk programming model to implement adaptive parallelism and fault tolerance simply and efficiently. All overheads are amortized against work-stealing operations, and the number of steals grows with the critical path and not with the work. This result is only possible because Cilk-NOW incorporates Cilk-specific policies at all levels of its design.

The Cilk-NOW runtime system, as described in this paper and as currently implemented, supports the Cilk-2 language which is essentially functional in that it does not have support for a global address space or parallel I/O. More recent incarnations of Cilk for MPPs and SMPs have support for a global address space using "dag-consistent" distributed shared memory [7], and we are currently working on extensions for parallel I/O. With these additions to Cilk, preserving Cilk-NOW's adaptive and fault tolerant execution model remains a challenging open problem. We are currently working on this problem. The dag-consistency model was conceived with adaptive parallelism and fault tolerance in mind, and we are investigating the idea of coupling our current return transactions mechanism with a causal message-logging mechanism [1].

In other current research, we are investigating distributed macroscheduling algorithms. The goal of such an algorithm is to assign idle workstations to Cilk jobs so that each job gets a "fair" share and without requiring that users explicitly state their application's resource needs. It turns out that the parallelism of a Cilk job can be determined continuously and automatically by monitoring the steal rate. We are examining a macroscheduling algorithm in which this information is used in randomized pairwise interactions among processors. The idea is that periodically (and asynchronously) each processor picks another processor in the network at random, and if the two processors are working on different jobs, then one processor may switch to the job of the other. The switching decision is randomized and based only on information about the parallelism and size of the two jobs involved. Early simulation results indicate that such a scheme is very effective [30], and we are currently working on analysis and implementation.

More information about Cilk, including papers, documentation, and software releases, but not including Cilk-NOW software, can be found on the World-Wide Web at https://theory.lcs.mit.edu/~cilk.


next up previous
Next: Acknowledgments Up: Adaptive and Reliable Previous: Related work

Robert D. Blumofe
Wed Nov 13 10:25:03 CST 1996