Check out the new USENIX Web site.

Home About USENIX Events Membership Publications Students
USENIX Technical Program - Abstract - Windows NT Symposium 99

Efficient User-Level Thread Migration and Checkpointing on Windows NT Clusters

Hazim Abdel-Shafi, Evan Speight, and John K. Bennett, Department of Electrical and Computer Engineering, Rice University


Clusters of industry-standard multiprocessors are emerging as a competitive alternative for large-scale parallel computing. However, these systems have several disadvantages over large-scale multiprocessors, including complex thread scheduling and increased susceptibility to failure. This paper describes the design and implementation of two user-level mechanisms in the Brazos parallel programming environment that address these issues on clusters of multiprocessors running Windows NT: thread migration and checkpointing. These mechanisms offer several benefits: (1) The ability to tolerate the failure of multiple computing nodes with minimal runtime overhead and short recovery time. (2) The ability to add and remove computing nodes while applications continue to run, simplifying scheduled maintenance operations and facilitating load balancing. (3) The ability to tolerate power failures by performing a checkpoint before shutdown or by migrating computation threads to other stable nodes. Brazos is a distributed system that supports both shared memory and message passing parallel programming paradigms on networks of Intel x86-based multiprocessors running Windows NT. The performance of thread migration in Brazos is an order of magnitude faster than previously reported Windows NT implementations, and is competitive with implementations on other operating systems. The checkpoint facility exhibits low runtime overhead and fast recovery time.
  • View the full text of this paper in HTML form and PDF form.

  • If you need the latest Adobe Acrobat Reader, you can download it from Adobe's site.

  • To become a USENIX Member, please see our Membership Information.

?Need help? Use our Contacts page.

Last changed: 27 Feb 2002 ml
Technical Program
Conference index