Efficient User-Level Thread Migration and Checkpointing on Windows NT Clusters

Clusters of industry-standard multiprocessors are emerging as a competitive alternative for large-scale parallel computing. However, compared with large-scale multiprocessors, these systems have several disadvantages, including more complex thread scheduling and greater susceptibility to failure. This paper describes the design and implementation of two user-level mechanisms in the Brazos parallel programming environment, thread migration and checkpointing, that address these issues on clusters of multiprocessors running Windows NT. These mechanisms offer several benefits: (1) the ability to tolerate the failure of multiple computing nodes with minimal runtime overhead and short recovery time; (2) the ability to add and remove computing nodes while applications continue to run, which simplifies scheduled maintenance operations and facilitates load balancing; and (3) the ability to tolerate power failures by performing a checkpoint before shutdown or by migrating computation threads to other stable nodes. Brazos is a distributed system that supports both the shared-memory and message-passing parallel programming paradigms on networks of Intel x86-based multiprocessors running Windows NT. Thread migration in Brazos is an order of magnitude faster than previously reported Windows NT implementations and is competitive with implementations on other operating systems. The checkpoint facility exhibits low runtime overhead and fast recovery time.

Hazim Abdel-Shafi, Rice University

Evan Speight, Rice University

John K. Bennett, Rice University

BibTeX
@inproceedings {271616,
author = {Hazim Abdel-Shafi and Evan Speight and John K. Bennett},
title = {Efficient {User-Level} Thread Migration and Checkpointing on Windows {NT} Clusters},
booktitle = {Windows NT 3rd Symposium (Windows NT 3rd Symposium)},
year = {1999},
address = {Seattle, WA},
url = {https://www.usenix.org/conference/windows-nt-3rd-symposium/efficient-user-level-thread-migration-and-checkpointing-windows},
publisher = {USENIX Association},
month = jul
}

Links

Paper: http://www.usenix.org/events/usenix-nt99/full_papers/abdel-shafi/abdel-shafi.pdf
Paper (HTML): http://www.usenix.org/events/usenix-nt99/full_papers/abdel-shafi/abdel-shafi_html/index.html

© USENIX