USENIX - USENIX'97 Conference Summaries

Building Distributed Process Management on an Object-Oriented Framework

By Ken Shirriff, Sun Microsystems Laboratories

Summary by Gordon Galligher

"Solaris MC" is a research project that aims to present a single operating system image across a cluster of nodes. The main design goal was a single OS image rather than a collection of nodes each running its own OS. Ken later pointed out that each node does in fact run its own "minikernel"; this allows the entire cluster to survive the crash of any number of its component nodes.

The components were written in C++, with interfaces specified in the Interface Definition Language (IDL) and based on the CORBA object model, which has better RPC support. An invocation through the distributed framework looks just like an ordinary C++ invocation, but the target object might be running on a different physical node. The implementation is also completely independent of the interconnect between the component nodes (it could be 100Base-T, FDDI, etc.). A specialized filesystem object provides the "location independence" function for the global filesystem, and there are also hooks in the kernel to provide a global /proc filesystem.
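The idea that a call "looks just like a C++ invocation" regardless of where the object lives can be sketched as a local implementation and a proxy sharing one abstract interface. This is a minimal illustration, not the Solaris MC framework or a CORBA ORB: FileObject, LocalFile, RemoteFileProxy, and describe are hypothetical names, and the transport is a stand-in callback rather than a real interconnect.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical interface. In an IDL-based system, a class like this would be
// generated from an IDL declaration rather than written by hand.
class FileObject {
public:
    virtual ~FileObject() = default;
    virtual std::string read_name() const = 0;
};

// Implementation that lives on the calling node.
class LocalFile : public FileObject {
public:
    explicit LocalFile(std::string name) : name_(std::move(name)) {}
    std::string read_name() const override { return name_; }
private:
    std::string name_;
};

// Proxy with the same interface. It forwards each call over a transport,
// modeled here as a callback that takes a request name and returns a reply.
class RemoteFileProxy : public FileObject {
public:
    using Transport = std::function<std::string(const std::string&)>;
    explicit RemoteFileProxy(Transport send) : send_(std::move(send)) {}
    std::string read_name() const override { return send_("read_name"); }
private:
    Transport send_;
};

// Caller code is identical for local and remote objects.
inline std::string describe(const FileObject& file) {
    return "file:" + file.read_name();
}
```

Because describe only sees the abstract interface, swapping a LocalFile for a RemoteFileProxy requires no change in the caller, which is the property the summary describes.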

A major design goal was to provide "global uniqueness" across the entire operating system environment. Presenting a single system image dramatically decreases the time it takes to port applications, because they do not need to deal with the clustering themselves in order to take advantage of the available parallelism. Multiple parts of the Solaris operating system had to be changed to provide this single image, including process creation (on which node does a new process live?); process, group, and session IDs, whose members may now be spread across machines; signal handling across nodes; and support for a global /proc filesystem.

The existing kernel process management routines in Solaris were leveraged, and a minimalist approach was taken to globalize the view. A global virtual process manager had to be created to support calls such as fork() and wait() across nodes. The new routines call this virtual manager instead of the normal local routines to create new processes. To determine which node a particular process is running on, the top bits of the process ID are set to identify the node, and signals are handled by forwarding them to the appropriate node.
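The process ID scheme can be sketched as simple bit packing. The summary does not give the actual layout, so the widths below (a 32-bit ID with 8 node bits and 24 local bits) and all names (make_global_pid, node_of, local_pid_of, route_signal) are assumptions chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: the top kNodeBits of a 32-bit ID hold the node number,
// and the remaining low bits hold the node-local process ID.
constexpr unsigned kNodeBits = 8;
constexpr unsigned kLocalBits = 32 - kNodeBits;
constexpr std::uint32_t kLocalMask = (std::uint32_t{1} << kLocalBits) - 1;

// Pack a node number and a node-local PID into one cluster-wide PID.
inline std::uint32_t make_global_pid(std::uint32_t node, std::uint32_t local_pid) {
    assert(node < (std::uint32_t{1} << kNodeBits));
    assert(local_pid <= kLocalMask);
    return (node << kLocalBits) | local_pid;
}

// Recover the owning node from the top bits.
inline std::uint32_t node_of(std::uint32_t global_pid) {
    return global_pid >> kLocalBits;
}

// Recover the node-local PID from the low bits.
inline std::uint32_t local_pid_of(std::uint32_t global_pid) {
    return global_pid & kLocalMask;
}

// Decide how a signal is handled: delivered on this node, or forwarded
// to the node that owns the target process.
enum class SignalRoute { DeliverLocally, ForwardToOwner };

inline SignalRoute route_signal(std::uint32_t this_node, std::uint32_t target_pid) {
    return node_of(target_pid) == this_node ? SignalRoute::DeliverLocally
                                            : SignalRoute::ForwardToOwner;
}
```

Under these assumed widths, make_global_pid(3, 1234) yields 0x030004D2, and a signal sent to that ID from node 1 would be forwarded to node 3. The appeal of the scheme is that locating a process needs no lookup table: the owner is read directly from the ID.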

The implementation of this new environment did not dramatically improve the performance of programs, but the researchers are already working on improving it. The overhead added by the virtual process manager made process creation slower even on the local node, and significantly slower on a remote node. Building the clustering with a C++ object methodology made it easy to add new modules when needed, because the framework provided built-in reference counting to know when a particular object was no longer in use anywhere in the cluster. The approach also had its drawbacks: it was found early on that exception handling was an expensive operation and somewhat dangerous across the cluster. There were also some efficiency problems when the class libraries ended up being too deep.
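The reference-counting idea can be illustrated on a single machine with the standard std::shared_ptr. This is only an analogy: the Solaris MC framework counted references across nodes, which std::shared_ptr does not do, and ClusterObject is a hypothetical type used to make the release visible.

```cpp
#include <memory>

// Hypothetical object that records when it is finally released.
struct ClusterObject {
    explicit ClusterObject(bool& released) : released_(released) {}
    ~ClusterObject() { released_ = true; }  // runs when the last reference goes away
    bool& released_;
};

// Each holder of the object keeps one std::shared_ptr to it. The object is
// destroyed only when every holder has dropped its reference.
using ClusterRef = std::shared_ptr<ClusterObject>;

inline ClusterRef make_cluster_object(bool& released) {
    return std::make_shared<ClusterObject>(released);
}
```

If two holders share a ClusterRef, resetting the first leaves the object alive, and resetting the second triggers the destructor. A module therefore never needs to ask explicitly whether anyone else is still using the object.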

Overall, Sun Labs was happy with its research project and has learned quite a bit about the clustering possibilities of Sun hardware. A URL was given at the conference for more information on the Solaris MC implementation, but there was a different URL mentioned in the conference proceedings. You can try both of them and hopefully one will work: https://www.sunlabs.com/sunmc-proj/process.html https://www.sunlabs.com/research/solaris-mc/process.html

Originally published in ;login: Vol. 22, No.2, April 1997.

Last changed: May 28, 1997 pc