UML clustering

It is not necessary for a virtual machine to be confined to running on a single host machine. Since UML physical memory is virtual from the perspective of the host, it can be mapped into and unmapped from the UML address space. This means that it's possible to partition the physical memory of a single UML instance across a number of hosts. Memory that's present on one host would be unmapped from the others. When a virtual processor accesses memory that's not present on its host, a new low-level fault handler would request the page from the host which currently owns it. The page would be unmapped from that host, copied across the network, and mapped on the host that needs to access it.
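The ownership scheme described above can be sketched as a toy model. This is only an illustration under stated assumptions, not UML code: the class `PageCluster`, its round-robin initial partition, and the `access` method are all hypothetical names invented for the example, and the "copy across the network" step is reduced to a counter.

```python
class PageCluster:
    """Toy model: each page of UML "physical" memory is mapped on exactly one host."""

    def __init__(self, num_hosts, num_pages):
        self.num_hosts = num_hosts
        # Assumed initial partition: pages are dealt out round-robin.
        self.owner = [page % num_hosts for page in range(num_pages)]
        self.transfers = 0  # number of pages copied between hosts so far

    def access(self, host, page):
        """Touch `page` from `host`; return True if the page had to be faulted in."""
        if self.owner[page] == host:
            return False  # page is mapped locally, no fault
        # Fault path: unmap on the current owner, copy the page, map it locally.
        self.owner[page] = host
        self.transfers += 1
        return True


cluster = PageCluster(num_hosts=2, num_pages=4)
print(cluster.access(0, 0))   # False: page 0 starts on host 0
print(cluster.access(0, 1))   # True: page 1 starts on host 1, so it migrates
print(cluster.access(0, 1))   # False: page 1 is now local to host 0
print(cluster.owner, cluster.transfers)   # [0, 0, 0, 1] 1
```

The point of the model is that a page is never present on two hosts at once, so every access from a non-owner costs one full page transfer.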

This would create an SMP virtual machine instance running on multiple hosts with at least one virtual processor on each host. Since this virtual machine has access to the combined resources of the hosts, this is effectively a single system image (SSI) cluster.

This would be useful to experiment with and fun to play with, but it would be so slow that it would be unlikely to find any kind of production use. The reason is that some kernel data structures are accessed so often by all the processors on the system that this cluster would spend all of its time copying the pages containing this data from node to node. Some data structures, spinlocks in particular, are accessed in such a pathological way that this cluster would come to a halt whenever two processors on different hosts tried to access them simultaneously.
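To make the ping-pong effect concrete, here is a small sketch that counts how often a single page changes hosts for a given access pattern. The function `count_migrations` and the two traces are assumptions made up for the illustration; a real spinlock would also involve retries, which this model ignores.

```python
def count_migrations(trace, initial_owner=0):
    """Count page migrations for one page, given the sequence of hosts touching it."""
    owner = initial_owner
    migrations = 0
    for host in trace:
        if host != owner:
            owner = host      # the page is unmapped from the old owner and moved
            migrations += 1
    return migrations


# A contended lock word: two hosts alternate on the same page.
contended = [0, 1] * 500
# The same number of accesses, but each host does its work in one burst.
batched = [0] * 500 + [1] * 500

print(count_migrations(contended))  # 999
print(count_migrations(batched))    # 1
```

With alternating access, nearly every touch costs a network round trip and a page copy, which is why frequently shared kernel data dominates the running time of such a cluster.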

The root cause of these problems is that this is an extreme form of Non-Uniform Memory Access (NUMA). Normal NUMA machines have a number of nodes, each containing one or more processors, with their own local memory. This local memory can be accessed quickly by the processors in that node. Access to the local memory of a different node is comparatively very expensive. In addition, there is some global memory which is equally accessible by all the nodes. Access to global memory is slower than access to a node's local memory and faster than access to a different node's local memory.

This UML cluster has no global memory, only local memory. In addition, access to a different node's local memory is particularly slow. The performance problems of this type of cluster will at least be alleviated by NUMA support in the generic kernel. This will partition some of the kernel's data between the machine's nodes so that non-local accesses are infrequent, and the only inter-node accesses are from a slow, background load balancing process.
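A rough way to see why reducing non-local accesses matters so much is a two-level cost model. The sketch below is a back-of-the-envelope calculation, not a measurement: the function name `mean_access_cost` and both latency figures are assumptions chosen only to show the shape of the effect.

```python
def mean_access_cost(remote_fraction, local_cost, remote_cost):
    """Average cost per access when a given fraction of accesses is non-local."""
    return (1 - remote_fraction) * local_cost + remote_fraction * remote_cost


LOCAL_NS = 100           # assumed cost of a local memory access, in nanoseconds
REMOTE_NS = 1_000_000    # assumed cost of fetching a page from another host

# Show how the average falls as the kernel keeps more of its data node-local.
for fraction in (0.1, 0.01, 0.0001):
    cost = mean_access_cost(fraction, LOCAL_NS, REMOTE_NS)
    print(f"remote fraction {fraction}: about {cost:.0f} ns per access")
```

Because the remote cost is assumed to be several orders of magnitude larger than the local one, the average stays dominated by remote accesses until they become very rare.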

The existence of this type of cluster will effectively put NUMA hardware in the hands of a large number of people who otherwise would have no access whatsoever to it. I hope and expect that this will attract more people to the effort of adding good NUMA support to the kernel.

However, this sort of shared-memory cluster may be so extreme a type of NUMA that good support in Linux may not be enough to get good performance. In this case, some sort of RPC interface will be needed so that the nodes can cooperate without needing to fault entire pages back and forth. The pure shared memory cluster would make a good starting point for that effort. For all its performance problems, it would work, so the RPC could be added incrementally, with a working cluster available at each stage. Debugging and performance analysis would be possible at all points of this process.
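The trade-off between faulting pages and sending RPCs can be sketched by counting bytes on the wire for updates to one shared item. Everything here is an assumption for illustration: the 4096-byte page size, the 64-byte RPC size, the idea of a fixed "home" node that applies updates, and the function names.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
RPC_BYTES = 64     # assumed size of one request plus its reply


def traffic_page_faulting(trace, home=0):
    """Bytes moved if the page follows whichever node touches it."""
    owner, sent = home, 0
    for node in trace:
        if node != owner:
            sent += PAGE_SIZE   # the whole page is copied to the new owner
            owner = node
    return sent


def traffic_rpc(trace, home=0):
    """Bytes moved if the data stays on `home` and other nodes send it requests."""
    return sum(RPC_BYTES for node in trace if node != home)


trace = [0, 1, 2] * 100               # three nodes taking turns
print(traffic_page_faulting(trace))   # 1224704
print(traffic_rpc(trace))             # 12800
```

In this model RPC wins when access is interleaved between nodes. If one remote node did all 300 updates in a row, faulting the page over once (4096 bytes) would beat 300 RPCs (19200 bytes), which is one reason to add RPC selectively rather than everywhere.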

This work would also largely be applicable to native kernels running on physical machines. So, this process would probably speed the development of a more traditional clustering system for Linux, where the nodes are physical machines rather than virtual ones. However, the genesis of this system as a virtual cluster would leave its mark. Once Linux has physical clustering, there would be no requirement that a cluster's nodes be physical machines. A cluster could consist of a combination of physical machines and virtual nodes.

This creates the possibility of personal clusters, where an individual could cluster a number of personal machines, such as desktop and laptop machines, with virtual machines running on larger central servers. This would provide access to the hardware of the personal machines and the CPU horsepower of the servers. The virtual nodes could have access to the servers' hardware, or they could have access to nothing but virtual devices. So, these clusters could be used for everything from a sysadmin cluster, which provides centralized access to all of the servers and their hardware, to a user cluster, which provides access to personal hardware and no access to server hardware.

Jeff Dike 2001-09-14