
Design and implementation overview

The other virtual machine technologies available on the PC platform, VMWare[6] and Plex86[4][5], create a platform (the emulated machine) which is authentic enough to boot an existing operating system. In contrast, UML takes Linux itself as the platform and modifies the Linux kernel to run on it. Since the usual method of making Linux run on a new platform is to port it, UML does exactly that, and ports the Linux kernel to its own system call interface.

The task of porting Linux to itself amounted to finding ways of virtualizing all of the required hardware capabilities in terms of Linux system calls. The most important of these is the distinction between a privileged kernel mode and an unprivileged user mode. A native kernel running on hardware must have a privileged mode in which only trusted code (i.e. the kernel) can run, so that it can reliably and securely arbitrate access to the hardware. UML must have an equivalent distinction, one which allows the kernel access to the host's system calls while its processes must request that access by making system calls to the UML kernel. This distinction is implemented using the Linux system call interception mechanism provided by ptrace.

A special thread is used to intercept the system calls made by all the UML process threads. This tracing thread annuls these system calls in the host and redirects the processes into the UML kernel, where tracing is turned off. Thus, while in user mode, processes have their system calls intercepted and virtualized; in kernel mode, they are released from tracing and their system calls run directly in the host kernel. This is the exact analog of the privileged access to hardware enjoyed by the kernel on a physical machine.

Virtual memory posed the next most significant challenge. The Linux kernel expects the hardware to provide access to a pool of physical memory which may be allocated for kernel data structures or allocated for virtual memory and mapped arbitrarily into either a process or the kernel virtual memory area. UML provides this pool by creating a file on the host which is the size of the physical memory declared on the UML command line. This file is mapped as a contiguous block into a region of the UML process address space set aside as ``physical memory''. The pages of memory in this region are released to the kernel memory allocator, which is then able to allocate them to whatever subsystems need them. When a page is mapped into virtual memory, the low-level mapping code maps that page from the underlying file into the appropriate spot in the virtual address space. Thus, each page is mapped once into the physical memory region, and arbitrarily many times into process or kernel virtual address spaces.

With this mechanism in place, providing multiple, independent, mutually inaccessible address spaces is straightforward. Each UML process has a separate address space on the host, so a context switch from one process to another automatically causes an address space switch. However, this is complicated by the fact that a process address space may be modified while it is out of context; the system may, for example, swap out some of its memory. When a process comes back into context, the state of its address space may therefore need to be updated, since it may still map pages which have been freed and reallocated for some other purpose. So, a scan of the process page tables occurs on a context switch, during which any pages whose protections or mappings have changed are updated to reflect the current state of the address space.

Those were the most significant challenges involved in the port. With those solved, the rest of the port is comparatively simple, with the remaining required mechanisms having obvious implementations in terms of Linux system calls.

Hardware interrupts and faults are implemented in terms of Linux signals: I/O device interrupts are provided by SIGIO, page faults by SIGSEGV, and timer interrupts by SIGALRM and SIGVTALRM. A normal Linux signal handler layer receives all signals, determines the cause, and calls into the kernel appropriately: the IRQ system for device interrupts, the page fault handler for SIGSEGV, or the signal subsystem for signals, such as SIGILL, SIGBUS, and SIGTRAP, that are simply passed along to the process.

Delivering signals to processes appears simple, but there are a number of ways of fouling up the implementation. In general, delivery is done by constructing a special frame on the process stack which contains the process register state at the time of the signal, some other process context, and a procedure context which will cause the process to call its handler for that signal when it returns to userspace. UML makes the host kernel construct the signal frame by sending a signal to the process, which is handled by a UML handler. This handler then invokes the process signal handler. The cleanup after the handler returns is triggered by the interception of the host's sigreturn. The old process state is restored and the process returns to userspace at the point at which it received the signal.


Jeff Dike 2001-09-14