Synchronization Primitives

Several synchronization primitives have been introduced to aid in multithreading the kernel. These primitives are implemented by atomic operations and use appropriate memory barriers so that users of these primitives do not have to worry about doing it themselves. The primitives are very similar to those used in other operating systems including mutexes, condition variables, shared/exclusive locks, and semaphores.

Mutexes

The mutex primitive provides mutual exclusion for one or more data objects. Two versions of the mutex primitive are provided: spin mutexes and sleep mutexes.

Spin mutexes are a simple spin lock. If the lock is held by another thread when a thread tries to acquire it, the second thread will spin waiting for the lock to be released. Due to this spinning nature, a context switch cannot be performed while holding a spin mutex to avoid deadlocking in the case of a thread owning a spin lock not being executed on a CPU and all other CPUs spinning on that lock. An exception to this is the scheduler lock, which must be held during a context switch. As a special case, the ownership of the scheduler lock is passed from the thread being switched out to the thread being switched in to satisfy this requirement while still protecting the scheduler data structures. Since the bottom half code that schedules threaded interrupts and runs non-threaded interrupt handlers also uses spin mutexes, spin mutexes must disable interrupts while they are held to prevent bottom half code from deadlocking against the top half code it is interrupting on the current CPU. Disabling interrupts while holding a spin lock has the unfortunate side effect of increasing interrupt latency.

To work around this, a second mutex primitive is provided that performs a context switch when a thread blocks on a mutex. This second type of mutex is dubbed a sleep mutex. Since a thread that contests on a sleep mutex blocks instead of spinning, it is not susceptible to the first type of deadlock with spin locks. Sleep mutexes cannot be used in bottom half code, so they do not need to disable interrupts while they are held to avoid the second type of deadlock with spin locks.

As with Solaris, when a thread blocks on a sleep mutex, it propagates its priority to the lock owner. Therefore, if a thread blocks on a sleep mutex and its priority is higher than the thread that currently owns the sleep mutex, the current owner will inherit the priority of the first thread. If the owner of the sleep mutex is blocked on another mutex, then the entire chain of threads will be traversed bumping the priority of any threads if needed until a runnable thread is found. This is to deal with the problem of priority inversion where a lower priority thread blocks a higher priority thread. By bumping the priority of the lower priority thread until it releases the lock the higher priority thread is blocked on, the kernel guarantees that the higher priority thread will get to run as soon as its priority allows.

These two types of mutexes are similar to the Solaris spin and adaptive mutexes. One difference from the Solaris API is that acquiring and releasing a spin mutex uses different functions than acquiring and releasing a sleep mutex. A difference with the Solaris implementation is that sleep mutexes are not adaptive. Details of the Solaris mutex API and implementation can be found in section 3.5 of [Mauro01].

Condition Variables

Condition variables provide a logical abstraction for blocking a thread while waiting for a condition. Condition variables do not contain the actual condition to test, instead, one locks the appropriate mutex, tests the condition, and then blocks on the condition variable if the condition is not true. To prevent lost wakeups, the mutex is passed in as an interlock when waiting on a condition.

FreeBSD's condition variables use an API quite similar to those provided in Solaris. The only differences being the lack of a cv_wait_sig_swap and the addition of cv_init and cv_destroy constructors and destructors. The implementation also differs from Solaris in that the sleep queue is embedded in the condition variable itself instead of coming from the hashed pool of sleep queue's used by sleep and wakeup.

Shared/Exclusive Locks

Shared/Exclusive locks, also known as sx locks, provide simple reader/writer locks. As the name suggests, multiple threads may hold a shared lock simultaneously, but only one thread may hold an exclusive lock. Also, if one thread holds an exclusive lock, no threads may hold a shared lock.

FreeBSD's sx locks have some limitations not present in other reader/writer lock implementations. First, a thread may not recursively acquire an exclusive lock. Secondly, sx locks do not implement any sort of priority propagation. Finally, although upgrades and downgrades of locks are implemented, they may not block. Instead, if an upgrade cannot succeed, it returns failure, and the programmer is required to explicitly drop its shared lock and acquire an exclusive lock. This design was intentional to prevent programmers from making false assumptions about a blocking upgrade function. Specifically, a blocking upgrade must potentially release its shared lock. Also, another thread may obtain an exclusive lock before a thread trying to perform an upgrade. For example, if two threads are performing an upgrade on a lock at the same time.

Semaphores

FreeBSD's semaphores are simple counting semaphores that use an API similar to that of POSIX.4 semaphores [Gallmeister95]. Since sema_wait and sema_timedwait can potentially block, mutexes must not be held when these functions are called.

Next: Guidelines Up: Locking in the Multithreaded Previous: Basic Tools