Check out the new USENIX Web site. next up previous
Next: Glossary Up: Background Previous: Background


Figure 1: Lock Protecting Deletion and Search
\epsfxsize =3in

On SMP systems, any searching of or deletion from a linked list must be protected by a lock. When element B is deleted from the list shown in Figure 1, searching code is guaranteed to see this list in either the initial state (1) or the final state (3). In state (2), when element B is being deleted, the reader-writer lock guarantees that no readers (indicated by the triangles) will be accessing the list.

Figure 2: Race Between Deletion and Search
\epsfxsize =3in

However, many lists are searched much more often than they are modified. For example, an IP routing table would normally change at most once per few minutes, but might be searched many thousands of times per second. This could result in well over a million accesses per update, making lock-acquisition overhead burdensome to searches.

Unfortunately, omitting locking when searching means that the update no longer appears to be atomic. Instead, the update takes the multiple steps shown in Figure 2. A search might be referencing element B just as it was freed up, resulting in crashes, or worse, as indicated by the reader referencing nothingness in step (3).

Figure 3: RCU Protecting Deletion and Search
\epsfxsize =3in

One solution to this problem is to delay freeing up element B until all searches have given up their references to it, as shown in Figure 3. RCU indirectly determines when all references have been given up. To see how this works, recall that there are normally restrictions on what operations may be performed while holding a lock. For example, in the Linux kernel, it is forbidden to do a context switch while holding any spinlock. RCU mandates these same restrictions: even though the RCU-protected search need not acquire any locks, it is forbidden from performing any operation that would be forbidden if it were in fact holding a lock.

Therefore, any CPU that is seen performing a context switch after the linked-list deletion shown in step (2) of Figure 3 cannot possibly hold a reference to element B. As soon as all CPUs have performed a context switch, there can no longer be any readers, as shown in step (3). Element B may then be safely freed, as shown in step (4).

A simple, though inefficient, RCU-based deletion algorithm could perform the following steps in a non-preemptive Linux kernel (preemptive kernels can be handled as well [McK02a]):

  1. Unlink element B from the list, but do not free it. The state of the list will be that shown in step (2) of Figure 3.
  2. Run on each CPU in turn. At this point, each CPU has performed one context switch after element B has been unlinked. Thus, there cannot be any more references to element B.
  3. Free up element B.

Much more efficient implementations have been described elsewhere [McK98a,McK02a]. The following sections present the concepts underlying RCU.

next up previous
Next: Glossary Up: Background Previous: Background
Paul McKenney 2003-03-28