

Reader-Writer-Locking/RCU Analogy

Although RCU has been used in a great many interesting and surprising ways, one of the most straightforward is as a replacement for reader-writer locking. In this section, we present this analogy by protecting the simple doubly-linked-list data structure shown in Figure 6 first with reader-writer locks and then with RCU, and comparing the results. This structure has a struct list_head that is manipulated by the standard Linux list-manipulation primitives, a search key, and a single long integer for data. The RCU primitive call_rcu() requires some space in the data element, which is supplied by the my_rcu_head field.

Figure 6: List Element Data Structure
    struct el {
        struct list_head list;
        long key;           /* search key */
        long data;
        struct rcu_head my_rcu_head;
    };
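
The code in Table 2 below also references a global list header head and a lock list_lock, whose declarations do not appear in the paper. Under the kernel conventions of the time, they might look something like the following (list_lock is an rwlock_t in the reader-writer-lock column and a spinlock_t in the RCU column; a given file would declare one or the other):

    static LIST_HEAD(head);    /* global list header */

    /* Reader-writer-lock version: */
    static rwlock_t list_lock = RW_LOCK_UNLOCKED;

    /* RCU version: */
    static spinlock_t list_lock = SPIN_LOCK_UNLOCKED;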

The reader-writer-lock/RCU analogy substitutes primitives as shown in Table 1. The asterisked primitives are no-ops in non-preemptible kernels; in preemptible kernels, they suppress preemption, which is an extremely cheap operation on the local task structure. Note that since neither rcu_read_lock() nor rcu_read_unlock() blocks irq or softirq contexts, primitives must be added for this purpose where needed. For example, read_lock_irqsave() must become rcu_read_lock() followed by local_irq_save().
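
As a sketch of that last transformation, a read-side critical section that must also exclude local irq handlers might be converted as follows (reusing the list_lock from the examples below):

    /* Reader-writer lock, also blocking local irqs: */
    unsigned long flags;

    read_lock_irqsave(&list_lock, flags);
    /* ... read-side critical section ... */
    read_unlock_irqrestore(&list_lock, flags);

    /* RCU equivalent: irqs must be suppressed separately: */
    rcu_read_lock();
    local_irq_save(flags);
    /* ... read-side critical section ... */
    local_irq_restore(flags);
    rcu_read_unlock();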


Table 1: Reader-Writer-Lock/RCU Substitutions

    Reader-Writer Lock      Read-Copy Update
    ------------------      ----------------
    rwlock_t                spinlock_t
    read_lock()             rcu_read_lock() *
    read_unlock()           rcu_read_unlock() *
    write_lock()            spin_lock()
    write_unlock()          spin_unlock()
    list_add()              list_add_rcu()
    list_add_tail()         list_add_tail_rcu()
    list_del()              list_del_rcu()
    list_for_each()         list_for_each_rcu()

    * no-op unless CONFIG_PREEMPT, in which case preemption is suppressed


Deletion from the list is illustrated by the upper section of Table 2. The write_lock() and write_unlock() are replaced by spin_lock() and spin_unlock(), respectively, the list_del() is replaced by list_del_rcu(), and the direct call to my_free() is replaced by call_rcu(). The call_rcu() will, after a grace period elapses, pass p to my_free(), using the struct rcu_head in the my_rcu_head field to keep track of the deferred function call and its argument.
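
The paper does not show my_free() itself. A minimal sketch consistent with the cast used in Table 2, assuming the element was allocated with kmalloc(), might be:

    /* Hypothetical helper: frees a list element.  Called directly in
     * the reader-writer-lock version, and after a grace period via
     * call_rcu() in the RCU version. */
    static void my_free(struct el *p)
    {
        kfree(p);
    }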


Table 2: Reader-Writer-Lock and Read-Copy-Update Analogy
(each code fragment is shown first in its reader-writer-lock form, then in its RCU form)
Deletion, reader-writer lock:

    void delete(long mykey)
    {
        struct el *p;

        write_lock(&list_lock);
        p = search(mykey);
        if (p != NULL) {
            list_del(&p->list);
        }
        write_unlock(&list_lock);
        if (p != NULL) {
            my_free(p);
        }
    }
Deletion, RCU:

    void delete(long mykey)
    {
        struct el *p;

        spin_lock(&list_lock);
        p = search(mykey);
        if (p != NULL) {
            list_del_rcu(&p->list);
        }
        spin_unlock(&list_lock);
        if (p != NULL) {
            call_rcu(&p->my_rcu_head,
                     (void (*)(void *))my_free, p);
        }
    }
Insertion, reader-writer lock:

    void insert(long key, long data)
    {
        struct el *p;

        p = kmalloc(sizeof(*p), GFP_ATOMIC);
        if (p == NULL) {
            return;  /* allocation failure; error handling elided */
        }
        p->key = key;
        p->data = data;
        write_lock(&list_lock);
        list_add_tail(&p->list, &head);
        write_unlock(&list_lock);
    }
Insertion, RCU:

    void insert(long key, long data)
    {
        struct el *p;

        p = kmalloc(sizeof(*p), GFP_ATOMIC);
        if (p == NULL) {
            return;  /* allocation failure; error handling elided */
        }
        p->key = key;
        p->data = data;
        spin_lock(&list_lock);
        list_add_tail_rcu(&p->list, &head);
        spin_unlock(&list_lock);
    }
Search (locking handled by the caller), reader-writer lock:

    struct el *search(long mykey)
    {
        struct list_head *lp;
        struct el *p;

        list_for_each(lp, &head) {
            p = list_entry(lp, struct el, list);
            if (p->key == mykey) {
                return p;
            }
        }
        return NULL;
    }
Search (locking handled by the caller), RCU:

    struct el *search(long mykey)
    {
        struct list_head *lp;
        struct el *p;

        list_for_each_rcu(lp, &head) {
            p = list_entry(lp, struct el, list);
            if (p->key == mykey) {
                return p;
            }
        }
        return NULL;
    }
Read-only search (usage), reader-writer lock:

    struct el *p;

    read_lock(&list_lock);
    p = search(mykey);
    if (p == NULL) {
        /* handle error condition */
    } else {
        /* access *p without modifying it */
    }
    read_unlock(&list_lock);
Read-only search (usage), RCU:

    struct el *p;

    rcu_read_lock();  /* no-op unless CONFIG_PREEMPT */
    p = search(mykey);
    if (p == NULL) {
        /* handle error condition */
    } else {
        /* access *p without modifying it */
    }
    rcu_read_unlock();  /* no-op unless CONFIG_PREEMPT */


Insertion into the list is illustrated by the second section of Table 2. Again, the write_lock() and write_unlock() are replaced by the simple spinlock primitives spin_lock() and spin_unlock(), respectively. The list_add_tail() is replaced by list_add_tail_rcu(). The rest of the code remains the same.
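
The _rcu suffix is doing real work here: list_add_tail_rcu() must order the initialization of the new element's pointers before the store that makes it reachable, so that a concurrent reader never sees a half-linked element. A simplified sketch of the underlying idea (the kernel's actual implementation differs in detail):

    static inline void __list_add_rcu(struct list_head *new,
                                      struct list_head *prev,
                                      struct list_head *next)
    {
        new->next = next;
        new->prev = prev;
        smp_wmb();          /* order initialization before publication */
        next->prev = new;
        prev->next = new;
    }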

Searching the list is illustrated by the third section of Table 2. Locking is handled by the caller, so the two variants differ only in that the list_for_each() is replaced by list_for_each_rcu().
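
On the read side, list_for_each_rcu() must in turn ensure that each ->next pointer is fetched before anything is dereferenced through it, which on DEC Alpha requires an explicit data-dependency barrier. A simplified sketch of the idea, not the kernel's exact macro (which also prefetches), using a GCC statement expression as kernel code commonly does:

    #define list_for_each_rcu(pos, head) \
        for (pos = (head)->next; \
             ({ smp_read_barrier_depends(); pos != (head); }); \
             pos = pos->next)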

Searching the list for read-only access is illustrated by the last section of Table 2. The only difference is that the read_lock() and read_unlock() primitives are replaced by rcu_read_lock() and rcu_read_unlock(), respectively. Again, the bulk of the code remains the same for both cases.

Although this analogy can be quite compelling and useful, there are some caveats:

  1. Read-side critical sections may see "stale data" that has been removed from the list but not yet freed. There are some situations (e.g., routing tables for best-effort protocols) where this is not a problem. In other situations, such stale data may be detected and rejected [Pugh90], as illustrated in Section 4.
  2. Read-side critical sections may run concurrently with write-side critical sections.
  3. The grace period delays the freeing of memory, which means that both the memory footprint and the cache footprint of the code will be somewhat larger with RCU than with reader-writer locking.
  4. When converting to RCU, write-side reader-writer-locking code that modifies list elements in place must often be restructured to prevent read-side RCU code from seeing the data in an inconsistent state. In many cases this restructuring is quite straightforward: for example, creating a new list element with the desired state, then replacing the old element with the new one, as sketched below.
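
As an illustration of the fourth caveat, an in-place update of the ->data field under the reader-writer lock might be restructured for RCU along the following lines. This update() function is hypothetical, not part of the paper's example; it reuses search(), my_free(), and the my_rcu_head field from Table 2 and Figure 6:

    void update(long mykey, long newdata)
    {
        struct el *old_p, *new_p;

        new_p = kmalloc(sizeof(*new_p), GFP_ATOMIC);
        if (new_p == NULL) {
            return;  /* allocation failure; error handling elided */
        }
        spin_lock(&list_lock);
        old_p = search(mykey);
        if (old_p == NULL) {
            spin_unlock(&list_lock);
            kfree(new_p);
            return;
        }
        /* Fully initialize the copy before linking it in. */
        new_p->key = old_p->key;
        new_p->data = newdata;
        list_add_tail_rcu(&new_p->list, &old_p->list);  /* insert before old */
        list_del_rcu(&old_p->list);
        spin_unlock(&list_lock);
        call_rcu(&old_p->my_rcu_head,
                 (void (*)(void *))my_free, old_p);
    }

Concurrent readers then see either the old element or the new one, but never a partially updated element.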

Where it applies, this analogy can deliver full parallelism with almost no increase in complexity. For example, Section 4 shows how applying this analogy to System V IPC yields order-of-magnitude speedups with a very small increase in code size and complexity.

RCU has also been used as a lazy barrier-synchronization mechanism, as a mode-change control mechanism, and for more sophisticated list maintenance. Retrofitting existing code with RCU as shown above can produce significant performance gains, but of course the best results are obtained by designing RCU into the algorithms and code from the start.

Other methods have been proposed for eliminating locking [Herlihy93], and, when combined with more recent refinements [Michael02a,Michael02b], these methods are practical in some circumstances. However, they still require expensive atomic operations on shared storage, resulting in pipeline stalls, cache thrashing, and memory contention, even for read-only accesses.

