select()

Next: ufalloc() Up: Scalable select() and ufalloc() Previous: Scalable select() and ufalloc()

select()

Consider an event-driven server process waiting for activity on any of a few thousand sockets. Recall from Section 2 that select() always performs a full scan through all of these sockets, either to find those few that are currently ready, or to indicate that a thread is waiting for events on each of the sockets.

A full scan is also performed after the protocol code processes an incoming packet and calls select_wakeup() to unblock a thread waiting inside select(). The full scan is performed even though only a few of the sockets are actually ready. This wasted effort is expended because, between the call to select_wakeup() and the invocation of do_scan(), we throw away the information about the identity of the socket that has become ready. selscan() then does a significant amount of work to rediscover the set of ready sockets.

The key idea of our design is to preserve information about the change in the state of a socket between select_wakeup() and do_scan(). We use this information to prune both the initial scan, and the scan after the select_wakeup(), to inspect only those sockets that need inspection. These are the sockets either about which we have no prior information, or for which we have state-change hints from the protocol-processing layer.

We changed the Digital UNIX kernel to keep track of three sets for each thread, named READY, INTERESTED, and HINTS. (The first two of these sets actually consist of three component sets, one for read-ready descriptors, one for write-ready descriptors and one for exceptions.) The INTERESTED set is the subset of sockets that the thread is currently interested in selecting on. The READY set is a subset of the INTERESTED set and includes those sockets which the kernel thinks are ready. The kernel maintains state-change information about sockets in the INTERESTED set, rather than for the full set of sockets open for a thread. This state-change information is maintained as the HINTS set. The HINTS set includes sockets that might have become ready since the last call to select(), and is updated by the protocol layer when a packet arrives for a socket.

Each call to select() specifies a SELECTING set for the thread, which is used to compute the new values of the READY and INTERESTED sets. select() uses the HINTS and READY sets to prune its initial scan. It checks only those sockets which are in the SELECTING set and either:

are not in the old INTERESTED set, or
are in the old READY set, or
are in the HINTS set

Mathematically, we can express the computation of these sets as:

eqnarray219

where expresses the computation of checking the status of descriptors in its argument set.

The computation of 's argument set above appears to have complexity proportional to the size of the SELECTING set. We took care to optimize this computation and its data-cache footprint. The resulting code has a very small cost relative to other parts of select().

The set returned from select() is:

A descriptor must be removed from the INTERESTED sets of all threads in a process at some point between the time that the descriptor is closed and the time that it is next allocated by any thread in the process.

For each socket, we record the set of processes that have a reference to the socket. In the protocol processing code, when a packet comes in for a socket, sowakeup() records a hint in the HINTS sets of each of the threads in the referencing processes for which this socket is present in the INTERESTED set of the thread. sowakeup() also wakes up all such threads that are blocked in select(). After a thread is woken up in select(), it scans only those sockets in its HINTS set.

Next: ufalloc() Up: Scalable select() and ufalloc() Previous: Scalable select() and ufalloc()

Gaurav Banga
Mon Apr 27 13:10:55 CDT 1998