Next: ufalloc() Up: Background Previous: Event-driven servers


The select() system call allows a user process to wait for events on a set of descriptors. A process can indicate interest in three types of events on a descriptor: events that make the descriptor readable, events that make it writable, and exception events. This information is passed to the kernel using three bitmaps. In each bitmap, the kth bit indicates interest in events of that type for the kth descriptor. These bitmaps are value-result parameters: on return, they indicate the sets of ready descriptors. Stevens [23] describes the select() interface in detail.
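As a minimal user-level sketch of the value-result behavior described above, the following C function uses the standard select() interface to wait for a single descriptor to become readable. The function name wait_readable and its millisecond timeout parameter are illustrative choices, not part of the original text.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;          /* the "readable" bitmap */
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);

    /* On input, the bitmap marks the descriptors we are interested in. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* No interest in writable or exception events, so pass NULL for those. */
    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;

    /* On return, the same bitmap marks the descriptors that are ready. */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

Because the bitmap is overwritten by the call, a caller that selects repeatedly has to rebuild it with FD_ZERO and FD_SET before every call.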

We describe the Digital UNIX implementation of select(). The classical BSD implementation is similar; the main differences are related to the multithreaded nature of the Digital UNIX kernel. Thus our discussion is fully applicable to 4.3BSD and most BSD-derived implementations. We discuss how select() works for descriptors that represent sockets, but our discussion and algorithms can be trivially extended to descriptors that refer to other kinds of objects, such as vnodes. (Vnodes are kernel data structures used to represent files and devices.)

In Digital UNIX, the select() function in the kernel starts by creating internal data structures containing summary information about sockets that are marked in at least one input bitmap. Subsequently, select() calls do_scan(), which calls selscan() to check the status of each of the entities (vnodes or sockets) corresponding to the selected descriptors.

For each selected socket, selscan() enqueues a record referring to the current thread on the select queue of the socket. This is done so that the thread can be identified as waiting inside select() for events on the socket. selscan() then calls soo_select() for each socket, which checks whether the condition that the process is interested in (i.e., the socket is readable, writable, or has pending exceptions) is true. If none of the conditions that the user process is selecting on is true, then do_scan() goes to sleep waiting for any of them to become true.

Note that the linear search in selscan() covers every socket of potential interest to the selecting process, independent of how many are actually ready. Thus, the cost is proportional to the number of file descriptors involved in the call to select(), rather than to the number of events discovered by the call.
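The cost argument above can be illustrated with a small user-level model. This is only a sketch under stated assumptions: scan_model, the BIT_TEST and BIT_SET macros, and the is_ready bitmap (which stands in for the per-socket status check) are hypothetical names, not kernel code. The model counts how many descriptors are examined and how many turn out to be ready.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical bitmap helpers: bit k lives in byte k / CHAR_BIT. */
#define BIT_TEST(map, k) (((map)[(k) / CHAR_BIT] >> ((k) % CHAR_BIT)) & 1u)
#define BIT_SET(map, k) \
    ((map)[(k) / CHAR_BIT] |= (unsigned char)(1u << ((k) % CHAR_BIT)))

/*
 * Toy model of a linear scan over the selected descriptors.
 * interest: bit k set if the caller selected descriptor k
 * is_ready: stand-in for the per-descriptor status check
 * result:   output bitmap of ready descriptors (caller zeroes it)
 * polled:   output count of descriptors that were examined
 * Returns the number of ready descriptors found.
 */
int scan_model(const unsigned char *interest, const unsigned char *is_ready,
               unsigned char *result, int nfds, int *polled)
{
    int k, nready = 0;

    assert(interest != NULL && is_ready != NULL);
    assert(result != NULL && polled != NULL);

    *polled = 0;
    for (k = 0; k < nfds; k++) {
        if (!BIT_TEST(interest, k))
            continue;
        (*polled)++;                 /* work done per selected descriptor */
        if (BIT_TEST(is_ready, k)) {
            BIT_SET(result, k);
            nready++;
        }
    }
    return nready;
}
```

With 64 selected descriptors of which only one is ready, the model examines all 64 and reports a single ready descriptor, so the work tracks the size of the interest set rather than the number of events.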

When a network packet comes in, protocol processing may cause a condition on which do_scan() is blocked to become true. The thread that performs protocol processing for an incoming packet calls select_wakeup(), which wakes up all threads that are blocked in do_scan() awaiting this condition.

A thread that is woken up in do_scan() calls selscan(), which calls soo_select() for all the sockets that the corresponding call to select() specified in its three bitmaps. do_scan() also calls undo_scan() to remove this thread from select queues of the selected sockets.
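The select-queue bookkeeping described in this section (enqueue on entry, wake all waiters on an event, dequeue afterwards) can be sketched as a toy user-level model. Everything here is an illustrative assumption rather than the Digital UNIX code: the toy_socket structure, the fixed-size queue, the integer thread ids, and the choice to leave waiters queued until they remove themselves.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8            /* arbitrary capacity for this toy model */

struct toy_socket {
    int waiters[MAX_WAITERS];    /* ids of threads waiting in select() */
    int nwaiters;
};

/* Analogous to selscan() enqueuing a record for the current thread. */
int toy_enqueue(struct toy_socket *so, int tid)
{
    assert(so != NULL);
    if (so->nwaiters == MAX_WAITERS)
        return -1;
    so->waiters[so->nwaiters++] = tid;
    return 0;
}

/*
 * Analogous to select_wakeup(): every waiting thread is reported as woken.
 * Returns the number of thread ids written to woken[].
 */
int toy_wakeup(const struct toy_socket *so, int *woken, int max)
{
    int i, n = 0;

    assert(so != NULL && woken != NULL);
    for (i = 0; i < so->nwaiters && n < max; i++)
        woken[n++] = so->waiters[i];
    return n;
}

/* Analogous to undo_scan(): remove tid from every selected socket's queue. */
void toy_undo_scan(struct toy_socket **socks, int nsocks, int tid)
{
    int s, i;

    for (s = 0; s < nsocks; s++) {
        struct toy_socket *so = socks[s];
        for (i = 0; i < so->nwaiters; i++) {
            if (so->waiters[i] == tid) {
                /* Fill the hole with the last entry; order is not kept. */
                so->waiters[i] = so->waiters[--so->nwaiters];
                break;
            }
        }
    }
}
```

Note that toy_undo_scan visits every selected socket, not only the one that triggered the wakeup, which mirrors the per-call linear cost discussed earlier.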


Gaurav Banga
Mon Apr 27 13:10:55 CDT 1998