Check out the new USENIX Web site. next up previous
Next: Accessory Functions Up: Using Accessory Functions to Previous: Introduction


Dynamic Dispatch and Reuse

One argument made by advocates of object-oriented languages is that object-oriented programming can facilitate code reuse. A well-designed class or hierarchy of classes can be reused without knowledge of, or access to, its implementation (just as a well-designed function or procedure can be reused in other languages). In this paper, we will be concerned with two types of reuse of classes. In the first, which we call reuse by inheritance, a programmer represents a new kind of data by making an extension of some existing data type. In the second, which we call reuse in a function, a programmer uses a class in the implementation of a new function (perhaps for a local variable or parameter).

We will focus on reuse that can be accomplished without modification of the source code that is to be reused. This is obviously important if software is distributed without source code, or if programmers are not able to modify the source code. It also prevents unnecessary code management complexities when source code is available. If several groups of programmers each reuse the same class, their modifications to that class must be merged, and the merged code must be tested, if these extensions are ever to be used together.

Note that reuse by inheritance (as defined above) can occur even in languages that do not support inheritance directly. Consider for example the task of representing various kinds of expression nodes in a compiler's abstract syntax tree (A.S.T.). (This example was adapted from [2]; we have focused on a simple method that can be understood with minimal knowledge of compiler construction.) In languages like C, the programmer can use a single struct to represent all kinds of expression nodes, distinguishing among the different kinds of nodes by including a kind field in the struct and using a switch statement to select code that is appropriate for each kind of expression node. Reuse by inheritance occurs if the programmer adds a new kind of expression node (perhaps because a new kind of expression has been added to the language). However, this reuse requires access to (and modification of) the existing code - the programmer must add a new case to the switch statements in existing functions, even if existing cases do not need to be modified.

Figure 1: Dynamic Dispatch Example
\begin{figure}
\begin{center}
\begin{verbatim}class Exp { // abstract supercla...
...() +
\lq\lq +'' + rhs().print_rep() + \lq\lq )'';
}\end{verbatim}\end{center}\end{figure}

In an object-oriented language like C++, the programmer can define the original kinds of expression nodes with a collection of classes, each of which inherits from an ``abstract superclass'' Exp (shown in C++ in Figure 1). The abstract superclass gives methods that are shared by all kinds of expressions, such as a print_rep method to produce a string that gives a printable representation for any kind of expression node. These methods must be defined for every subclass of Exp, and we can therefore request the printable representation of any object denoted by a reference of type Exp & (which must be of a class derived from Exp). Methods that are specific to one kind of node are defined only in the appropriate subclass (and thus cannot be requested through references of type Exp & in statically checked languages like C++).

We rely on the fact that we can request the print_rep for any object referred to by an Exp & in the print_rep method for class Plus. This method uses the printable representation for the left and right operands of the sum. Dynamic dispatch ensures that the print_rep method for the correct class is used, even though the compiler cannot determine statically which method will be chosen. This approach lets the programmer add new kinds of nodes by deriving a new class (with an appropriate print_rep) for each new kind of node. This does not require any modification of existing source code, and dynamic dispatch will ensure that the new class's print_rep is used for the new nodes, even in existing code (such as the print_rep method for class Plus).

Dynamic dispatch shifts the responsibility of selecting the appropriate print_rep code from the programmer to the programming language. In single-dispatch languages like C++ and Java, dynamic dispatch can be implemented by associating, with each object, a table of pointers to the code for each of the object's methods. In our example, each object of a class that inherits from Exp has a pointer to its print_rep method at a fixed offset in its dispatch table - to perform a call to print_rep when the type of object is not known, the compiler can generate an indirect function call using the table.

Note that it is possible for the programmer to implement dynamic dispatch in non-object-oriented languages that allow pointers to functions, such as C. However, this requires that the programmer undertake the tedious and potentially error-prone task of initializing using the tables of function pointers.

Unfortunately, the use of inheritance and dynamic dispatch inhibits reuse in a function. This is a consequence of the fact that only methods listed in a class can be dynamically dispatched based on that class. Consider what would happen if we wish to add a new pass to our compiler (such as an operation to interpret an expression), rather than a new type of expression node. If we had used a single class with a kind field, we could simply create a new function that takes an expression node, checks the kind field, and interprets the node in the appropriate way. However, if we wish to add this operation to the collection of classes in Figure 1, me must edit the class definitions to add an interpret method.

Editing the existing class definitions would be appropriate if we were redesigning, rather than reusing, these classes. However, we do not believe that every new use that requires dynamic dispatch should be considered a redesign: If this were the case, the author of a class would have the responsibility of enumerating all cases in which dynamic dispatch is needed for objects of that type.

There are a number of other ways to add an interpret method, each of which we consider unsatisfactory. First, we could introduce an interpret function that is not part of the Exp classes (in Java, it must be a method of some other class, such as a class Interpreter). This function could use typeid (in C++) or instanceof (in Java) in a series of if statements to select the appropriate code for the kind of node being interpreted. In a language without an equivalent of typeid, the programmer can add a kind field (or operation). In either case, this approach simply sets up a future problem with reuse by inheritance - any programmer adding a new kind of expression node must edit the code for interpret to work with the new type.

We could also use inheritance to produce new subclasses that add an interpret operation (deriving a class Exp_with_interpret from class Exp). However, this introduces spurious uses of multiple inheritance, which we consider highly undesirable (though we have no problem with legitimate uses of multiple inheritance): If we are to apply interpret to each new kind of node through an Exp reference, then the new node classes must share a common superclass with an interpret function, which introduces a second superclass for the new node classes.

Thus, if dynamic dispatch is provided only to functions listed in the class, we are forced to choose between allowing reuse in functions (if we use an explicit switch) or reuse by inheritance (if we use dynamically dispatched methods). To allow both kinds of reuse, we must allow dispatch for functions outside of the class. This is allowed in some languages that provide multiple dispatch [3,9]. However, multiple dispatch has a higher run-time cost than single dispatch: techniques based on complete dispatch tables may require large tables, and other methods do not provide constant-time dispatch [1]. Accessory functions provide both kinds of reuse without the added complexity and cost of multiple dispatch.

Multiple dispatch can also be introduced as a programming technique rather than a language feature (for example, by using the visitor design pattern). This also introduces unnecessary overhead, and is less flexible than a compiler-generated dispatch, as we will see in Section 6.


next up previous
Next: Accessory Functions Up: Using Accessory Functions to Previous: Introduction

2000-12-09