Base-Class Composition with Multiple Derivation and Virtual Bases

Lee R. Nackman and John J. Barton
IBM Research Division
Thomas J. Watson Research Center
P.O. Box 704
Yorktown Heights, New York 10598

Abstract

For systems of C++ classes using virtual functions, writing base classes that only specify virtual member functions and then writing other base classes that only implement those functions improves extensibility. When interface is separated from implementation, both interfaces and implementations can be extended separately by derivation. New classes can then be composed by multiple derivation, combining one interface and one implementation base class. We call this form of composition base-class composition. Programmers familiar with the advantages of base-class composition fail to use it because of the performance penalty of multiple derivation and virtual base classes. Consequently, compiler writers, failing to see extensive applications of multiple derivation and virtual bases, have little incentive to eliminate the performance penalty. We highlight the advantages of the base-class composition pattern and show how the performance penalty can be eliminated by compiler optimization.

1 Introduction

In general, C++ classes combine interface and implementation, specifying both the member functions that can be called (interface) and object state (implementation). Derived classes can extend their base class's interface or they can reuse their base class's implementation or both. Building on the work of Martin [1], we observe that there are many advantages to writing base classes that either specify interface or provide implementation, but not both. With a few simple rules applied consistently, these two kinds of cooperating base classes become building blocks that can be combined using multiple derivation. We call this design strategy base-class composition.
It allows independent extension of interface, alternative implementation of interface, reuse of implementation, and encapsulation of implementation, and it avoids recompilation of clients when implementation is altered. Explaining its uses and advantages is the first goal of this paper.

We believe that programmers familiar with the advantages of this composition fail to use it because of the performance penalty of multiple derivation and virtual base classes. Consequently, compiler writers, failing to see extensive applications of multiple derivation and virtual bases, have little incentive to eliminate the performance penalty. After we highlight the advantages of the base-class composition pattern, we show how the performance penalty can be eliminated by compiler optimization.

We shall focus our attention on aiding the construction of classes to be used through virtual function calls. A function that uses a class is a client of that class; it uses some services provided by the class. Some functions use an instance of a class only through its virtual functions and have no need to know the exact type of the object being used. Such functions can be written to use pointers or references to a base class with virtual functions. Let's call such a base class an interface base class. For example, a function that calls virtual Shape::draw() on a Shape& uses the object through an interface base class Shape. The object itself could be a Circle, Square, or Triangle. Let's call such a function a use client. Base-class composition aids the construction and maintenance of classes for use clients.

Two other important clients are creation clients, those that create objects (requiring specific types), and downcast clients, those that apply derived-type specific operations to objects given only references to common base classes of the objects. Base-class composition does not make creation clients more difficult to design, nor does it make downcast clients less difficult.
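As a minimal sketch of a use client, assume a hypothetical Shape interface whose draw() writes the shape's name to a stream. The text names Shape, Circle, Square, and draw() but gives no signatures, so the stream parameter, the virtual destructor, and the render() helper are assumptions made for illustration only.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Interface base class: pure virtual functions only, no member data.
// The draw(std::ostream&) signature is an assumption of this sketch.
class Shape {
public:
    virtual ~Shape() = default;
    virtual void draw(std::ostream& os) const = 0;
};

class Circle : public Shape {
public:
    void draw(std::ostream& os) const override { os << "circle"; }
};

class Square : public Shape {
public:
    void draw(std::ostream& os) const override { os << "square"; }
};

// A use client (hypothetical helper): it sees only a Shape reference
// and never names Circle or Square.
std::string render(const Shape& s) {
    std::ostringstream out;
    s.draw(out);
    return out.str();
}
```

A creation client, by contrast, would have to name Circle or Square in order to construct the object that render() receives.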
The next section gives an example of base-class composition. Section 3 discusses characteristics of base-class composition and Section 4 compares it to the alternatives. The performance penalties of using base-class composition are described in Section 5. We show in Section 6 that a straightforward optimization technique can eliminate the costs of multiple derivation with virtual bases under certain circumstances and then, in Section 7, we discuss a further optimization applicable to private bases used in base-class composition. Related work is discussed in Section 8.

2 An Example of Base-Class Composition

To illustrate base-class composition, we use a simple but realistic example. Suppose we want to write a tool that manipulates C++ source code and that we need a way to represent C++ language elements that appear in the source code. We might define a class for each C++ language element to be represented. In particular, let's assume that we want to define classes to represent C++'s union, struct, and class. These are all aggregates in the sense that they contain source code entities such as member functions, member data, nested classes, and nested typedefs. However, only class and struct can have base classes.

It is reasonable to expect classes for representing C++ aggregates to meet the following criteria:

o All three kinds of aggregates should provide common functions representative of aggregation.
o The class and struct aggregates should provide common functions representative of aggregates that can have bases.
o All three aggregates should be able to use a common implementation of aggregation.
o The two aggregates that can have bases should be able to use a common implementation of access to base classes.

To focus on the issues related to separation of interface and implementation, we limit the function interfaces in our example: all aggregates respond to numMembers() and aggregates that can have bases respond additionally to numBases().
We mean for these functions to be representative of larger commonalities and differences between aggregates and aggregates with bases.

Figure 1 shows a class DAG meeting our design goals. Two classes, Aggregate and AggregateWithBases, are interface base classes, meaning that we intend for client functions to use pointers or references to these classes and that we intend to derive from these classes to implement the interfaces they specify. These classes specify member functions common to their derived classes. For base-class composition we restrict interface base classes to be abstract base classes with neither member data nor constructors. Characteristics of all aggregates are specified by pure virtual member functions of Aggregate:

    class ostream;
    class Aggregate {
    public:
        virtual int numMembers() const = 0;
        virtual void kind(ostream&) const = 0; // name of kind of aggregate.
        // ...
    };

Figure 1: Class DAG for C++ aggregates using base-class composition. Boxes indicate interface base classes. Solid and dashed arrows indicate public and private derivation respectively.

Aggregates that may have bases have all the characteristics of aggregates plus characteristics related to bases. This is specified by publicly deriving AggregateWithBases from Aggregate and declaring additional pure virtual member functions:

    class AggregateWithBases : public virtual Aggregate {
    public:
        virtual int numBases() const = 0;
        // ...
    };

The reason for using virtual derivation is explained below. AggregateImpl implements the numMembers() virtual function of Aggregate:

    class AggregateImpl : public virtual Aggregate {
    public:
        AggregateImpl(int num_members) : _num_members(num_members) {}
        virtual int numMembers() const { return _num_members; }
        // ...
    private:
        int _num_members;
        // ...representation of members...
    };

A UnionImpl would represent union objects by deriving from this class for code and data reuse:

    class UnionImpl : public virtual Aggregate, private AggregateImpl {
    public:
        UnionImpl(int num_members) : AggregateImpl(num_members) {}
        virtual void kind(ostream& os) const { os << "union"; }
    };

AggregateWithBasesImpl reuses the implementation of the Aggregate layer of its AggregateWithBases interface by deriving privately from AggregateImpl. It also adds implementation for the numBases() virtual function specified by AggregateWithBases:

    class AggregateWithBasesImpl : public virtual AggregateWithBases,
                                   private AggregateImpl {
    public:
        AggregateWithBasesImpl(int num_members, int num_bases)
            : AggregateImpl(num_members), _num_bases(num_bases) {}
        virtual int numBases() const { return _num_bases; }
    private:
        int _num_bases;
    };

Using multiple derivation here enables AggregateWithBasesImpl to implement the AggregateWithBases interface on the one hand and to reuse the AggregateImpl implementation on the other hand. Finally, ClassImpl is derived from AggregateWithBasesImpl using the same style:

    class ClassImpl : public virtual AggregateWithBases,
                      private AggregateWithBasesImpl {
    public:
        ClassImpl(int num_members, int num_bases)
            : AggregateWithBasesImpl(num_members, num_bases) {}
        virtual void kind(ostream& os) const { os << "class"; }
        // ...
    };

StructImpl would be similar.

With these classes we can build and process collections of aggregates. For example, we could construct an array of Aggregate pointers that point to UnionImpl, ClassImpl, and StructImpl objects.
Then a client function that computes, say, the average number of members for each aggregate would look like this:

    float avgNumMembers(int n_aggs, Aggregate* aggs[]) {
        int sum = 0;
        for (int i = 0; i < n_aggs; i++)
            sum += aggs[i]->numMembers();
        return float(sum) / n_aggs; // convert first to avoid integer division
    }

Likewise, we can build and process collections of aggregates with bases by creating an array of AggregateWithBases pointers that point to ClassImpl and StructImpl instances. We could then write a client function that asks both for the number of elements as an aggregate and for the number of base classes, say a function that computes the average number of members for classes and structs that don't actually have bases:

    float avgNumMembersInRoots(int n_aggs, AggregateWithBases* aggs[]) {
        int sum = 0;
        int n_roots = 0; // aggregates that have no bases
        for (int i = 0; i < n_aggs; i++) {
            if (aggs[i]->numBases() == 0) {
                sum += aggs[i]->numMembers();
                n_roots++;
            }
        }
        return n_roots ? float(sum) / n_roots : 0; // average over roots only
    }

This example illustrates how the interface for AggregateWithBases layers on top of the interface for Aggregate, expressing the idea that aggregates with bases are aggregates. These use clients don't require ClassImpl objects to be AggregateImpl objects: the implementation is completely hidden from use clients.

3 Characteristics of Base-Class Composition

Our example illustrates base-class composition. First notice that our example has two kinds of base classes. The interface base classes, Aggregate and AggregateWithBases, are base classes with virtual functions, but they are not general C++ classes. They have no implementation. The other kind of base class, AggregateImpl and AggregateWithBasesImpl, could be called implementation base classes. They do not add specifications of virtual functions and they are not used as public base classes. Thus our classes separate interface and implementation.

Next notice the relationship between these base classes. The AggregateImpl implementation base class derives from the Aggregate interface base class, extending it but only adding implementation.
The AggregateWithBases interface base class also derives from the Aggregate interface base class, extending it but only adding more interface specification. The extended interface is implemented in AggregateWithBasesImpl by composing the extended interface with an implementation of the original interface (AggregateImpl) and adding implementation for the extended interface. This pattern of implementation, extension, and composition can continue to arbitrary depth; we call the pattern base-class composition.

The C++ language features of virtual functions, multiple derivation, virtual bases, and the dominance rule for name lookup in the class DAG [2] combine to enable base-class composition. Virtual functions, of course, make the notion of interface possible. Multiple derivation enables an implementation class to derive from both its interface and an implementation base. The combination of virtual bases and dominance connects the interface and the implementation.

Referring to Figure 1, we see a diamond-shaped pattern rooted at Aggregate. It has an interface base at the top of the diamond, with the interface extended on one leg (AggregateWithBases) and an implementation of the interface base on the other leg (AggregateImpl). The class at the bottom of the diamond (AggregateWithBasesImpl) completes the implementation of the extended interface specified on one path, building on the implementation along the other path.

AggregateWithBasesImpl inherits names from its direct base classes, AggregateWithBases and AggregateImpl, and it inherits the names of the indirect base class, Aggregate. Since AggregateImpl and AggregateWithBases are derived virtually from Aggregate, the name numMembers from AggregateImpl dominates the same name inherited along the DAG path through AggregateWithBases. The function numMembers() declared in the public interface base class Aggregate is implemented in AggregateWithBasesImpl by the AggregateImpl base class.
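The diamond just described can be exercised with a trimmed copy of the example classes. This is a sketch in present-day C++ rather than the paper's dialect: kind() is omitted so that the bottom class can be instantiated, a virtual destructor is added, members use a trailing underscore, and the two free functions are hypothetical helpers introduced only to show the calls.

```cpp
// Top of the diamond: the interface base class.
class Aggregate {
public:
    virtual ~Aggregate() = default;
    virtual int numMembers() const = 0;
};

// One leg: the interface is extended.
class AggregateWithBases : public virtual Aggregate {
public:
    virtual int numBases() const = 0;
};

// Other leg: the original interface is implemented.
class AggregateImpl : public virtual Aggregate {
public:
    explicit AggregateImpl(int num_members) : num_members_(num_members) {}
    int numMembers() const override { return num_members_; }
private:
    int num_members_;
};

// Bottom of the diamond: public virtual interface, private implementation.
// numMembers() is supplied by AggregateImpl for the shared Aggregate base.
class AggregateWithBasesImpl : public virtual AggregateWithBases,
                               private AggregateImpl {
public:
    AggregateWithBasesImpl(int num_members, int num_bases)
        : AggregateImpl(num_members), num_bases_(num_bases) {}
    int numBases() const override { return num_bases_; }
private:
    int num_bases_;
};

// Hypothetical use clients: both calls go through the extended interface.
int membersViaInterface(const AggregateWithBases& a) { return a.numMembers(); }
int basesViaInterface(const AggregateWithBases& a) { return a.numBases(); }
```

If either derivation from Aggregate were made non-virtual, AggregateWithBasesImpl would contain two Aggregate subobjects, the one reached through AggregateWithBases would have no implementation of numMembers(), and the class would remain abstract.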
A degenerate triangular version of the diamond-shaped pattern also appears three times in the DAG of Figure 1. The degenerate version omits the extension of the interface. Two of the triangular patterns in Figure 1 are rooted at AggregateWithBases, with a partial implementation of the interface provided on one leg by AggregateWithBasesImpl, and the implementation completed on the other leg by ClassImpl (resp., StructImpl). Again, multiple derivation, virtual bases, and name dominance combine to yield base-class composition. The third triangular pattern is rooted at Aggregate.

Two design conventions, separation of interface and implementation and virtual interface base classes, must be adhered to by the programmer to make base-class composition work.

Separation of Interface and Implementation. Base-class composition relies on the separation of interface and implementation. This means that the specification of the functions callable for an object is separated from the implementation of those functions; for C++, this means that member functions are specified in classes without member data. Classes with member data and implementation of the member functions derive from these classes. As Martin discussed [1], separating interface and implementation in C++ requires omitting member data in interface base classes and using virtual functions. It is also advantageous to declare the functions to be pure virtual so that the compiler will detect attempts to instantiate the interface and detect failure to override base class functions in the derived class. Other languages, including Modula-3 [3] and Ada, provide separate constructs for interfaces and implementations.

Virtual Interface Base Classes. Derivations from interface base classes must be virtual if the base-class composition approach is to be applied [1]. Without virtual derivation, the interface base classes would be duplicated: ClassImpl would have three Aggregate interfaces. The implementation names would not dominate the interface names along the AggregateWithBases branch and there would be no composition.

Figure 2: A class DAG extending the DAG in Figure 1 to include additional implementations of the AggregateWithBases interface, AggregateWithZeroBases and ClassWithZeroBases.

These design conventions, separating interface and implementation and deriving virtually from interface bases, represent the programming-time cost of base-class composition. Balanced against these up-front costs are the benefits in maintenance of independent extension, compositions, segregation of use and creation clients, and avoided recompilation. We examine each of these in the following paragraphs.

Independent Extension. Once we have adopted the separation of interface and implementation, the interface can be extended in two ways. Multiple implementations are one kind of extension. For example, we can add AggregateWithZeroBasesImpl as shown in Figure 2. This class eliminates the space for the member datum _num_bases:

    class AggregateWithZeroBasesImpl : public virtual AggregateWithBases,
                                       private AggregateImpl {
    public:
        AggregateWithZeroBasesImpl(int num_members)
            : AggregateImpl(num_members) {}
        virtual int numBases() const { return 0; }
    };

Of course the saving in this case is trivial because AggregateWithBasesImpl is trivial. Use clients of AggregateWithBases work against both implementations.

Adding virtual functions to the interface to create a richer interface is the other kind of extension. For example, we could create a Class interface derived from AggregateWithBases without affecting the implementation extensions derived from AggregateWithBases. These two kinds of extensions are for different purposes, alternative implementation versus additional interface.

Reuse Through Composition.
To reuse the AggregateImpl implementation of the Aggregate interface in the implementation of the AggregateWithBases interface layer, AggregateWithBasesImpl derives publicly from the interface AggregateWithBases and privately from the implementation AggregateImpl. Together, these two base classes form a composition that implements the Aggregate part of the AggregateWithBases interface. The virtual functions defined along the AggregateImpl branch (just numMembers() in this case) dominate the virtual functions in the Aggregate base class along the AggregateWithBases branch.

Once we adopt separation of interface and implementation and use virtual interface base classes, composition can become a ubiquitous tool in class design. For example, in Figure 2 we can create ClassWithZeroBasesImpl simply by composition:

    class ClassWithZeroBasesImpl : public virtual AggregateWithBases,
                                   private AggregateWithZeroBasesImpl {
    public:
        ClassWithZeroBasesImpl(int num_members)
            : AggregateWithZeroBasesImpl(num_members) {}
    };

Here we attach to the extended interface AggregateWithBases and reuse one of its implementations. The usage rules for base-class composition are always the same: the interface is public and virtual and the reused implementation is encapsulated with the same consideration as member data, usually as a private base class.

Forcing Access Through Interfaces. In the preceding example code, using private derivation forces member functions to be called through interface references. For example,

    int totalItems(const ClassImpl& c) {
        // WRONG: AggregateImpl::numMembers() const is a private member
        return c.numBases() + c.numMembers();
    }

does not compile because public members inherited via private derivation are private. This segregates clients into use clients, able to call through the interface only, and creation clients, unable to call interface functions on objects.
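The same computation does compile when it is written against the interface. The sketch below repeats trimmed versions of the example classes so that it stands alone; kind() is omitted, a virtual destructor is added, and members use a trailing underscore, all of which are simplifications made here rather than the paper's code.

```cpp
class Aggregate {
public:
    virtual ~Aggregate() = default;
    virtual int numMembers() const = 0;
};

class AggregateWithBases : public virtual Aggregate {
public:
    virtual int numBases() const = 0;
};

class AggregateImpl : public virtual Aggregate {
public:
    explicit AggregateImpl(int num_members) : num_members_(num_members) {}
    int numMembers() const override { return num_members_; }
private:
    int num_members_;
};

class AggregateWithBasesImpl : public virtual AggregateWithBases,
                               private AggregateImpl {
public:
    AggregateWithBasesImpl(int num_members, int num_bases)
        : AggregateImpl(num_members), num_bases_(num_bases) {}
    int numBases() const override { return num_bases_; }
private:
    int num_bases_;
};

class ClassImpl : public virtual AggregateWithBases,
                  private AggregateWithBasesImpl {
public:
    ClassImpl(int num_members, int num_bases)
        : AggregateWithBasesImpl(num_members, num_bases) {}
};

// OK: the parameter is the public interface base class, so both member
// functions are accessible; a ClassImpl converts to it implicitly.
int totalItems(const AggregateWithBases& a) {
    return a.numBases() + a.numMembers();
}
```

The conversion from ClassImpl to AggregateWithBases is permitted because ClassImpl names that interface as a public virtual base, even though it is also reachable through the private implementation base.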
This segregation encourages us to write code applicable to all Aggregate objects using Aggregate references or pointers and to avoid writing code tied to specific implementations of the interface like ClassImpl. Whether or not this segregation should be used is a matter of design: some libraries may wish to encourage object access through interface base classes only while others will allow functions at all levels of the class DAG to be called. Access declarations can be used to restore public access in cases in which forcing access through interfaces is not appropriate.

Avoiding Recompilation of Use Clients. Base-class composition also avoids recompilation of use clients when private implementations are redefined, a vital advantage when building large systems. While encapsulation ensures that use clients do not depend on implementation, separating interface and implementation provides a stronger decoupling. Use clients can be compiled to the interface, implementations can be compiled to the interface, and they need only be connected at link time.

4 Why Alternatives are Less Robust

In our experience, base-class composition provides a systematic and robust architectural pattern for object-oriented programs in C++. We have outlined the technique in the preceding section. Here we support our claim that it is more robust (resistant to unforeseen errors) and more maintainable than alternative designs. We pose alternatives in terms of our example.

Combined interface and implementation. We might dispense with the interfaces altogether. A class DAG like that shown in Figure 3 would provide the same implementation reuse as the one in Figure 1. Inheriting interface and implementation together combines four classes into two classes (AggregateCombined and AggregateWithBasesCombined) and eliminates all multiple derivation. While these may be counted as advantages, client functions are now tied to specific implementations. Modifying the implementation of AggregateCombined forces all use clients of AggregateCombined, AggregateWithBasesCombined, and any further layers to be recompiled when anything is changed.

Figure 3: An alternative class DAG design to represent C++ aggregates that uses interface base classes combined with implementation. Compared to Figure 1, the content of Aggregate and AggregateImpl are combined in AggregateCombined and the content of AggregateWithBases and AggregateWithBasesImpl are combined in AggregateWithBasesCombined. Public derivation (solid arrows) must be used here to expose the virtual functions specified in the base classes.

Combining implementation and interface also prevents alternative implementations from being used by one set of client functions. For example, we cannot define a ClassWithZeroBasesImpl object as we did in the preceding section and then use it in use clients expecting AggregateWithBasesCombined references or pointers.

We can extend the interface of AggregateWithBasesCombined in the same manner that we built it from AggregateCombined. For this reason, combined interface and implementation classes work in systems that do not use virtual function interface clients as a major program design element.

Fat interfaces. The number of classes can be reduced by lumping all of the interface functions into one interface base class, say AggregateWithOrWithoutBases, as shown in Figure 4. Those parts of the interface not pertinent to a given derived class type are coded in some hopefully harmless way. For example, we could code AggregateImpl::numBases() for AggregateImpl to always return an impossible value like -1 or to throw an exception. Such lumped interfaces, called fat interfaces in [4, §13.6], replace static type checking with either runtime checks or undetected errors.

Figure 4: Alternative class DAG using a fat interface in representing C++ aggregates. Compared to Figure 1, this DAG combines the Aggregate and AggregateWithBases interface base classes into one large interface. Public inheritance must be used here in the derived classes to expose the specifications in the interface.

In addition, the fat interface approach also makes us choose between implementation inheritance and encapsulation. In Figure 4, ClassImpl must be derived publicly from AggregateImpl to expose the AggregateWithOrWithoutBases interface. If we derive ClassImpl from the interface publicly and directly, we cannot also inherit the implementation of AggregateImpl privately unless we build a structure equivalent to the base-class composition.

If we extend the interface, adding functions to AggregateWithOrWithoutBases, all use clients must be recompiled, even if they use only the equivalent of the Aggregate interface. We can add new implementations in the fat interface approach and these implementations don't require recompilation of use clients. For this reason, fat interfaces appear in small projects with heavy use of virtual functions.

Figure 5: Alternative class DAG representing C++ aggregates using member function forwarding. Compared to the class DAG shown in Figure 1, this DAG has no multiple inheritance. The classes AggregateImpl and AggregateWithBasesImpl are used as private member data.

Implementation Reuse via Member Subobjects. As a final alternative, we abandon implementation reuse through inheritance but retain the layered interface, as shown in Figure 5.
Instead of obtaining an AggregateImpl subobject via inheritance, AggregateWithBasesImpl could have a member subobject, like this:

    class AggregateWithBasesImpl : public AggregateWithBases {
    public:
        AggregateWithBasesImpl(int num_members, int num_bases)
            : _aggregate_impl(num_members), _num_bases(num_bases) {}
        virtual int numMembers() const { return _aggregate_impl.numMembers(); }
        virtual int numBases() const { return _num_bases; }
    private:
        AggregateImpl _aggregate_impl;
        int _num_bases;
    };

The function numMembers() is said to be forwarded to the member _aggregate_impl. This seems fine until you try to compile the class and get an error message. AggregateImpl is an abstract base class since it doesn't implement the pure virtual function kind(). To use forwarding in this situation, you must add a dummy implementation of kind() to AggregateImpl.

Once the code is correct, this alternative is indistinguishable from the base-class composition as far as client functions are concerned: the interfaces are identical and the implementations are completely encapsulated. However, the forwarding functions themselves are a maintenance item not required by base-class composition. In systems of realistic complexity, this style of reuse becomes tedious and error-prone. Base-class composition expresses the same relationships without imposing the maintenance overhead of forwarding functions. In our experience, larger, more mature systems designed to use virtual functions often adopt the member subobject approach, limiting the use of derivation to building subtype relations.

5 The Cost of Base-Class Composition

If base-class composition is superior to alternatives, why isn't it used in more object-oriented C++ programs? History is part of the answer. Multiple derivation, virtual bases, and dominance lookup are relatively new additions to C++ and early experiences with these features were not altogether positive (see Section 8).
But even those programmers aware of the advantages select alternatives as a practical engineering tradeoff. The problem is performance.

As far as we are aware, current C++ compilers use roughly the scheme outlined in [2] to implement virtual bases with virtual functions. Minor optimizations aside, this scheme adds to each object one virtual function table pointer per base class and creates one virtual function table per base class for each class. In addition, one pointer to each virtual base class is needed for each subobject declaring a virtual base class. Thus we would expect ClassImpl in Figure 1 to take 11 words of memory: two words for the integer members, five words for virtual function table pointers, and four words for pointers to virtual bases. (The compiler we tried this on used ten words, saving a word by sharing two virtual function table pointers.) The size overhead carries a proportional runtime overhead since each pointer in the object must be initialized when the object is created.

Since each layer of base-class composition adds at least two base classes with virtual functions, these overheads increase with increasing layers. Such scaling works against the application of base-class composition to the very kind of problem, large systems, at which the technique is most adept.

6 Optimization of Base-Class Composition

The costs of base-class composition can be eliminated by a straightforward compiler optimization. This claim would be best proven by a C++ compiler that did not have the overheads of existing implementations. To convince compiler writers to implement such an optimization, we need both the motivation that we presented in the preceding sections and convincing arguments that the optimization will succeed. In this section we outline the optimization and argue that it will succeed.
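The size overhead described in Section 5 can be observed on any particular compiler by comparing sizeof for a composed class against a plain struct holding the same two integers. The sketch below uses trimmed versions of the example classes; PlainCounts, composedSize, and payloadSize are hypothetical names introduced here. The numbers obtained depend on the compiler and its object-layout scheme, and present-day compilers need not match the word counts quoted for the scheme of [2].

```cpp
#include <cstddef>

// The payload alone: two integers and nothing else.
struct PlainCounts {
    int num_members;
    int num_bases;
};

class Aggregate {
public:
    virtual ~Aggregate() = default;
    virtual int numMembers() const = 0;
};

class AggregateWithBases : public virtual Aggregate {
public:
    virtual int numBases() const = 0;
};

class AggregateImpl : public virtual Aggregate {
public:
    explicit AggregateImpl(int num_members) : num_members_(num_members) {}
    int numMembers() const override { return num_members_; }
private:
    int num_members_;
};

class AggregateWithBasesImpl : public virtual AggregateWithBases,
                               private AggregateImpl {
public:
    AggregateWithBasesImpl(int num_members, int num_bases)
        : AggregateImpl(num_members), num_bases_(num_bases) {}
    int numBases() const override { return num_bases_; }
private:
    int num_bases_;
};

class ClassImpl : public virtual AggregateWithBases,
                  private AggregateWithBasesImpl {
public:
    ClassImpl(int num_members, int num_bases)
        : AggregateWithBasesImpl(num_members, num_bases) {}
};

// Bytes occupied by a composed object and by its payload alone;
// the difference is the bookkeeping added by the implementation.
constexpr std::size_t composedSize = sizeof(ClassImpl);
constexpr std::size_t payloadSize = sizeof(PlainCounts);
```

Printing composedSize - payloadSize, divided by sizeof(void*), gives the overhead in words for the compiler at hand.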
The core of our argument lies in recognizing that the restricted interface base classes we need for base-class composition only specify the contents of virtual function tables, contents that are determined at compile time. Thus we can transform the class DAG at compile time to eliminate the pointers needed for more general virtual base classes, as long as the virtual function tables are filled correctly.

We begin with a definition: a pure abstract base class is an abstract base class (a class having at least one pure virtual function) with no data members and only C++-generated constructors, possibly derived from other pure abstract base classes. This restricted kind of interface base class is the kind we advocate for base-class composition. A compiler can test for a pure abstract base class unambiguously by examining a class and its base classes.

We claim that a DAG containing pure abstract base classes can be transformed into an equivalent DAG having no virtual base class derivation involving a pure abstract base class. By equivalent we mean that programmers will not be able to detect the difference by ordinary means (more on that shortly). Obviously, if this claim is true, no virtual base pointers are needed since there are no virtual bases. Removing virtual derivation "unfolds" the pure abstract base class portion of the DAG into a tree, replicating some of the pure abstract base classes. Each resulting tree branch can be implemented with a single virtual function pointer [2]. Consequently, the transformation eliminates virtual base pointers and most of the pointers added by multiple derivation.

Figure 6: A hypothetical class DAG, (a), and the result of optimizing it (b).

To understand this transformation, consider the example class DAG in Figure 6a. Classes P1, P2, P3, and P4 are pure abstract base classes; the other classes are arbitrary.
The class names starting with D border on the part of the DAG that will be transformed; those starting with B and L are just other parts of the DAG. Arrows point to base classes and virtual derivations are marked with a v. For example, the derivation of D43 from D3 is virtual but D3 is not a pure abstract base class.

The transformed DAG is shown in Figure 6b. With the virtual derivations from pure abstract base classes changed to non-virtual derivations, the pure abstract base classes are duplicated in the resulting DAG. For the present DAG, base P1 appears 6 times as a base class of type L2 rather than once. However, since none of the classes that are duplicated contain data, the storage required for any object from any class in the transformed DAG is less than that of classes in the original DAG.

Now we claim that these DAGs are equivalent in the following sense: the source code using the original DAG can be rewritten to use the transformed DAG such that the rewritten code works the same as the original in all ways that do not depend on implementation-dependent object-layout details. Object sizes, offsets of members, and the structure of virtual function tables will change (good!), but otherwise the program will be the same.

After the DAG transformation, the compiler must (a) forward member function calls defined along one replicated path to the other path, and (b) disambiguate pointer and reference conversions from derived classes to replicated pure abstract base classes. For example, member functions in D2 that override pure virtual functions in P1 or P2 will have to be overridden in D42. P1 pointers initialized with D42 pointers will have to be converted up through one path, say P4 to P2 to P1. All paths are equivalent since P1 is replicated by design.
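Steps (a) and (b) can be illustrated by applying the transformation by hand to a single small diamond. The sketch borrows the names P1, P2, D2, and D42, but its shape (D42 composed from interface P2 and implementation D2, both derived from P1) and the functions f(), g(), and asP1() are assumptions made here, since the structure of Figure 6 is not reproduced in the text. In the untransformed form, P2 and D2 would each derive virtually from P1.

```cpp
// Pure abstract base classes: no data, only pure virtual functions.
class P1 {
public:
    virtual ~P1() = default;
    virtual int f() const = 0;
};

// After the transformation: non-virtual derivation from P1.
class P2 : public P1 {
public:
    virtual int g() const = 0;
};

// Implementation of P1, also derived non-virtually after the transformation.
class D2 : public P1 {
public:
    explicit D2(int v) : v_(v) {}
    int f() const override { return v_; }
private:
    int v_;
};

// D42 now contains two P1 subobjects, one via P2 and one via D2.
class D42 : public P2, private D2 {
public:
    D42(int v, int w) : D2(v), w_(w) {}
    // (a) Forwarding: dominance no longer connects D2::f() to the P1
    // reached through P2, so the override is supplied explicitly.
    int f() const override { return D2::f(); }
    int g() const override { return w_; }
private:
    int w_;
};

// (b) Disambiguation: the conversion to P1 is routed through the public
// P2 path; the replicated P1 subobjects behave identically.
const P1& asP1(const D42& d) { return static_cast<const P2&>(d); }
```

A compiler performing the optimization would generate the forwarding override and pick the conversion path itself; here both are written out so that the source-level equivalence is visible.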
The rewriting is sufficient because there are only two things one can do with a pure abstract base class: (1) initialize a reference or pointer of the base type with a reference or pointer, respectively, of one of its derived types, or (2) call one of its member functions through a reference or pointer. The initializations can be done with replicated classes caused by removing the virtual specifiers just as well as to the shared classes because the contents of the replicated classes are fixed at compile time. Calls to member functions give the same result for both shared and replicated classes because the functions have the same definitions.

Programmers using objects from classes derived from pure abstract base classes cannot detect whether or not these objects contain shared or replicated subobjects from these bases. Again, the lack of member data in pure abstract base classes allows the optimization: there is no shared data to point to or initialize. In Figure 6b, class D43 derives virtually from D3 even in the transformed DAG because D3 is not a pure abstract base class and cannot be optimized in the manner described here.

7 Optimization of Private Base Classes

The particular pattern of base-class composition can be further optimized: private base classes in the transformed DAG whose only public base classes are pure abstract base classes can be rewritten as private members without virtual functions of their own. For example, if D42 derived privately from D2 and D2 derived privately from B1, then the D2 portion of the DAG would be stored as a private member of D42 and, further, this member would not have any virtual function pointers inside of it.

To accomplish this second optimization, the compiler must rewrite classes into a concrete and an interfaced form. For a class like D2 in Figure 6, call the type of the concrete form ConcreteD2.
Class ConcreteD2 is exactly like D2, except that it does not derive from any pure abstract base class and its member functions are not declared virtual. A new interfaced D2 class is written that derives from the same pure abstract base classes as the original D2 and declares the same member functions as D2, but implements these functions by forwarding to a private ConcreteD2 member datum. All clients of the interfaced D2 see the same behavior as they saw from the original D2. However, the compiler now has a private representation class, ConcreteD2, that does not have virtual functions and hence has no virtual function pointers. This private representation can be used whenever D2 appears as a private subobject, either as a private member datum or, as in the case of base-class composition, as a private base class.

This second optimization succeeds because of encapsulation and because, unlike base-class pointers or references, which can bind to different object types at run time, member objects have a fixed, known type. As the compiler implements a private subobject, it has all of the source code that can call member functions on that subobject (encapsulation) and it can resolve virtual function calls on that subobject at compile time without going through a virtual function table (one type). Thus private subobjects can be implemented using the concrete private representation of a class. To illustrate the impact of these optimizations on a recognizable example, we return to the C++ aggregate example.
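Before turning to the aggregate classes, the split into a concrete and an interfaced form described above can be sketched on the D2 case itself. This is a minimal hand-written sketch, not compiler output: the interfaces P2 and P4 are reduced to a few hypothetical member functions (value, set, doubled), and the member name rep_ is an assumption made for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Pure abstract interfaces (hypothetical member functions).
struct P2 {
    virtual int value() const = 0;
    virtual void set(int v) = 0;
    virtual ~P2() = default;
};
struct P4 : P2 {
    virtual int doubled() const = 0;
};

// Concrete form: the state and logic of D2, with no bases and no virtuals.
class ConcreteD2 {
public:
    explicit ConcreteD2(int v) : value_(v) {}
    int value() const { return value_; }
    void set(int v) { value_ = v; }
private:
    int value_;
};

// Interfaced form: what clients continue to see as D2.
// Every member function forwards to the private concrete representation.
class D2 : public P2 {
public:
    explicit D2(int v) : rep_(v) {}
    int value() const override { return rep_.value(); }
    void set(int v) override { rep_.set(v); }
private:
    ConcreteD2 rep_;
};

// A class that would have used D2 as a private base can instead hold
// the concrete representation directly; the calls on rep_ are resolved
// at compile time because its type is fixed.
class D42 : public P4 {
public:
    explicit D42(int v) : rep_(v) {}
    int value() const override { return rep_.value(); }
    void set(int v) override { rep_.set(v); }
    int doubled() const override { return 2 * rep_.value(); }
private:
    ConcreteD2 rep_;
};

// The concrete form has no virtual functions, hence no virtual function pointer.
static_assert(!std::is_polymorphic<ConcreteD2>::value,
              "ConcreteD2 must not be polymorphic");
static_assert(std::is_polymorphic<D2>::value,
              "the interfaced D2 remains polymorphic");
```

A use client holding a P2& bound to either a D2 or a D42 sees the same behavior as before the rewriting; only the private representation has changed.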
Here is part of the result of applying these optimizations by hand to the aggregate example:

    class ConcreteAggregateImpl {
    public:
        ConcreteAggregateImpl(int num_members) : _num_members(num_members) {}
        int numMembers() const { return _num_members; }
    private:
        int _num_members;
    };

    class AggregateImpl : public Aggregate {
    public:
        AggregateImpl(int num_members) : _aggregate_impl(num_members) {}
        virtual int numMembers() const { return _aggregate_impl.numMembers(); }
    private:
        ConcreteAggregateImpl _aggregate_impl;
    };

    class ConcreteAggregateWithBasesImpl {
    public:
        ConcreteAggregateWithBasesImpl(int num_members, int num_bases)
            : _aggregate_impl(num_members), _num_bases(num_bases) {}
        int numBases() const { return _num_bases; }
        int numMembers() const { return _aggregate_impl.numMembers(); }
    private:
        ConcreteAggregateImpl _aggregate_impl;
        int _num_bases;
    };

    class AggregateWithBasesImpl : public AggregateWithBases {
    public:
        AggregateWithBasesImpl(int num_members, int num_bases)
            : _agg(num_members, num_bases) {}
        virtual int numBases() const { return _agg.numBases(); }
        virtual int numMembers() const { return _agg.numMembers(); }
    private:
        ConcreteAggregateWithBasesImpl _agg;
    };

The concrete version of each class provides pure implementation, without any connection to the rest of the class DAG. As a result, there is no overhead from virtual base pointers or virtual function pointers. These concrete classes provide implementation reuse. Since each non-interface class is derived through a single inheritance chain, only one virtual function table pointer is needed. Thus, the overhead of this scheme is one word: an instance of the transformed ClassImpl takes three words, two for the integer member data and one for a virtual function table pointer.

Separation of interface and implementation is central to the success of the transformation. Imagine for the moment that our Aggregate base class had some member datum. As a virtual base class, a single copy of this datum would be included in a ClassImpl object.
But in transforming this system to single inheritance, a private AggregateImpl member datum would replace the private base class, and two copies of the Aggregate member would be enclosed in a ClassImpl object: one from the AggregateImpl member and one remaining in the Aggregate base class. The pointers introduced by C++ compilers to support virtual function calls in the presence of multiple inheritance and virtual bases give offsets to adjust the this pointer to accommodate base class member data. Separation gives interfaces with no data; no data means no pointer offsets.

8 Related Work

Separation of Interface and Implementation. Section 3 follows the work of Martin [1]. He started with an "ideal" model for interface-based programming and showed that a C++ programming style could almost be equivalent. He concluded with a suggestion that would improve the language support for this kind of style. Since Martin's paper appeared, the C++ language has been altered, and the technique we call base-class composition is the same as the style advocated by Martin but using the new language rules. As Martin predicted, the result is a clean mechanism for programming with interfaces, as we illustrate here.

Martin's assessment of the performance penalty omitted runtime initialization of the virtual-base and virtual function table pointers. In our experience, C++ programmers prefer the cost of maintaining forwarding functions and even live with mixed interface and implementation rather than accept the space and time overheads. For this reason, we investigated compiler optimization for this important technique. We have also distinguished the kinds of program designs (those that rely on use clients) that will benefit from separation of interface and implementation. Programs that do not use virtual function calls will only see the programmer-time cost of base-class composition and no benefit.

We Need Multiple Derivation.
Cargill has argued [5, 6] that many cases of multiple derivation can be reimplemented using single derivation with some advantages and that, given the complex semantics of multiple derivation, it is not a useful language feature. In [6], he gives three examples using multiple inheritance. The two he argues are failures do not separate interface and implementation; moreover, the implementations are not layered to allow reuse. His third example, the one he thinks may be a reasonable use of multiple inheritance, uses two interfaces but no implementation reuse. We believe Cargill's conclusion was reached without the benefit of enough examples of large systems of classes designed for extensibility.

Waldo [7] analyzed Cargill's arguments against multiple derivation by suggesting that three kinds of base classes might be distinguished: (1) those for implementation inheritance, (2) those for interface inheritance, and (3) those for data inheritance. He argued that Cargill focused on the first, but that the latter two are more important in large systems and that multiple derivation is needed to support the use of such base classes.

Waldo's classes for implementation inheritance correspond to our classes with combined interface and implementation. We agree with his arguments for interface inheritance, but we see a broader role for its use. By splitting combined interface and implementation classes and then composing these classes via multiple derivation, we achieve the benefits Waldo cites for interface inheritance and we achieve implementation reuse. Base-class composition provides simple and consistent guidelines, allowing us to ignore the complexity of general multiple derivation in C++.

We Need The Dreaded Virtual Base Diamond. Meyers [8] also argues against multiple derivation and especially against virtual bases used in the "dreaded diamond-shaped inheritance graph" (p. 165).
He argues that one cannot predict when a virtual base should be used; we say that every derivation from an interface base should be virtual. He argues that constructors for virtual bases are problematic; we say don't put data in abstract base classes and you won't need constructors in virtual bases. He argues that ambiguities can arise in multiple derivation and that dominance is mysterious; we say that using multiple derivation in a disciplined way, with one branch adding new virtual functions and the other implementing previous virtual functions, harnesses the mysterious powers of the dominance rule. He argues that virtual bases do not allow casting a pointer to a base class to a pointer to a derived class; we agree with his assessment that this is not very important.

Meyers then goes on to give an example of multiple derivation using public interface and private implementation base classes. He sees this example as useful and comprehensible, but says (p. 165) that "it's no accident that the dreaded diamond-shaped inheritance graph is conspicuously absent." However, if he had simply added virtual derivation to his graph, he could have completely eliminated the member function definitions in his final class, together with their maintenance.

Meyers, like Cargill, exposes many problems with the use of multiple derivation and the use of virtual bases. Their arguments focus on practical examples and they admit possible exceptions. We agree that the uses they explore do have problems. However, we blame failure to separate interface and implementation for most of the problems and find important uses for both multiple derivation and virtual bases used for base-class composition.

Cargill [6] and Meyers [8] advocate manual rewriting as a means of avoiding the overhead of multiple derivation and virtual base classes. In effect they favor a coding style that replaces base-class composition by hand optimization.
Here we have demonstrated that base-class composition is simpler and easier to maintain; it can be a key building block for layered systems, and the optimization can be done automatically.

Not All Public Base Classes Should be Virtual. Sakkinen [9] discusses the C++ inheritance model and concludes that public base classes should always be virtual while private base classes should always be nonvirtual. Base-class composition does conform to these guidelines, but we disagree with the guidelines in general. Without virtual functions, public base classes have a very different role in class designs and we cannot decide to make them virtual in all cases. With mixtures of interface and implementation, we must let the required object state dictate the choice of virtual or nonvirtual bases. In another paper [10], Sakkinen discusses initialization problems with virtual base classes. Again, base-class composition uses no constructors in virtual bases, and initialization is therefore not relevant to the programmer.

9 Conclusion

Base-class composition allows large, layered systems of interfaces to be implemented robustly and simply. Separate interface and implementation base classes are composed to form a base for further implementation without compromising extensibility or encapsulation. We believe that when virtual functions are used by client functions, that is, when programming through interfaces, base-class composition should become an important tool for C++ programmers. However, the performance barrier must be overcome first. We have proposed optimizations that allow compilers to eliminate all but one of the pointers needed to implement C++ for systems using base-class composition.

Acknowledgements

We appreciate the suggestions for improving this paper that were made by Michael Karasick, Derek Lieber, Chris Laffra, Barry Rosen, Michael Fraenkel, Noel Sales, Ralph May, Paul Golick, and Lars Hougaard. We also thank Ernie Choi for his encouragement.

References

[1] Bruce Martin.
The separation of interface and implementation in C++. In USENIX C++ Conference Proceedings, pages 51-63. USENIX Association, April 1991.
[2] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts, 1990.
[3] Samuel P. Harbison. Modula-3. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1992.
[4] Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts, second edition, 1991.
[5] T.A. Cargill. Does C++ really need multiple inheritance. In USENIX C++ Conference Proceedings, pages 315-323. USENIX Association, April 1990.
[6] Tom Cargill. C++ Programming Style. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts, 1992.
[7] Jim Waldo. Controversy: The case for multiple inheritance in C++. Computing Systems, 4(2):157-171, 1991.
[8] Scott Meyers. Effective C++: 50 Specific Ways to Improve Your Programs and Designs. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts, 1992.
[9] Markku Sakkinen. A critique of the inheritance principles of C++. Computing Systems, 5(1):62-110, 1992.
[10] Markku Sakkinen. How should virtual bases be initialized (and finalized)? C++ Report, 5(3):44-50, 1993.