Objecting to Objects

by Stephen C. Johnson, Melismatic Software

Abstract

Object Oriented Programming (OOP) is currently being hyped as the best way to do everything from promoting code reuse to forming lasting relationships with persons of your preferred sexual orientation. This paper tries to demystify the benefits of OOP. We point out that, as with so many previous software engineering fads, the biggest gains in using OOP result from applying principles that are older than, and largely independent of, OOP. Moreover, many of the claimed benefits are either not true or true only by chance, while occasioning some high costs that are rarely discussed. Most seriously, all the hype is preventing progress in tackling problems that are both more important and harder: control of parallel and distributed applications, GUI design and implementation, and fault-tolerant and real-time programming. OOP has little to offer these areas. Fundamentally, you get good software by thinking about it, designing it well, implementing it carefully, and testing it intelligently, not by mindlessly using an expensive mechanical process.

Define Your Terms

Object Oriented Programming (OOP) is a term largely borrowed from the SmallTalk community, who were espousing many of these techniques in the mid 1970's. In turn, many of their ideas derive from Simula 67, as do most of the core ideas in C++. Key notions such as encapsulation and reuse have been discussed as far back as the 60's, and were debated at length during the rounds of the Ada definition. Although there have been, and will always be, religious fanatics who think their language is the only way to code, the really organized OOP hype started in the late 1980's. By the early 1990's, both NeXT and Microsoft were directing their marketing muscle into persuading us to give up C and adopt C++, while SmallTalk and Eiffel both were making a respectable showing, and object oriented operating systems and facilities (DOE, PenPoint, CORBA) were getting a huge play in the trade press--the hype wars were joined.

It is said that countries get the governments they deserve, and perhaps that is true of professions as well--a lot of the energy fueling this hype derives from the truly poor state of software development. While hardware developers have provided a succession of products with radically increasing power and lower cost, the software world has seen very little productivity improvement. Major, highly visible products from industry leaders continue to be years late (Windows NT), extremely buggy (Solaris) or both, costs skyrocket, and, most seriously, people are very reluctant to pay 1970's software costs when they are running cheap 1990's hardware. I believe a lot of non-specialists look at software development and see it as so completely screwed up that the cause cannot be profound--it must be something simple, something a quick fix could cure. Maybe if they just used objects...

To be more precise, most of what I say will apply to C++, viewed as a poor stepchild by most of the OOP elite. Actually, the few comments I will make about more dynamically typed languages like SmallTalk make C++ look good by comparison. I will also focus my concern fairly narrowly. I am interested in tools, including languages, that make it easier and more productive to generate large, serious, high-quality software products. Focusing this narrowly rules out a bunch of sometimes entertaining philosophical and aesthetic arguments, which are best left for discussion over beer.

Heads in the Past

As pointed out above, most OOP ideas were alive and well 20 to 25 years ago. Why didn't they solve the software problem then?

I, with my gray hair, would be the last to argue against something solely because of age. After all, a lot of mathematics is 2000 years old and still works fine. But programming languages exist to organize and tame cycles. In 1967, when Simula was born, computers cost from a quarter million to five million dollars. Most were either single-user systems with sign-up sheets or crude batch systems; displays were almost unknown, and so were discs. It was a very different world.

Today's cycles are at least 1000 times cheaper (probably more like 10,000) than they were in 1967, and most of the other aspects of our computing world have experienced a few orders of magnitude change as well. It is very hard to argue that the best way to organize expensive slow cycles is also the best way to organize cheap fast ones.

It is a much more defensible position to argue that we are just inherently slow at adopting new ways of thinking, no matter what field, and the hardware folks have left our minds in the dust. As my grandfather used to say, "We can send a message around the world seven times in one second, but it still takes 20 years for that message to go through a quarter inch of human skull." This suggests that the latest techniques, be they OOP or otherwise, are a step in the slow process of our learning to think in MIPS rather than IPS, and even if there were no further advance in hardware we could still spend decades digesting the hardware we already have.

What Works in OOP

Those who report big benefits from using OOP are not lying. Many of the reported benefits come from focusing on designing the software models, including the roles and interactions of the modules, enabling the modules to encapsulate expertise, and carefully designing the interfaces between these modules. While most OOP systems allow you, and even encourage you, to do these things, most older programming systems allow these techniques as well. These are good, old ideas that have proved their worth in the trenches for decades, whether they were called OOP, structured programming, or just common sense. I have seen excellent programs written in assembler that used these principles, and terrible programs in C++ that did not. The use of objects and inheritance is not what makes these programs good.

What works in all these cases is that the programs were well thought out and the design was done intelligently, based on a clear and well communicated set of organizing principles. The language and the operating system just don't matter. In many cases, the same organizing principles used to guide the design can be used to guide the construction and testing of the product as well. What makes a piece of software good has a lot to do with the application of thought to the problem being addressed, and not much to do with what language or methodology you used. To the extent that the OOP methodology makes you think problems through and forces you to make hidden assumptions explicit, it leads to better code.

OOP Claims Unmasked

The hype for OOP usually claims benefits such as faster development time, better code reuse, and higher quality and reliability of the final code. As the last section shows, these are not totally empty claims, but when true they don't have much to do with OOP methodology. This section examines these claims in more detail.

OOP is supposed to allow code to be developed faster; the question is, "Faster than what?" Will OOP let you write a parser faster than Yacc, or write a GUI faster than using a GUI-builder? Will your favorite OOP language replace awk or Perl or csh within a few years? I think not.

Well, maybe faster than C, and I suppose if we consider only raw C this claim has some validity. But a large part of most OOP environments is a rich set of classes that allow the user to manipulate the environment--build windows, send messages across a network, receive keystrokes, etc. C, by design, has a much thinner package of such utilities, since it is used in so many different environments. There were some spectacularly productive environments based on LISP a few years back (and not even the most diehard LISP fanatic would say that LISP is object oriented). A lot of what made these environments productive was a rich, well designed set of existing functions that could be accessed by the user. And that is a lot of what makes OOP environments productive compared to raw C. Another way of saying this is that a lot of the productivity improvement comes from code reuse.

There is probably no place where the OOP claims are more misleading than the claims of code reuse. In fact, code reuse is a complex and difficult problem--it has been recognized as desirable for decades, and the issues that make it hard are not materially facilitated by OOP.

In order for me to reuse your code, your code needs to do something that I want done (that's the easy part), and your code needs to operate within the same model of the program and environment as my code (that's the hard part). OOP addresses some of the gratuitous problems that occasionally plagued code reuse attempts (for example, issues of data layout), but the fundamental problems are, and remain, hard.

An example should make this clearer. One of the most common examples of a reused program is a string package (this is particularly compelling in C++, since C has such limited string handling facilities). Suppose you have written a string package in C++, and I want to use it in my compiler symbol table. As it happens, many of the strings that a compiler uses while compiling a function do not need to be referenced after that function has been compiled. This is commonly dealt with by providing an arena-based allocator, where storage can be allocated out of an arena associated with a function, and then the whole arena can be discarded when the function has been processed. This minimizes the chance of memory leaks and makes the deallocation of storage essentially free (Similar techniques are used to handle transaction-based storage in a transaction processing system, etc.).
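A minimal C++ sketch of such an arena follows, to make the allocation and "discard the whole arena" behavior concrete. Arena, its default block size, and the helper arena_strdup are illustrative names, not part of any particular compiler, and the sketch assumes no object needs stricter alignment than alignof(std::max_align_t).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Minimal arena: allocations are carved sequentially out of large blocks,
// individual allocations are never freed, and reset() discards everything at once.
class Arena {
public:
    explicit Arena(std::size_t block_size = 4096) : block_size_(block_size) {}

    // Return n bytes aligned to `align`, which must be a power of two
    // no larger than alignof(std::max_align_t).
    void* allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || offset + n > current_size_) {
            // Start a new block; oversized requests get a block of their own.
            std::size_t size = n > block_size_ ? n : block_size_;
            blocks_.push_back(std::make_unique<unsigned char[]>(size));
            current_size_ = size;
            offset = 0;
        }
        used_ = offset + n;
        return blocks_.back().get() + offset;
    }

    // Release every block in one step: per-object deallocation costs nothing.
    void reset() {
        blocks_.clear();
        used_ = 0;
        current_size_ = 0;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::size_t used_ = 0;          // bytes used in the current (last) block
    std::size_t current_size_ = 0;  // capacity of the current block
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// Hypothetical helper: copy a NUL-terminated string into the arena.
char* arena_strdup(Arena& arena, const char* s) {
    std::size_t len = std::strlen(s) + 1;
    char* p = static_cast<char*>(arena.allocate(len, 1));
    std::memcpy(p, s, len);
    return p;
}
```

A compiler would create one Arena per function, allocate that function's temporary names with arena_strdup, and call reset() once the function has been compiled. There is deliberately no way to free a single string.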

So, I want to use your string package, but I want your string package to use my arena-based allocator. But, almost certainly, you have encapsulated knowledge of storage allocation so that I can't have any contact with it (that is a feature of OOP, after all), so I can't use your package with my storage allocator. Actually, I would probably have more luck reusing your package had it been in C, since I could supply my own malloc and free routines (although that has its own set of problems).

If you had designed your string package to allow me to specify the storage allocator, then I could use it. But this just makes the point all the more strongly. The reason we do not reuse code is that most code is not designed to be reused (notice I said nothing about implementation). When code is designed to be reused (the C standard library comes to mind) it doesn't need object oriented techniques to be effective. I will have more to say about reuse by inheritance below.
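As a sketch of what "designed to let me specify the storage allocator" might look like, here is a deliberately tiny C++ string class that takes its allocator from the caller. Allocator, HeapAllocator, SimpleString, and CountingAllocator are hypothetical names invented for this illustration; the point is only that the storage policy is a parameter rather than a hidden implementation detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical allocator interface that the string package accepts.
class Allocator {
public:
    virtual ~Allocator() = default;
    virtual void* allocate(std::size_t n) = 0;
    virtual void deallocate(void* p, std::size_t n) = 0;
};

// Default policy: ordinary heap allocation through malloc/free.
class HeapAllocator : public Allocator {
public:
    void* allocate(std::size_t n) override {
        void* p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        return p;
    }
    void deallocate(void* p, std::size_t) override { std::free(p); }
};

inline Allocator& default_allocator() {
    static HeapAllocator heap;
    return heap;
}

// A small string class whose storage comes from a caller-supplied allocator.
class SimpleString {
public:
    explicit SimpleString(const char* s, Allocator& alloc = default_allocator())
        : alloc_(&alloc), len_(std::strlen(s)) {
        data_ = static_cast<char*>(alloc_->allocate(len_ + 1));
        std::memcpy(data_, s, len_ + 1);
    }
    SimpleString(const SimpleString&) = delete;             // copying omitted for brevity
    SimpleString& operator=(const SimpleString&) = delete;
    ~SimpleString() { alloc_->deallocate(data_, len_ + 1); }

    const char* c_str() const { return data_; }
    std::size_t size() const { return len_; }

private:
    Allocator* alloc_;
    std::size_t len_;
    char* data_;
};

// A caller-supplied policy that tracks outstanding bytes; an arena-based
// allocator could be plugged in through the same interface.
class CountingAllocator : public Allocator {
public:
    void* allocate(std::size_t n) override { live_bytes += n; return heap_.allocate(n); }
    void deallocate(void* p, std::size_t n) override { live_bytes -= n; heap_.deallocate(p, n); }
    std::size_t live_bytes = 0;

private:
    HeapAllocator heap_;
};
```

Note that nothing here depends on objects as such; the same design in C would pass a pair of function pointers.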

One of the major long-term advantages of object-oriented techniques may be that they can support broad algorithmic reuse, in a style similar to the Standard Template Library of C++. However, the underlying language is enormously overbuilt for such support, setting all sorts of traps and dead ends for the unwary. The Standard Template Library took several generations and a dozen of the best minds in the C++ community to reach its current state, and it is no accident that several of the early generations were coded in Ada and SCHEME--its power is not in the language, but in the ideas.
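The kind of algorithmic reuse meant here can be shown with one function template written against the iterator protocol (`!=`, `++`, `*`) and reused unchanged with a vector, a list, and a plain C array. count_matching is an illustrative name; the standard library's own std::count_if does the same job, and this hand-written version only makes the mechanism visible.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Written once against the iterator protocol, so any sequence that supplies
// such iterators (containers, plain arrays via pointers) can reuse it.
template <typename InputIt, typename Predicate>
std::size_t count_matching(InputIt first, InputIt last, Predicate pred) {
    std::size_t n = 0;
    for (; first != last; ++first) {
        if (pred(*first)) ++n;
    }
    return n;
}
```

No inheritance is involved: the vector, the list, and the array share no base class, only a convention about how to step through their elements.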

The final advantage claimed for OOP is higher quality code. Here again, there is a germ of truth to this claim, since some errors that were common with older methods (such as name clashes in libraries) are harder to make and easier to detect using OOP. To the extent that we can reuse "known good" code, our quality will increase--this doesn't depend on OOP. Fundamentally, however, code quality depends on intelligent design, an effective implementation process, and aggressive testing. OOP does not address the first or last step at all, and falls short in the implementation step.

For example, we might wish to enforce some simple style rules on our object implementations, such as requiring that every object have a print method or a serialize method for dumping the object to disc. The best that many object-oriented systems can do is provide you (or, rather, your customer) with a run-time error when you try to dump an object to disc that has not defined such a method (C++ actually does a bit better than that). Many of the more dynamically typed systems, such as SmallTalk or PenPoint, do not provide any typing of arguments of messages, or enforce any conventions as to which messages can be sent to which objects. This makes messages as unstructured as GOTO's were in the 1970's, with a similar impact on correctness and quality.
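One way C++ does better is a pure virtual function: if the style rule is expressed as a common base class, a derived class that forgets the method stays abstract, and any attempt to create one is rejected at compile time. The class names below are hypothetical, and the rule only binds classes that actually derive from the base.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Style rule: every persistent object must know how to serialize itself.
class Persistent {
public:
    virtual ~Persistent() = default;
    virtual void serialize(std::ostream& out) const = 0;  // pure virtual
};

class Point : public Persistent {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    void serialize(std::ostream& out) const override { out << x_ << ' ' << y_; }

private:
    int x_, y_;
};

// Forgot to define serialize(): the class is still abstract, so a
// declaration such as `Forgetful f;` would fail to compile.
class Forgetful : public Persistent {};

static_assert(std::is_abstract<Forgetful>::value, "Forgetful cannot be instantiated");
static_assert(!std::is_abstract<Point>::value, "Point implements serialize()");
```

The error surfaces where the object would be created, not when the customer tries to save it.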

One of the most unfortunate effects of the OOP bandwagon is that it encourages the belief that how you speak is more important than what you say. It is rather like suggesting that if someone uses perfect English grammar they must be truthful. It is what you say, and not how you say it.

Moby Code

As programs get bigger, a number of OOP features turn into problems. For example, there is a rough pragmatic rule of thumb that, as programs get bigger, C header files tend to grow as the log of the total program size. Put another way, as programs get larger, header files tend to make up an ever smaller percentage of the total number of lines of code. By contrast, C++ header files tend to grow proportionally with the total code size, in addition to being larger to start with. Inlined functions, together with all the class private information, are included in the header file. This has some very bad effects, since header files tend to be included in lots of other files.

In C, many compilers were one-pass, and the time to compile a large application grew only slightly worse than linearly as a job got large. In C++, many compilers are multi-pass, and as the job gets bigger the header files tend to get bigger as well. As a result, the time to compile a large program may well grow quadratically or even cubically with the total program size. Of course, machines are faster now, so this effect is not that important for small to medium programs but becomes brutal for large ones. Since OOP is being adopted by many companies with the aim of bringing their development costs for large programs under control, the stage is set for some substantial disappointments.
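The step from "headers grow with the program" to "compile time grows quadratically" can be made explicit with a rough cost model. The model is an assumption for illustration, not a measurement: a program of $S$ lines is split into $n$ files of roughly constant size $s$, each file pulls in $H(S)$ lines of headers, and compile work $W(S)$ is proportional to the number of lines the compiler reads.

```latex
\begin{aligned}
W(S) &\approx n\,\bigl(s + H(S)\bigr) = S + \tfrac{S}{s}\,H(S), \qquad n = S/s \\
\text{C, } H(S) \approx c\log S: \quad W(S) &\approx S + \tfrac{c}{s}\,S\log S \\
\text{C++, } H(S) \approx \alpha S: \quad W(S) &\approx S + \tfrac{\alpha}{s}\,S^{2}
\end{aligned}
```

With $s$ fixed, the C case is only slightly worse than linear, while the header term alone makes the C++ case quadratic in $S$.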

As an example of this code growth, consider the character I/O provided in C and C++. In the offerings of one workstation vendor the C stdio.h header file is 122 lines, and about 3400 bytes in size. The equivalent in C++ is the stream.h header file, together with the header files it includes. These make up 946 lines, and over 23000 bytes in size. It is clear that stream.h has some real advantages over stdio.h in the C++ world, but are they worth a factor of seven increase in size (especially considering how widely these files are included)?

In the execution realm, OOP also burns cycles to achieve its ends. Many things that used to be structure references have become function calls (although some may be inlined back into structure references again). This not only costs the overhead of the call, but also tends to cripple optimizers, which are allergic to function calls in the inner loops of programs. This effect alone can easily cost a factor of two in performance. I believe that this cost by itself makes OOP difficult to justify in any application where performance is important.
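The change described above looks like this in a simple inner loop: the C-style version reads a structure field, while the encapsulated version reaches the same field through a member function. The types and functions are illustrative; whether the accessor costs anything depends on the compiler. Defined in the class body it is implicitly inline and can usually be folded back into a field load, but a virtual accessor, or one defined in another translation unit, typically becomes a real call on every iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C style: the loop reads the field directly.
struct PointC { double x, y; };

double sum_x_direct(const std::vector<PointC>& pts) {
    double total = 0.0;
    for (std::size_t i = 0; i < pts.size(); ++i) total += pts[i].x;
    return total;
}

// Encapsulated style: the same field is reached through an accessor.
class PointOO {
public:
    PointOO(double x, double y) : x_(x), y_(y) {}
    double x() const { return x_; }  // implicitly inline: may compile to a plain load
    double y() const { return y_; }

private:
    double x_, y_;
};

double sum_x_accessor(const std::vector<PointOO>& pts) {
    double total = 0.0;
    for (std::size_t i = 0; i < pts.size(); ++i) total += pts[i].x();  // a call unless inlined
    return total;
}
```

Both functions compute the same result; the difference lies only in what the optimizer can see through.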

I predict that this effect will become sufficiently important that compilers designed specifically to optimize C++ will be developed and spread rapidly. C++ compilers that precompile header files are already becoming available (although they have to slice off some of the tips of the language to make this possible), and there are many more potential optimizations that would pay off. Ironically, these compilers will work very hard to compile C++ back into the kind of code quality you would have if you stuck with C in the first place, meanwhile burning cycles by the bushel in the name of optimization.

Is Multiple Inheritance the Real Problem?

While millions of programmers are trying to figure out what multiple inheritance is and when they should use it, there are some seriously difficult and important problems that aren't getting the attention they deserve. We still struggle hard to make even simple user interfaces. We have almost no tools, and little insight, to help us to design network and parallel applications. Even when objects are the right paradigm, we have difficulty coordinating and relating objects to each other. Finally, object persistence, one of the most fundamental issues in OOP, looks like it will be "solved" in multiple incompatible ways, benefiting nobody.

To build a GUI, a good GUI-builder (or wizard, as Microsoft styles them) is worth two object-oriented languages. While many user interface packages have clearly been influenced by OOP ideas, and widgets are object-like in behavior, it is painful to control widgets in most object-oriented systems. Most widgets have dozens of attributes: size, position, various colors and shapes, text labels, and actions to be performed. It is painful beyond belief to be forced to specify all these explicitly when a widget is created. It is also painful to create a bare widget and then make a dozen or more calls to set up the fundamental attributes, followed by another call that says "I'm through, go do it now."

Inheritance is sometimes seen as a way to address this--extend the basic widget definition, set the defaults you want, and then limit the number of arguments to those most clearly needed. This is a very ponderous mechanism to solve a modest problem. More to the point, none of these techniques come close to matching GUI builders. GUI builders let you establish a set of conventions, borrow defaults from one widget to another, and, most importantly, see the result on the screen exactly as it will look to the application. It will be hard for any language to beat that.
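For readers who have not fought with a widget set, here is a sketch of the two patterns just described: a bare widget configured by a string of setter calls and a final "go do it now" call, and a subclass that bakes in house defaults. Widget, its setters, realize(), and HouseButton are hypothetical and stand in for no particular toolkit.

```cpp
#include <cassert>
#include <string>

// Hypothetical widget with a handful of the many attributes real toolkits expose.
class Widget {
public:
    void set_size(int w, int h) { width_ = w; height_ = h; }
    void set_position(int x, int y) { x_ = x; y_ = y; }
    void set_label(const std::string& text) { label_ = text; }
    void set_foreground(const std::string& color) { fg_ = color; }
    void set_background(const std::string& color) { bg_ = color; }
    // "I'm through, go do it now": a real toolkit would create the window here.
    void realize() { realized_ = true; }

    bool realized() const { return realized_; }
    const std::string& label() const { return label_; }
    const std::string& background() const { return bg_; }
    int width() const { return width_; }

private:
    int width_ = 0, height_ = 0, x_ = 0, y_ = 0;
    std::string label_, fg_ = "black", bg_ = "white";
    bool realized_ = false;
};

// The inheritance remedy: a subclass fixes house-style defaults so that
// callers pass only the arguments that usually vary.
class HouseButton : public Widget {
public:
    HouseButton(const std::string& text, int x, int y) {
        set_size(80, 24);
        set_foreground("navy");
        set_background("lightgray");
        set_label(text);
        set_position(x, y);
        realize();
    }
};
```

Even in this toy, the subclass only moves the setter calls into a constructor; it still does not show the programmer what the result will look like on the screen.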

Another technique that has been found useful in controlling GUI's is having multiple threads of control to handle events. C++ does not support threads as first-class citizens in the language. This also limits its usefulness in the GUI area.

Similarly, in developing network and parallel code, the language should supply first-class abstractions of threads and synchronization, since these have proven very useful in this difficult and important area--C++ does not do this. But the problems of writing parallel code in an OO manner go deeper than that.

The key issue in writing parallel code is controlling locality of reference. You want to divide computations between machines so that most of the data a machine needs is stored locally, and you can readily identify and transmit the data that is not local. Operator overloading is inadequate to deal with this problem, since it looks only at a single operator and two operands, while locality is a global issue.

Close Relations

Many common programming problems can be viewed as relating two or more objects with one another. For example, to have a widget reflect the value of a variable in an application, you need to relate the widget object to the object in the program that contains the variable. To move an object around on the screen, you want to relate the position of the cursor object to the position of the widget. And so on.

Currently, simple relations like these are almost unbearable to program. Because we cannot express the relation directly, we must turn changes of values into events (such as 'mouse motion' events), that in turn cause messages to be sent to the widget. Future generations will hardly believe that we went to all this trouble, when we could just tell the machine the relationship we wanted and let it decide how to make it happen. We already do this with makefiles, and to some degree with Yacc, lex, and awk programs. Tcl has a nice feature (although it is almost totally hidden from casual view) that allows you to get control when a variable's value changes--this would be a nice first step to a system such as I describe.
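A first step of the Tcl kind, getting control when a variable's value changes, can be sketched in C++ as a variable that notifies registered callbacks. Observable, on_change, and set are hypothetical names for this illustration, and the sketch assumes callbacks do not register further callbacks while a notification is in progress.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// A variable that calls registered callbacks whenever its value changes,
// similar in spirit to a Tcl variable trace.
template <typename T>
class Observable {
public:
    explicit Observable(T initial) : value_(std::move(initial)) {}

    void on_change(std::function<void(const T&)> callback) {
        callbacks_.push_back(std::move(callback));
    }

    void set(const T& v) {
        if (v == value_) return;  // no change, no notification
        value_ = v;
        for (const auto& cb : callbacks_) cb(value_);
    }

    const T& get() const { return value_; }

private:
    T value_;
    std::vector<std::function<void(const T&)>> callbacks_;
};
```

For example, a widget can be made to follow the cursor by registering a callback on the cursor position. The relation is still expressed indirectly, as code run on each change, rather than stated once and left to the system as the text proposes.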

Linguistic support of these object relations would need to consider relations as first-class citizens, capable of being created, destroyed, and potentially stored on files. There are some great language issues here, and potentially a huge payoff. So why are we worrying about inheritance?

Here Today, Gone Today

Objects in most systems today are ephemeral. If you want to save an object, you must write it out, one byte at a time, onto a conventional file system, and then hope that somebody else who understands what bytes you wrote and why can put Humpty together again when you need to read the object. Anyone who has ever written such code knows that it is exceptionally hard to write, even harder to change, and almost impossible to debug (since the object may not be "there" yet, you often cannot use the standard debugging methods for the object). Persistent object systems are supposed to fix all this: you just say "write" and the object is written.

I am not convinced that C++, in particular, has faced up to some of the serious issues involved in supporting persistent objects. I think it will be very hard to support such language constructs as void pointers, static and global storage, and address arithmetic in the context of persistent objects. Moreover, a single object is rarely what you want--you often have a collection of objects bound together into lists, sets, and graphs, and the same object may be involved with several such structures. Even some very simple cases (such as an object being on two different lists, both of which need to be saved) severely challenge the current mechanisms. Some very smart people are working on this problem--unfortunately, they mostly aren't talking to each other, and it's not clear which groups, if any, will come up with good solutions. I would consider a good object persistence mechanism to be a reason to rethink a lot of my objections to objects.
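The "object on two lists" case can be shown in a few lines. The sketch below uses one common workaround, assumed here for illustration rather than taken from any persistence system: each distinct object gets an id the first time it is met, the object table is written once, and each list is written as a sequence of ids. Item, List, save, and load are hypothetical names, and the whitespace-separated text format is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Item { int value; };
using List = std::vector<std::shared_ptr<Item>>;

// Save two lists so that a shared object is written once and referenced by id.
std::string save(const List& a, const List& b) {
    std::map<const Item*, std::size_t> ids;  // object address -> id
    std::vector<const Item*> objects;        // id -> object
    auto id_of = [&](const std::shared_ptr<Item>& p) -> std::size_t {
        auto it = ids.find(p.get());
        if (it != ids.end()) return it->second;
        ids[p.get()] = objects.size();
        objects.push_back(p.get());
        return objects.size() - 1;
    };
    std::vector<std::size_t> ia, ib;
    for (const auto& p : a) ia.push_back(id_of(p));
    for (const auto& p : b) ib.push_back(id_of(p));

    std::ostringstream out;
    out << objects.size();                    // object table first
    for (const Item* obj : objects) out << ' ' << obj->value;
    auto write_ids = [&out](const std::vector<std::size_t>& v) {
        out << ' ' << v.size();
        for (std::size_t id : v) out << ' ' << id;
    };
    write_ids(ia);
    write_ids(ib);
    return out.str();
}

// Rebuild both lists; entries that shared an id share one Item again.
std::pair<List, List> load(const std::string& text) {
    std::istringstream in(text);
    std::size_t count = 0;
    in >> count;
    std::vector<std::shared_ptr<Item>> objects;
    for (std::size_t i = 0; i < count; ++i) {
        int v = 0;
        in >> v;
        objects.push_back(std::make_shared<Item>(Item{v}));
    }
    auto read_list = [&]() {
        List list;
        std::size_t n = 0;
        in >> n;
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t id = 0;
            in >> id;
            list.push_back(objects.at(id));
        }
        return list;
    };
    List a = read_list();
    List b = read_list();
    return {a, b};
}
```

Writing each list's values independently would instead reload the shared item as two unrelated copies. Even this sketch sidesteps the harder cases raised above: void pointers, static data, and addresses computed by pointer arithmetic have no obvious id.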

He said that She said that He had Halitosis

Using a computer language is a social, and even political, act, akin to voting for a candidate or buying a certain brand of car. As such, our choices are open to manipulation by marketeers, influence by fads, and various forms of rationalization by those who were burned and have trouble admitting it. In particular, much of what is "known" about a language is something that was true, or at least widely believed, at one point in the language's history, but may not be true currently. At one point, "everybody" knew that PL/I had no recursive functions, ALGOL 68 was too big a language to be useful, Ada was too slow, and C could not be used for numerical problems. Some of these beliefs were never true, and none of them are true now, but they are still widely held. It is worth looking at OOP in this light.

Some of the image manipulators target nontechnical people such as our bosses and customers, and may try to persuade them that OOP would solve their problems. As we have seen, however, many of the things that are "true" of OOP (for example, that it makes reuse easy) are difficult to justify when you look more carefully. As professionals, it is our responsibility to ask whether moving to OOP is in the best interests of ourselves, our company, or our profession. We must also have the courage to reject the fad when it is a diversion or will not meet our needs. We must also make this decision anew for each project, considering all the potential factors. Realistically, the answer will probably be that some projects should use OOP, others should not, and for a fair number in the middle it doesn't matter very much.

Summary

The only way to construct good software is to think about it. Since the scope of problems that software attempts to address is so vast, the range of solutions we need is correspondingly vast. OOP is a good tool to have in our toolbox, and there are places that it is my tool of choice. But there are also places where I would avoid it like the plague. It is important to all of us that we continue to have that option.

6/25/97jd