C is more than forty-five years old, and C++ has been around in one form or another since 1985. Nevertheless, compiling old code with modern compilers can be quite a challenge. In this column, I’ll explore some of the reasons why it’s hard to get old code to run on modern systems, share my experiences porting a 10-year-old program to C++17, and end with some suggestions for people who want to keep their skills current.
C++ is a beautiful language.
I didn’t always think that this was the case. Compared with Objective-C and Smalltalk, I found C++ to be overly complex. I thought that the language valued runtime efficiency over programmer productivity. This made sense if you had a small programming team writing code to run on millions of systems across the country, I reasoned, but it didn’t reflect the cost/benefit calculus of most software developers.
Looking back, I see that I was mistaken. By rigorously enforcing types, C++ was actually forcing me to spend more time on software design and less time churning out lines of code. This is an efficient tradeoff, but it’s also a painful one. Like many programmers, I found design at the time to be considerably more difficult than coding. That’s not surprising: I lacked both training and experience in software design. I was good at figuring out the interface for a few individual classes, but I had a hard time designing a consistent set of interfaces so that a large-scale application would function as a unified whole.
After my initial foray into C++ in the early 1990s, I put it aside until 2006, when I started working on a digital forensics program that required both speed and portability. Some of the program’s requirements were that it had to never crash, no matter what inputs it was given; it had to run as fast as possible, ideally using all of the available cores on a system’s microprocessor; and it had to run on Linux, macOS, and Windows. The only way to meet these requirements was to write the program in C++.
Many program crashes are the result of logic errors that arise from unexpected data inputs. One way to defend against this is through rigorous input checking—not just for program inputs, but between different modules within the same program. At the time, many programmers in my circle thought that this was a fine idea as long as we didn’t get carried away. For example, they said, one of the reasons that we were using C++ rather than a higher-level language was that we couldn’t afford to have array checking on every access. So when I was at the Naval Postgraduate School, I worked with the Software and Systems Division at the National Institute of Standards and Technology (NIST) to study the impact on our software of performing run-time bounds checks on every memory access [1]. We discovered that, thanks to branch prediction on then-modern superscalar CPUs, the checks resulted in a measurable degradation of less than 1% on our microbenchmark. With this result in hand, we decided to make maximal use of the typesafe containers provided by the C++ Standard Template Library, and we built our own classes on top of them that provided additional checks.
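Here is a minimal sketch (not our actual class) of what checked access buys you: the STL’s at() accessor validates the index and throws an exception rather than silently reading out of bounds and corrupting the program.

#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    try {
        std::cout << v.at(10) << "\n";   // checked access: throws instead of reading garbage
    } catch (const std::out_of_range &e) {
        std::cerr << "caught: " << e.what() << "\n";   // the program keeps running
    }
    return 0;
}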
To make optimal use of all the cores, we developed a simple worker/consumer and thread pool system for the application, built on top of POSIX Threads. This turned out to be a good choice: in 2010 when we purchased a computer with 64 cores, our application was able to peg each core at 100% CPU usage.
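In outline, the pattern looks something like the following sketch—a simplified stand-in for our system, with hypothetical job handling: a mutex-protected queue, a condition variable that idle workers sleep on, and one thread per core.

#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define NJOBS    16

static int queue[NJOBS];
static int head = 0, tail = 0, done = 0;
static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail && !done)
            pthread_cond_wait(&ready, &lock);   /* sleep until work arrives */
        if (head == tail && done) {
            pthread_mutex_unlock(&lock);
            return NULL;                        /* queue drained; shut down */
        }
        int job = queue[head++];
        pthread_mutex_unlock(&lock);
        printf("processing job %d\n", job);     /* stand-in for real work */
    }
}

int main(void)
{
    pthread_t threads[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    pthread_mutex_lock(&lock);
    for (int i = 0; i < NJOBS; i++)
        queue[tail++] = i;                      /* producer enqueues the jobs */
    done = 1;
    pthread_cond_broadcast(&ready);             /* wake every sleeping worker */
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}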
For portability, we painstakingly mastered GNU autoconf, adding macros as necessary so that the application could compile on many different versions of Linux, Unix, and macOS. We supported Windows by cross-compiling with the MinGW cross-compiler on Fedora, rather than using Cygwin, because the Cygwin DLL was not multi-threaded.
Although some of the code in my digital forensics tool dated back to a project I had worked on in the 1990s, most of the code was developed between 2006 and 2012. The program continued to be used after that, but largely went without active development.
There are lots of reasons why it’s hard to get old code running in modern environments, but at their core, all of those reasons are the same: something has changed.
Sometimes what’s changed is the programming language itself. It’s hard to remember today, but C originally assumed that arguments of a function were integers unless declared otherwise, and the compiler allowed programs to call functions even if they were not declared in #include files. The compiler didn’t even generate warnings for such sloppiness! Today this is a source of confusion for programmers who weren’t alive when K&R [2] was first published.
Consider the classic “hello, world” program from page six of K&R First Edition:
main()
{
    printf("hello, world\n");
}
To get this same program to compile and run today, you’ll need to add a few decorations:
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("hello, world\n");
    return 0;
}
Other times what changes is the language itself. The C99 standard added _Bool as a fundamental type—a type recognized by the compiler itself, rather than defined in the #include files. Twelve years later, the C11 standard added _Thread_local as a storage-class specifier.
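A short illustration of both additions (compile with -std=c11 or later):

#include <stdbool.h>   /* C99: bool, true, false as aliases for the built-in _Bool */
#include <stdio.h>

_Thread_local int per_thread_counter;   /* C11: each thread gets its own copy */

int main(void)
{
    _Bool raw = 2;       /* any nonzero value is stored as 1 */
    bool flag = true;
    printf("%d %d %d\n", raw, flag, per_thread_counter);
    return 0;
}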
One of the reasons that software requires maintenance and upkeep is that the underlying standards evolve. I thought I was fairly safe using POSIX threads back in 2006—after all, POSIX itself was a standard. And indeed, POSIX threads still work. But the C and C++ standardization committees each went on to develop their own, improved, threading models. As new releases of operating systems came out between 2012 and 2020, I needed to make minor changes to my code to get it to compile without warnings.
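For comparison, here is a minimal sketch of the C++11 threading model, which replaces the explicit POSIX calls with standard library objects:

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main()
{
    std::mutex m;
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&m, i] {
            std::lock_guard<std::mutex> guard(m);   // replaces pthread_mutex_lock/unlock
            std::cout << "worker " << i << "\n";
        });
    }
    for (auto &t : workers)
        t.join();                                   // replaces pthread_join
    return 0;
}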
A big source of the warnings that crept into my code was the dramatic improvement in C and C++ compilers over the past fifteen years. Today’s compilers generally produce faster code because they generate object code that is more highly optimized, both for the underlying microprocessor and for the semantics of the program actually being compiled. An unfortunate side effect of this process is that code that once compiled and ran without errors can develop problems as compiler writers learn to exploit aspects of the programming language that they had previously ignored.
Two papers that demonstrate this problem and that made a huge impact on me are Wang et al.’s “Undefined Behavior: What Happened to My Code?” [3] and D’Silva et al.’s “The Correctness-Security Gap in Compiler Optimization” [4]. These papers—and a lot of follow-up work—show how improved compilers can remove security checks if the checks inadvertently occur in the context of undefined behavior. That’s because the compilers are rigorously following the standards, and the standards were written by language experts, in part, to make it possible to create compilers with more aggressive optimizers. It’s the code that needs to be fixed. Unfortunately, there’s a lot of broken code out there.
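A condensed sketch (with hypothetical names) of the kind of pattern these papers describe: because dereferencing a null pointer is undefined behavior, a compiler that sees the dereference first is entitled to assume the pointer is non-null and silently delete the later safety check.

struct packet { int len; };

int get_length(struct packet *p)
{
    int len = p->len;   /* the dereference happens first... */
    if (p == 0)         /* ...so the optimizer may assume p is non-null
                           and remove this check entirely */
        return -1;
    return len;
}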
As a result, popular C/C++ compilers have a growing number of options that allow software developers to disable various optimizations and warnings so that old code can compile on new systems without undergoing a major rewrite. Using these options means that your code may run less efficiently and may contain hidden bugs or even security vulnerabilities. The alternative is to stay on top of your build environment, aggressively turn on every warning possible, and develop the cleanest code that you possibly can.
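For example, GCC and Clang accept flags along these lines (a small sample, with hypothetical file names):

cc -fwrapv -fno-strict-aliasing -fno-delete-null-pointer-checks old_code.c

makes signed overflow wrap, disables type-based aliasing assumptions, and keeps null-pointer checks in place, while

cc -Wall -Wextra -Werror new_code.c

turns on broad warnings and makes them fatal, for code that you intend to keep clean.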
Faced with a growing number of warnings in my own code, in the spring of 2020 I started on the task of revising my digital forensics tool to take advantage of developments in the C++ standard, compiler technology, and development tools that had taken place over the past decade. This turned out to be a huge project, as I had largely stopped following C++ in 2012 and had been developing almost exclusively in Python since 2014.
My software update project is taking much longer than I originally anticipated, in part because it also became an opportunity to refactor the codebase and because this is a side project. But it’s also because I’m trying to build a platform that I hope will last another 10-20 years.
The first thing I needed to do was to figure out which C++ standard to learn. Back when I was deploying code in 2010, one of the problems I frequently encountered was that some of my users were trying to compile it with C++ compilers from the early 2000s. These were typically government users who were running the code on old operating system releases. When these users had problems compiling my code, I suggested that they update their compilers, and that usually did the trick, but not always. It turned out that different C++ compilers on different platforms all implemented slightly different versions of C++, and there were numerous incompatibilities among the Standard Template Library (STL) implementations on these systems as well. This caused me to minimize the number of advanced features I used—especially features that required a good optimizer to implement them efficiently.
The great thing about the C++ standardization process is that it makes the process of porting code to different environments far more predictable. After doing some reading, I discovered that I could simply decide upon which version of the C++ standard to use (C++11, C++14 and C++17 were the obvious choices) and then set a compiler flag saying “use this version of the standard.” Initially I picked C++11, because it has the best support. After discussion with colleagues, however, I decided to use C++17, which seems to have pretty good support as of 2020, and should have excellent support next year when my software is finally released. I decided against going with C++20, reasoning that it’s just too new.
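With GCC or Clang, the choice comes down to a single flag (the file names here are hypothetical):

g++ -std=c++17 -Wall -Wextra -o tool tool.cpp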
The next thing I did was start reading Bjarne Stroustrup’s 1376-page tome, The C++ Programming Language, 4th Edition (updated for C++11). A colleague gave me his copy years ago when I was switching jobs. It turns out that C++11 was a significant departure from the previous version of the language, whereas the versions since have mostly filled in gaps and expanded functionality. Reading the book from cover to cover has really helped me understand the importance of trusting the compiler and concentrating on writing clean code. Yes, it’s important to write code that’s efficient, but I should focus my attention on writing clear, efficient algorithms, not on trying to squeeze out milliseconds by hand-coding specific memory representations in the hope of achieving some kind of micro-efficiency.
One of the things that I learned, for example, is that I can write code that’s more reliable and easier to debug by passing around C++ references rather than pointers. Previously my code was filled with pointers to objects, many of which I had created with new. This was a holdover from my days as a C and then an Objective-C programmer. My code ran just as fast when I migrated to C++ references, but it also got smaller, because I no longer needed to check whether pointers were NULL (or, more accurately, nullptr). Granted, I should have done this from the beginning, but despite reading Andrew Koenig’s 1989 classic “C Traps and Pitfalls,” I had somehow missed out on reading a good book on proper C++ style.
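Here is a minimal before-and-after sketch (with a hypothetical Report type) of the change:

#include <iostream>
#include <string>

struct Report { std::string name; };

void print_ptr(const Report *r)
{
    if (r == nullptr)    // any caller might hand us a null pointer...
        return;          // ...so every function must check for it
    std::cout << r->name << "\n";
}

void print_ref(const Report &r)
{
    std::cout << r.name << "\n";   // a reference must refer to an object; no check needed
}

int main()
{
    Report rep{"case-001"};
    print_ptr(&rep);
    print_ref(rep);
    return 0;
}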
Reading Stroustrup’s 4th Edition also taught me about changes in the underlying language that I had missed. For example, C++11 added a new keyword, override, which tells the compiler that a method is intended to override a virtual method in a base class. This lets the compiler flag a whole bunch of error conditions, such as failing to mark the method being overridden as virtual, or using a const method to override a method that is not const.
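A small example (with hypothetical class names) of the kind of mistake override catches:

#include <iostream>

struct Scanner {
    virtual void scan() const { std::cout << "base scan\n"; }
    virtual ~Scanner() = default;
};

struct FastScanner : Scanner {
    void scan() const override { std::cout << "fast scan\n"; }   // OK: matches the base signature
    // void scan() override {}   // error: the non-const version doesn't override scan() const
};

int main()
{
    FastScanner f;
    const Scanner &s = f;
    s.scan();   // prints "fast scan" through the virtual call
    return 0;
}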
Having decided to thoroughly modernize my code base, the next big decision I made was that I would not make a series of small, incremental changes that kept the system running at all times. No, I decided that I would pursue massive, release-breaking changes: rip the entire program apart, break all of the internal interfaces, and slowly reassemble it piece by piece.
Let’s be clear: ripping something to pieces and then slowly putting it back together is a much riskier approach than slowly evolving the code base. There was a chance that I would never finish. But the need to maintain a working code base would have also significantly increased my overall development effort, which I think would have prevented me from making radical changes that I believe will significantly simplify the system overall.
As I rebuild the underlying system, I’m taking advantage of other advances in software engineering that have become available over the past fifteen years. Most important among these are improvements in automated testing.
Test-driven development and automated testing should be part of every software project, but in practice many programmers view testing as a compliance exercise and don’t understand how they could personally benefit from this methodology.
Without some kind of automated testing, you have no way of telling whether you accidentally broke one aspect of your program when you added a feature or fixed some other bug. So it’s common to write tests after an application is finished to make sure that the application is functioning the way that’s expected. Think of these as end-to-end tests.
Test-driven development turns this process on its head. Instead of writing a piece of code first and then writing some kind of test, you start by writing the test, and then you write the code that’s being tested. The idea is that it should be easier to test whether code is working properly than to write the code in the first place. (Deep underneath, this is related to the whole question of P vs. NP.) It’s also the case that code written to be tested tends to have stronger isolation between modules and better-specified APIs. As a result, it’s my expectation that code that’s easier to test tends to be better written and more reliable in the first place, although I don’t have any research to back this up.
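A toy example of the test-first style, using plain assert rather than any particular framework, with a hypothetical count_words function: the tests are written against the declaration before the function body exists.

#include <cassert>
#include <sstream>
#include <string>

int count_words(std::istream &in);   // declared first; the tests are written against this

void test_count_words()
{
    std::istringstream empty(""), three("one two three");
    assert(count_words(empty) == 0);
    assert(count_words(three) == 3);
}

int count_words(std::istream &in)    // implemented only after the tests exist
{
    int n = 0;
    std::string word;
    while (in >> word)
        ++n;
    return n;
}

int main()
{
    test_count_words();
    return 0;
}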
“Test coverage” refers to the portion of the program that’s actually tested by the unit tests. For example, a program that counts the number of words in a file will typically generate an error message if the file can’t be opened. But unless you have a unit test that explicitly tests the program with a non-existent file, this aspect of the program might have a lurking bug. GCC has had the ability to instrument test programs to generate coverage reports since 1989 or so; it’s been operational in the Linux kernel since 2003 [5].
Here’s how the tools work: you compile your program with a special flag that causes the resulting binary to be “instrumented,” so that it counts each time every line of code executes and writes those counts to a data file when the program exits. You then run your unit tests and analyze the data file with a coverage-analysis program, which tells you which lines were run during testing—and thus were presumably tested—and which lines were not.
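With GCC, for example, the workflow looks like this (file names hypothetical):

g++ --coverage -std=c++17 -o wc_test wc_test.cpp   # build an instrumented binary
./wc_test                                          # run the tests; counts are written out
gcov wc_test.cpp                                   # report which lines actually ran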
All of this technology has been part of GCC and its descendant compilers for more than thirty years, but I never used it until this summer, when I added automatic unit testing to my forensics program as part of a continuous integration workflow on GitHub. Now, whenever I push to my development branch, a GitHub Action automatically compiles my program, runs the unit tests, and uploads the coverage results to a website that lets me monitor the test coverage of each source file. If any part of this doesn’t work, GitHub sends me email. This was all so ridiculously easy to set up that you would think every software project would be using such tools and every computer science program would be teaching its students to use them. Confusingly, this does not seem to be the case.
My personal C++ modernization project has been a lot of fun and I’ve learned a lot, but it’s also been filled with frustrations. It’s been gratifying to see how the C++ language has matured and stabilized over the past fifteen years: by aggressively following the standard, I’ve been able to remove the majority of the #ifdef and #pragma compiler directives from my source code. By going through my code line by line, I’ve found many opportunities to improve concurrency, improve reliability, and reduce memory copies. I’m pretty confident that the new version of the program will run measurably faster than the previous one.
At the same time, the core C++ standard has some perplexing holes. For example, the C++11 standard has support for the Unicode UTF-8, UTF-16, and UTF-32 encodings, but it lacks support for many Unicode operations, such as normalization. Also missing is full support for the std::u32string type: there’s no straightforward way to lowercase a UTF-32 string, for instance. These features are provided by both the popular Boost library and IBM’s ICU library, but it would certainly be more straightforward to have better Unicode support in the standard itself.
Likewise, C++ is still missing automatic memory management in the form of garbage collection, as well as some way to disable all potentially unsafe memory operations entirely. For these reasons, my security-conscious systems programming friends keep telling me to learn Go or Rust, but those languages are even less popular than C++, at least according to the PYPL PopularitY of Programming Language index on GitHub.
Looking forward, it’s my hope that the increasing interest in both energy-efficient computing and writing code that runs on extraordinarily limited “Internet-of-Things” devices will generate new interest in C++, and that better compilers and increased use of formal methods will make C++ easier to write while eliminating the need for debugging.
As for my forensics program, hopefully it will be ported to C++17 and available for download on GitHub by the time you read this.