Redesigning Hardware to Support Security: CHERI

Pointer Abuses Will Be Ended with CPU Architecture Changes

June 25, 2022

Deployed System

Authors:

Article shepherded by:

Rik Farrow

CHERI represents a new system design that blocks exploits. Architectural changes to the CPU and memory systems add integrity checks to pointers that prevent reading, writing, or executing from memory that is out of bounds or using corrupted pointers, the most common classes of severe vulnerabilities. CHERI is open source, is supported by a complete compiler toolchain and multiple operating systems, and has already been implemented for several different RISC CPUs, including Arm’s recently released Morello prototype.

The developers of CHERI describe this system as architectural capabilities. Architectural, because CHERI involves adding registers and changing data pathways. Capabilities, because pointers are replaced with a bounded address range that cannot be extended beyond its initial bounds, granting controlled access. The address and its metadata are guarded by a tag that protects their integrity, and by rules that control their manipulation: the rights of a capability are monotonically non-increasing. An attempt to access memory outside the bounds of an architectural capability, or to dereference a corrupted or invalid capability, results in a fault, stopping potential exploitation. The researchers at SRI and Cambridge, later joined by Arm, Microsoft, and Google, have been working on this project for over a decade, and have published numerous papers [1] as well as modifying several operating systems to work with CHERI.

Operating-System Capabilities

Linux and Solaris have capabilities, but they are different from what I’ll be addressing here. These capabilities limit root privileges: instead of a program that executes as root holding every root privilege, it holds a limited subset of superuser capabilities. This is particularly important for SUID root-owned programs, which can be narrowly limited instead of extremely privileged. These capabilities have been a software feature in kernels for two decades.

Robert Watson and his fellow FreeBSD developers took a different course. They developed Capsicum, kernel and library modifications that allowed programmers to specify exactly which capabilities a running program would possess. Programs using Capsicum open the resources expected, for example, a network socket or a file opened for reading and writing, then disable the ability to gain more resources. Think of an open file handle or socket as a capability to read or write that file or listen to that socket. Any other capabilities that an attack might attempt to coerce out of the executing program are prohibited. Thus, a buffer overflow containing software designed to open a network connection to a remote system and attach a shell to this socket cannot succeed because the capability to create a new socket doesn’t exist.
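As a rough illustration of the pattern just described (acquire resources first, then give up the ability to acquire more), the following C sketch models the idea in ordinary userspace code. It is not the FreeBSD API, whose real interface includes calls such as cap_enter() and cap_rights_limit(). The names sandbox, sb_open, sb_enter, and sb_can_use are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a Capsicum-style sandbox; all names are hypothetical. */
enum { MAX_HANDLES = 8 };

struct sandbox {
    bool entered;   /* true once acquiring new resources is forbidden */
    int  nhandles;  /* number of handles acquired before entering */
};

/* Acquire a new resource; returns a handle index, or -1 if refused. */
int sb_open(struct sandbox *sb)
{
    assert(sb != NULL);
    if (sb->entered || sb->nhandles >= MAX_HANDLES)
        return -1;
    return sb->nhandles++;
}

/* One-way switch: after this call, sb_open always fails. */
void sb_enter(struct sandbox *sb)
{
    assert(sb != NULL);
    sb->entered = true;
}

/* Handles obtained before entering remain usable afterwards. */
bool sb_can_use(const struct sandbox *sb, int handle)
{
    assert(sb != NULL);
    return handle >= 0 && handle < sb->nhandles;
}
```

In this model, injected code that runs after sb_enter() can still use the handles the program already holds, but a request for a new one (the socket in the example above) is refused.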

Capsicum works well in practice, and over 50 FreeBSD programs have been rewritten to use it. But you may have noticed two problems with this approach. First, programmers have to revise programs for Capsicum to work properly, and doing this requires a better than average programmer who can understand just what capabilities may be required. There are still unadapted programs waiting to be fixed on the freebsd.org site.

C/C++ Memory Safety

I haven’t mentioned the second problem yet, but it’s the most important one I’ll be discussing. Exploiting software so that it misbehaves requires doing things, like heap spraying or buffer overflows, that often rely on misusing pointers in the C and C++ programming languages. C and C++ offer perhaps too much flexibility, features that have been used to create hundreds of millions of lines of working, mostly debugged code in operating systems, libraries and software in general. And compilers have matured to the point where they often fix the mistakes that programmers make, for example, by optimizing away code loops that don’t do anything with their results. But not pointer abuse.

There have been many attempts to fix pointer problems in C. Around the turn of the century, researchers developed stack canaries, values written to the stack that can be checked later. A buffer overflow designed to overwrite the return address would also overwrite the canary value, and code executed on function return checks the canary’s value and exits if it has been modified. Attackers soon learned to bypass stack canaries. Changes to memory management units (MMUs) that prevented code written into the stack’s memory from being fetched and executed also helped to mitigate buffer overflows. That led to other exploitation techniques, such as return-oriented programming (ROP), where the exploit consists of returning into short sections of existing code, called gadgets, that each perform a portion of the exploit. That, in turn, led to annotating code to monitor control flow, so that unusual paths through the code, such as ROP, would be detected and the program halted.
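The canary check can be sketched with a simulated stack frame. The layout below (an 8-byte buffer, then a 4-byte canary, then a 4-byte return address) and the fixed canary value are assumptions made for illustration. Real compilers choose their own layout and use a random canary, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame layout (a model, not a real ABI):
 * bytes 0..7 local buffer, 8..11 canary, 12..15 return address. */
enum { BUF_SIZE = 8, CANARY_OFF = 8, RET_OFF = 12, FRAME_SIZE = 16 };

static const uint32_t CANARY = 0xC0FFEE42u;  /* real canaries are random */

void frame_init(uint8_t frame[FRAME_SIZE], uint32_t ret_addr)
{
    memset(frame, 0, FRAME_SIZE);
    memcpy(frame + CANARY_OFF, &CANARY, sizeof CANARY);
    memcpy(frame + RET_OFF, &ret_addr, sizeof ret_addr);
}

/* Deliberately unchecked copy into the local buffer, like strcpy().
 * The copy is clamped to the simulated frame so the model itself is safe. */
void unchecked_copy(uint8_t frame[FRAME_SIZE], const uint8_t *src, size_t n)
{
    assert(src != NULL);
    if (n > FRAME_SIZE)
        n = FRAME_SIZE;
    memcpy(frame, src, n);
}

/* Epilogue check: true if the canary survived and returning is safe. */
bool canary_intact(const uint8_t frame[FRAME_SIZE])
{
    uint32_t seen;
    memcpy(&seen, frame + CANARY_OFF, sizeof seen);
    return seen == CANARY;
}
```

A copy of 8 bytes leaves the canary alone, while a copy of 16 bytes cannot reach the return address without first trampling the canary. The check only detects the overwrite after the fact, which is one reason canaries proved bypassable.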

Other viable approaches include newer programming languages that avoid the issues that C has. For example, Go has bounds-checked arrays, no pointer arithmetic, and garbage collection for managing memory allocated during program execution. Rust does away with garbage collection by strictly limiting the scope of dynamic variables, but writing code in Rust is difficult. Java, a decades-old language, avoided most of the problems inherent in C. But all three of these languages rely heavily on libraries that were written in C and C++.

We haven’t managed to fix C. Instead, what Watson and other researchers have been working on are hardware extensions to current RISC CPUs that fix most of the problems that lead to exploits in C programs. How much, you ask? Based on a test suite with over two hundred different methods for exploiting pointer misuse in C, close to 100%. A Microsoft study suggests that about two thirds of their critical memory safety vulnerabilities are mitigated deterministically by CHERI. That still leaves problems not specific to C, such as parsing and logic errors, as well as use-after-free bugs, but goes a very long way to producing safe code.

Hardware Capabilities

Unlike Capsicum, which uses the concept of software-based capabilities, CHERI is based on hardware changes system-wide. Registers in the CPU control which ranges of memory can be used for loading or storing data, and for fetching code. Memory itself is changed by adding a tag bit that indicates that a region in memory contains a capability. And a capability can only be changed, without resetting the tag, by capability instructions. Capabilities are created at boot time, and capabilities derived from these initial ones are more restrictive. The new instructions for manipulating capabilities also mean changing the instruction set architecture, or ISA.
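The tag rule can be modelled in a few lines of C. This is a software sketch under simplifying assumptions, not a description of the hardware: the tag is an ordinary bool kept beside each slot, and the names tagged_mem, store_data, store_cap, and load_cap are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { NSLOTS = 4 };

struct tagged_mem {
    uint64_t word[NSLOTS];  /* ordinary, software-visible contents */
    bool     tag[NSLOTS];   /* out-of-band tag, one per slot */
};

/* Ordinary data store: writes the bits and clears the tag. */
void store_data(struct tagged_mem *m, size_t slot, uint64_t bits)
{
    assert(slot < NSLOTS);
    m->word[slot] = bits;
    m->tag[slot] = false;
}

/* Capability store: in this model, the only path that sets the tag. */
void store_cap(struct tagged_mem *m, size_t slot, uint64_t cap_bits)
{
    assert(slot < NSLOTS);
    m->word[slot] = cap_bits;
    m->tag[slot] = true;
}

/* Capability load: yields a usable capability only if the tag is set. */
bool load_cap(const struct tagged_mem *m, size_t slot, uint64_t *out)
{
    assert(slot < NSLOTS);
    if (!m->tag[slot])
        return false;
    *out = m->word[slot];
    return true;
}
```

Writing the very same bit pattern with store_data() leaves the slot looking identical to software, yet load_cap() now refuses it. That is the sense in which overwriting a capability with data invalidates it.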

What are these hardware capabilities? An easy way to start thinking about them is to imagine them as segment registers, although these fall far short of what a capability can do. In the 80s, Intel’s 8086 and 80286 processors had a simpler segmentation system, in which segment registers let programs reach memory beyond the 64K addressable with a 16-bit offset. Segment descriptors appeared in Multics systems in 1965, defining a base location, size, and a list of users and access permissions for this region of memory. This design allowed the controlled sharing of memory between programs.

Multics segments sound a lot like virtual memory, but there are important differences. Virtual memory was created to allow the use of an address space that is larger than the physical memory. Virtual memory today is also used to manage memory by producing a mapping between a process’ address space and physical memory. That mapping must be done for every access of memory and uses special hardware, including the MMU and support for caching and searching page table entries. In current systems, even in desktop systems, there are many thousands of times more pages than page table entries that can be cached. Each page table entry miss requires many accesses to memory to load the correct page tables.

Virtual memory pages are associated with a process, and do include access permissions for that process: read, write, execute. Pages can be shared, but because page table entries are relatively scarce, pages are large, ranging from 4K to megabytes. Sharing memory between processes means that memory gets allocated in large, inflexibly sized, chunks.

In Multics hardware, each process had access to many segment descriptors, allowing fine grained protection within each process that was useful for defense against exploits. Segment descriptors could also be shared between processes, providing both fine grained access control and flexibility in choosing the size of the memory that is shared. Segments had owners, names, and permissions that controlled sharing.

CHERI’s architectural capabilities are unlike Multics segments: they are anonymous, and they specify a range of addresses that can be accessed and how those addresses can be accessed (permissions). Every memory access gets checked against an architectural capability, and an access outside of the specified range results in a fault. Capabilities are stored in memory that includes an extra bit, a tag, that is set when a valid capability is stored. Only special capability instructions can modify a capability without clearing its tag. Instructions that create a delegated version of a capability ensure that the delegated capability cannot be less restrictive, that is, cannot increase the memory range or the permissions. In CHERI, this property is called monotonicity, or more properly, "non-increasing monotonicity".
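Monotonicity can be written down as a small check. The following C sketch is a software model only: the struct layout, the PERM_* constants, and cap_derive are assumptions for illustration, and real CHERI instructions differ in detail (for example, in whether an invalid request traps or clears the tag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PERM_LOAD = 1, PERM_STORE = 2, PERM_EXECUTE = 4 };

struct cap {
    uint64_t base;    /* lowest accessible address */
    uint64_t length;  /* number of accessible bytes */
    unsigned perms;   /* bitmask of PERM_* */
    bool     tag;     /* valid-capability flag */
};

/* Derive a child capability from parent. The request is honoured only if
 * it stays within the parent's bounds and permissions; otherwise the
 * result is untagged and therefore unusable. */
struct cap cap_derive(struct cap parent, uint64_t base, uint64_t length,
                      unsigned perms)
{
    struct cap child = { base, length, perms, false };
    /* Written to avoid overflow in base + length. */
    bool in_bounds = base >= parent.base &&
                     length <= parent.length &&
                     base - parent.base <= parent.length - length;
    bool perms_ok = (perms & ~parent.perms) == 0;
    assert(length > 0);
    child.tag = parent.tag && in_bounds && perms_ok;
    return child;
}
```

Starting from a read-write capability, a caller can derive a smaller read-only one, but nothing derived from the read-only capability can regain store permission or reach outside its range.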

Another way of understanding capabilities is to compare them to Java references. In Java, references are unforgeable: you cannot create one from an arbitrary value. They are bounded, referring only to the memory of the intended object. And they cannot be treated like integers, so you cannot increment a reference the way one can increment a pointer in C to index through an array.

Perhaps this is easier to understand if considering an example. A string in C is an array of characters terminated by a null, or zero byte. A buffer overflow exploit takes advantage of this simple design by storing more characters than have been allocated for the array with the goal of overwriting an area of memory that will be used for control flow, such as a return address or function pointer, or inserting code that will be executed later.

Architectural capabilities prevent this by allowing access only to the memory allocated for the array. Attempts to write beyond the end of the array cause a fault, and processing that fault will terminate the process.
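To make the string example concrete, here is a minimal C sketch in which every store goes through a bounds check. It is a software stand-in for what the hardware does on each access. The names byte_cap, cap_store_byte, and cap_strcpy are hypothetical, and a false return value models the fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Software stand-in for a capability to a byte array. */
struct byte_cap {
    char  *base;    /* start of the allocation */
    size_t length;  /* bytes the holder may touch */
};

/* Checked store: returns false (modelling a fault) when off is out of
 * bounds, and leaves memory untouched in that case. */
bool cap_store_byte(struct byte_cap c, size_t off, char value)
{
    if (off >= c.length)
        return false;
    c.base[off] = value;
    return true;
}

/* strcpy-like copy through the capability. Returns true when the whole
 * string, including its terminating NUL, fit; false at the first fault. */
bool cap_strcpy(struct byte_cap dst, const char *src)
{
    size_t i = 0;
    assert(src != NULL);
    do {
        if (!cap_store_byte(dst, i, src[i]))
            return false;
    } while (src[i++] != '\0');
    return true;
}
```

With an 8-byte buffer, a 7-character string plus its NUL fits exactly. A longer input stops at the first byte past the end instead of spilling into whatever sits next in memory.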

Architectural capabilities fit into the two paths from the CPU to memory: one for data and the other for code. All addresses for loads or stores, and code fetches, must pass through capability registers: there is no way to bypass these address checks. CHERI-based systems can run code that does not use capabilities by leaving the default capabilities spanning the whole address space, and systems can also be used in a hybrid mode where not all code in a system has been converted to using capabilities.

Pure Capability Systems

CheriBSD is an example of a pure capability system. A FreeBSD kernel, as well as its boot chain, libraries and user programs, have all been compiled using a Clang/LLVM C compiler that replaces explicit and implied pointers with capabilities. Converting the kernel and libraries took more than just recompilation, because of issues with the way pointers have been used in the past. Capabilities get initialized during the boot process, and get narrowed each time they are granted to less privileged processes. The kernel itself can use capabilities, and the system call interface must also be changed to take advantage of capabilities. Obvious examples of affected system calls include program execution and memory allocation.

The kernel creates capabilities for an executed program that limit its access to both code and data. This is not dissimilar to what is commonly done using virtual memory. But the initial capabilities provided to a program become the basis for capabilities within the program, for example, a capability pointing to an array representing a string or a structure. A return address becomes a capability, one that cannot be changed by overwriting memory. Function pointers, such as those in C++ vtables, also become capabilities that cannot be overwritten without resetting the tag that marks them as valid capabilities. The addresses contained in capabilities can be changed, but only within the bounds they were created with, and only with special capability instructions. Thus, pointer arithmetic still works, but is bounded. The permissions associated with a capability (load, store, and fetch; think read, write, execute) can be reduced but not increased.

When memory gets allocated by malloc(), instead of a pointer being returned for that memory, a capability that bounds the new memory allocation gets returned. If objects are created within that memory block, they get assigned capabilities with reduced bounds.
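A toy allocator shows how bounds can travel with the value returned by malloc(). This C sketch assumes a fixed 256-byte arena and a bump allocator. cap_malloc, cap_offset, and cap_store are hypothetical names, and sub-object narrowing is left out to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static unsigned char arena[256];  /* backing store for the toy heap */
static size_t arena_used;

struct cap {
    size_t base;    /* offset of the allocation within arena */
    size_t length;  /* size of the allocation */
    size_t addr;    /* current address, moved by pointer arithmetic */
    bool   tag;
};

/* Bump allocator: the returned capability covers exactly n bytes. */
struct cap cap_malloc(size_t n)
{
    struct cap c = { 0, 0, 0, false };
    if (n == 0 || n > sizeof arena - arena_used)
        return c;                 /* untagged: allocation failed */
    c.base = c.addr = arena_used;
    c.length = n;
    c.tag = true;
    arena_used += n;
    return c;
}

/* Pointer arithmetic changes the address, never the bounds. */
struct cap cap_offset(struct cap c, size_t delta)
{
    c.addr += delta;
    return c;
}

/* Store one byte; false models a fault on an untagged or
 * out-of-bounds capability. */
bool cap_store(struct cap c, unsigned char v)
{
    if (!c.tag || c.addr < c.base || c.addr - c.base >= c.length)
        return false;
    assert(c.addr < sizeof arena);
    arena[c.addr] = v;
    return true;
}
```

Two 16-byte allocations sit side by side in the arena, yet walking one byte past the end of the first is refused rather than landing in the second.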

In the CheriBSD ASPLOS paper [2], the authors describe some of the problems encountered when building a capability version of FreeBSD. The kernel is in hybrid capability mode, meaning that pointers within the kernel have not been converted to capabilities. Quoting from that paper:

In our prior work, the kernel interacted with capabilities via assembly stubs. Our enhanced version of the CheriBSD kernel is a hybrid C program where nearly all interactions with userspace are via explicitly annotated capability pointers. Of the 675 C and 8 assembly language files in our test kernels, 26 were created to support capabilities and 146 required adaptation for capabilities. In the full kernel source, about 750 files were touched. Other than a single file, implementing the CHERI-MIPS specific portions of CheriABI, the changes for CheriABI apply to any CHERI implementation. We continue to support the large suite of “legacy” mips64 userspace applications that adhere to the SysV ABI, alongside CheriABI userspace programs.

In a more recent project [3] (2020), Robert Watson, Alex Richardson and Ben Laurie converted the software stack for an open-source desktop. They needed to modify only 0.026% of the six million lines of code, as general purpose C/C++ code needs less adaptation than OS code. They also refined CHERI C/C++ along the way.

While pointers in C/C++ are the size of integers, capabilities are twice that size and must be aligned in memory. The first 64 bits of a 128-bit capability include the permissions, bounds, and object type. The second 64 bits include the address. Structures and other places in existing C code that used integer-sized allocations for pointers will not be properly aligned and require refactoring. Notably, the amount of space used for pointers gets doubled. This doubling does harm performance in some applications, but in a few cases, because capabilities are loaded into registers, applications and benchmarks actually perform better.
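The size and alignment cost can be seen with two struct definitions. The C11 sketch below follows the split described above (64 bits of metadata, 64 bits of address) but does not attempt the real bit-level encoding. cap128, node_int, and node_cap are hypothetical types for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 128-bit capability. The field split follows the text;
 * the actual bit-level encoding is not modelled. */
struct cap128 {
    alignas(16) uint64_t metadata;  /* permissions, bounds, object type */
    uint64_t address;               /* the 64-bit address */
};

/* The same list node with a 64-bit integer address and with a
 * capability in the link field. */
struct node_int { uint64_t next;      uint32_t value; };
struct node_cap { struct cap128 next; uint32_t value; };

static_assert(sizeof(struct cap128) == 16, "two 64-bit words");
static_assert(alignof(struct cap128) == 16, "16-byte alignment");
```

Because of the 16-byte alignment, the capability version of the node is padded out to 32 bytes, so code that hard-codes pointer-sized slots or packs nodes by hand is exactly what has to be refactored.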

Sealed Capabilities

Sealed capabilities offer another advantage. A capability is sealed when the object type field is negative (the high bit is set). Sealed capabilities can be used in environments with a single flat memory space, in contrast to the typical operating-system design in which a privileged kernel is separated from processes that each run in their own flat memory space. Sealed capabilities provide a means to partition a flat memory space into regions that are isolated from one another, but still accessible. Code in these regions can exchange data without interprocess communication (IPC) and execute more privileged code without the penalty of the system call interface. These features are particularly useful for IoT applications and would add to the security of microkernels.

Sealed capabilities also allow the splitting of a single process into partitions that can be accessed only via the capabilities. Sealed capabilities include both the address to jump to and a capability that contains the range of memory that can be used for data. Sealed capabilities nicely express objects with the additional feature that hardware, the architectural capability, provides an inviolable partition within which the object gets manipulated.
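Sealing can be modelled as a lock that freezes a capability until the matching type is presented. This C sketch is heavily simplified: real CHERI authorizes sealing and unsealing with a separate capability, whereas here a bare integer otype stands in for that authority, and the sealed state is a plain bool rather than an encoding in the object type field. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cap {
    uint64_t base, length;
    uint32_t otype;   /* object type the capability was sealed with */
    bool     sealed;  /* sealed capabilities cannot be used directly */
    bool     tag;
};

/* Seal an unsealed, valid capability with the given object type. */
struct cap cap_seal(struct cap c, uint32_t otype)
{
    if (!c.tag || c.sealed) {
        c.tag = false;            /* invalid request: result unusable */
        return c;
    }
    c.sealed = true;
    c.otype = otype;
    return c;
}

/* Unseal: succeeds only when the caller presents the matching type. */
struct cap cap_unseal(struct cap c, uint32_t otype)
{
    if (!c.tag || !c.sealed || c.otype != otype) {
        c.tag = false;
        return c;
    }
    c.sealed = false;
    return c;
}

/* Access check: a sealed capability grants no direct access. */
bool cap_can_access(struct cap c, uint64_t addr)
{
    return c.tag && !c.sealed &&
           addr >= c.base && addr - c.base < c.length;
}
```

A compartment can hand a sealed capability to untrusted code as an opaque token: the holder can pass it back, but cannot read or write through it, and cannot unseal it without the matching type.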

Security

The whole point of having architectural capabilities is to improve the security of applications. While the main targets are applications written in C or C++, all applications in any programming language use pointers internally, as they are an integral part of ISAs that allow indirect addressing.

A pointer is a unit of data that represents a location in memory. But pointers in most CPUs are not represented by using a type of registers specific to pointers. There are integer registers and floating point registers, but not pointer registers. Pointers are dealt with as if they were integers, and that has been the primary source of the problems with pointers.

In CHERI, pointers are a specific data type that has been enshrined in the architecture. They get loaded into special registers, and are flagged in memory with tag bits that are not accessible as ordinary memory, but only via interactions with the CHERI ISA.

The design of CHERI architectures for MIPS, RISC-V, and Arm began with the Sail instruction set definition language. Sail generates sequential emulators, in C and OCaml, and theorem-prover definitions, in Coq, HOL4, and Isabelle, and SMT. These versions allow for both a means of testing in emulation and creating formal proofs of the correctness of the design.

Another way of considering whether CHERI works in practice is to test the architecture using software designed for testing static analysis tools. BODiagSuite [4] is a suite of 291 programs for probing memory safety. There are four ways to run each program: with no memory violation, with a one-byte violation, with an eight-byte violation, and with a 4K violation. In their CheriABI paper [2], the authors tested their version against AddressSanitizer [6] and reported better coverage, lower memory overheads and better performance.

Future Work

In January of 2022, Arm announced the Morello board, a system-on-chip (SoC) design that implements CHERI. Morello represents the first time that CHERI has been implemented in silicon, rather than in emulation (QEMU) or in FPGA versions of MIPS and RISC-V. All versions of CHERI include a toolchain of CHERI Clang/LLVM/LLD and GDB, Cheri-FreeRTOS, and CheriBSD.

The software is all under an open source license, as are the architecture specifications. Arm and Cambridge also have published the intellectual property for the ISA.

CHERI is not the first hardware attempt at protecting pointers. There is both earlier research work, although none that made it beyond emulation, and an actual implementation that’s been in Arm processors for years. Pointer Authentication Code (PAC) uses some of the highest bits of pointer addresses to store a hash of the lower bits of the address. The size of the hash used for authentication depends upon the size used by virtual addresses, and can vary from 3 to 24 bits [5]. Arm CPUs include hardware for calculating the hash and checking it upon pointer use. An MIT team of researchers [7] recently found a method that combines speculative execution and side-channels to bypass PAC, although their technique involves physical access to the CPU, in their case, an Apple M1.
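The sign-then-check flow of PAC can be sketched in C. Everything specific here is an assumption for illustration: a 48-bit address with a 16-bit code (within the 3 to 24 bit range cited above), and a toy multiplicative hash that is not the cipher Arm hardware uses. pac_sign, pac_auth, and toy_mac are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { ADDR_BITS = 48, PAC_BITS = 16 };  /* assumed split of 64 bits */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Toy keyed hash, NOT the algorithm used by real hardware; it only
 * shows the data flow. */
static uint64_t toy_mac(uint64_t addr, uint64_t key)
{
    uint64_t x = (addr ^ key) * UINT64_C(0x9E3779B97F4A7C15);
    return x >> (64 - PAC_BITS);
}

/* Sign: place the hash of the address bits in the unused high bits. */
uint64_t pac_sign(uint64_t ptr, uint64_t key)
{
    uint64_t addr = ptr & ADDR_MASK;
    return addr | (toy_mac(addr, key) << ADDR_BITS);
}

/* Authenticate: recompute the hash and compare. On success, the
 * stripped address is written to *out. */
bool pac_auth(uint64_t signed_ptr, uint64_t key, uint64_t *out)
{
    uint64_t addr = signed_ptr & ADDR_MASK;
    assert(out != 0);
    if ((signed_ptr >> ADDR_BITS) != toy_mac(addr, key))
        return false;
    *out = addr;
    return true;
}
```

The protection is probabilistic and depends on the key staying secret: an attacker who can guess or learn a valid code for a chosen address passes the check, which is the contrast with CHERI drawn in the comparison that follows.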

Another earlier approach in Arm uses four of the otherwise unused upper bits of a pointer to select one of 16 different address colors [8]. Memory regions are colored in the same way, and when a pointer is dereferenced, the four-bit color in the pointer must match the color of the memory referenced. If a pointer is modified by an exploit, the odds are that the color in the modified pointer will not match that of the address referenced.
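The color check can be modelled with a small table of per-granule tags. In this C sketch, the 16-byte granule, the bit position of the color within the pointer, and the tiny 128-byte address space are all assumptions chosen to keep the example short. mte_color and mte_check are hypothetical names, not Arm intrinsics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { GRANULE = 16, NGRANULES = 8, TAG_SHIFT = 56 };

static uint8_t granule_tag[NGRANULES];  /* one 4-bit color per granule */

/* Color a granule-aligned region and return a pointer value that
 * carries the same color in its upper bits. */
uint64_t mte_color(uint64_t addr, size_t len, uint8_t color)
{
    assert(addr % GRANULE == 0 && len % GRANULE == 0);
    assert(addr + len <= (uint64_t)GRANULE * NGRANULES);
    color &= 0xF;
    for (uint64_t a = addr; a < addr + len; a += GRANULE)
        granule_tag[a / GRANULE] = color;
    return addr | ((uint64_t)color << TAG_SHIFT);
}

/* Access check: the color in the pointer must match the color of the
 * granule it addresses. */
bool mte_check(uint64_t tagged_ptr)
{
    uint64_t addr = tagged_ptr & ((UINT64_C(1) << TAG_SHIFT) - 1);
    uint8_t color = (uint8_t)((tagged_ptr >> TAG_SHIFT) & 0xF);
    return addr / GRANULE < NGRANULES &&
           granule_tag[addr / GRANULE] == color;
}
```

Running off the end of one region into a differently colored neighbor fails the check, as does a stale pointer once its region has been recolored. With only 16 colors, though, a mismatch is likely rather than guaranteed.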

I asked Robert Watson to compare these approaches to CHERI. “CHERI does something much stronger: It provides general fine-grained memory safety and scalable compartmentalisation. As part of doing that, [CHERI] provides pointers with strong integrity and provenance validity properties that, along the way, protect against a largely similar adversary model with respect to pointer injection. Only unlike PAC, the protections are deterministic and not secrets-based — there’s no key you can guess, hash that you can collide with, and leakage of an in-memory pointer value doesn’t enable re-injection.”

The approach used with CHERI is still evolving. More testing and development still needs to be done. Just getting to this point took many people many years of work, both in architecture design and software development. The Morello program contains a wider set of CHERI-specific instructions in the ISA than may wind up in a final design. Or, perhaps more capability-specific instructions will be needed for security or performance.

CHERI focuses on what is called spatial memory safety, the prevention of abuse of pointers to access memory that is out-of-bounds or to influence control flow. There is another issue with pointers, known as temporal memory safety. In a blog post, some Google engineers described use-after-free (UAF) bugs as the most serious security problem that still exists in C or C++ code today, although I think this is an exaggeration. Use-after-free refers to use of pointers after the memory they reference has been freed or the object they refer to has been deleted. CHERI may also help with temporal memory safety, because capabilities are tagged in memory and can therefore be told apart from ordinary data. Existing software solutions that put freed pointers into quarantine, including the one proposed in the blog post, currently involve expensive searches through program memory for freed pointers.

Researchers at Microsoft have been working on a temporal memory safety solution, called Cornucopia, that builds upon CHERI. Like the Google engineers’ proposed solution, Cornucopia also searches through memory for freed pointers, but takes advantage of a change in virtual memory that flags pages containing pointers, so pages without pointers never get searched. With the inclusion of UAF checking, Microsoft has stated that CHERI reduced software vulnerabilities by 67%.
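The page-flag idea can be sketched as a sweep that skips pages known to hold no capabilities. This C model is only loosely inspired by the description above and does not reproduce Cornucopia's actual algorithm. The page and slot structures, the has_caps flag, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SLOTS_PER_PAGE = 4, NPAGES = 3 };

struct slot { uint64_t addr; bool tag; };  /* tag set: holds a capability */

struct page {
    struct slot slot[SLOTS_PER_PAGE];
    bool has_caps;             /* set when a capability is stored here */
};

void store_cap(struct page *p, size_t i, uint64_t addr)
{
    assert(i < SLOTS_PER_PAGE);
    p->slot[i].addr = addr;
    p->slot[i].tag = true;
    p->has_caps = true;
}

/* Clear the tag of every capability pointing into [base, base + len).
 * Returns the number of pages that actually had to be scanned. */
size_t revoke(struct page *pages, size_t npages, uint64_t base,
              uint64_t len)
{
    size_t scanned = 0;
    for (size_t p = 0; p < npages; p++) {
        if (!pages[p].has_caps)
            continue;          /* no capabilities here: skip the page */
        scanned++;
        for (size_t i = 0; i < SLOTS_PER_PAGE; i++) {
            struct slot *s = &pages[p].slot[i];
            if (s->tag && s->addr >= base && s->addr - base < len)
                s->tag = false;
        }
    }
    return scanned;
}
```

After a region is freed, one sweep invalidates every stale reference to it, and the work is proportional to the pages that contain capabilities rather than to all of memory.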

Only the passing of time will reveal if architectural capabilities will become the most important security advance in CPU architecture, the predecessor to a better design, or just a footnote in computer research history. I think the potential is there for an effective improvement that can appear in RISC CPUs soon.

Acknowledgements

The author wishes to thank Robert N. M. Watson and Peter G. Neumann for their assistance in writing this article.

Appendix

References:

[1] CHERI Publications, a good place to start learning more about CHERI: https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri-publication...

[2] Brooks Davis, Robert N. M. Watson, Alexander Richardson, Peter G. Neumann, Simon W. Moore, John Baldwin, David Chisnall, Jessica Clarke, Nathaniel Wesley Filardo, Khilan Gudka, Alexandre Joannou, Ben Laurie, A. Theodore Markettos, J. Edward Maste, Alfredo Mazzinghi, Edward Tomasz Napierala, Robert M. Norton, Michael Roe, Peter Sewell, Stacey Son, and Jonathan Woodruff. CheriABI: Enforcing Valid Pointer Provenance and Minimizing Pointer Privilege in the POSIX C Run-time Environment. In Proceedings of 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS’19). Providence, RI, USA, April 13-17, 2019

[3] Watson, R.N., Laurie, B., and Richardson, A. Assessing the Viability of an OpenSource CHERI Desktop Software Ecosystem, report: http://www.capabilitieslimited.co.uk/pdfs/20210917-capltd-cheri-desktop-...

[4] A Rescued Copy of BODiagSuite: https://github.com/CTSRD-CHERI/bodiagsuite

[5] Hans Liljestrand, Thomas Nyman, Kui Wang, Carlos Chinea Perez, Jan-Erik Ekberg, N. Asokan, PAC it up: Towards Pointer Integrity using Arm Pointer Authentication; USENIX Security Symposium (USENIX Security 19), pages 177-194: https://www.usenix.org/system/files/sec19fall_liljestrand_prepub.pdf

[6] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov. Address-Sanitizer: A fast address sanity checker. Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), pages 309–318, Boston, MA, 2012. USENIX.

[7] Ravichandran, J., Na, W.T., Lang, J., and Yan, M. PACMAN: Attacking Arm Pointer Authentication with Speculative Execution, ISCA ’22, June 18–22, 2022, New York, NY, USA. https://pacmanattack.com/paper.pdf

[8] Serebryany, K. Arm Memory Tagging Extension and How It Improves C/C++ Memory Safety; ;login: Vol 44, No. 2; Summer 2019. https://www.usenix.org/system/files/login/articles/login_summer19_03_ser...

Article Categories:

Security

Programming

Last updated February 8, 2023

Authors:

Rik Farrow has been a consultant for over 40 years. He has written two books, as well as worked as the technical editor for a UNIX magazine and for two editions of a popular operating system book. He also taught UNIX system administration and Internet security during the 90s internationally, consulted in security, and worked as a volunteer for USENIX program and steering committees. Rik has been the editor of ;login: since 2005.
