FloatZone: How Floating Point Additions can Detect Memory Errors

November 1, 2023

Research

Authors:

Floris Gorter, Enrico Barberis, Raphael Isemann, Erik van der Kouwe, Cristiano Giuffrida, Herbert Bos

Article shepherded by:

Rik Farrow

Programmers make mistakes, and these mistakes leave computer software vulnerable to exploitation. Even today, programmers often rely on unsafe languages such as C and C++ for their high performance and fine-grained control over memory management. Unfortunately, secure programming is difficult to get right, and when the programmer manages memory improperly, the result is often a memory safety vulnerability. The consequences of memory errors can be severe, as demonstrated, for example, by the notorious Heartbleed [3] vulnerability, which left a large number of web servers open to attacks. Even after decades of combating such memory errors, Microsoft and Google report that about 70% of their serious security bugs continue to be memory safety issues [1, 2].

Thankfully, we can detect these memory errors through software testing tools, but they often come at the cost of significantly slowing down programs at runtime. We have developed FloatZone to overcome these performance issues by leveraging an underutilized unit of modern CPUs. In this article, we explain how FloatZone works and show that it can detect memory errors faster than existing solutions.

The rise of Memory Sanitizers

Driven by the many security incidents involving memory errors in system software, sanitization for memory safety has become a standard technique for bug discovery in software testing. Sanitizers are powerful tools usually aimed at discovering two main categories of bugs: spatial and temporal memory violations.

Temporal safety: all memory accesses to an object must happen during its lifetime. For example, use-after-free and double-free bugs are violations of temporal safety.
Spatial safety: all memory accesses must occur within bounds of the referenced object. For example, heap and stack buffer overflows are violations of spatial safety.

Unfortunately, identifying which memory accesses are in fact memory safety violations is not easy. Consider the following small example:

    int *ptr = …;
    *ptr = 123;

We would like to know whether this memory access is safe, since at runtime the pointer could be out of bounds, or point to deallocated memory, thereby potentially introducing security issues. A common method to detect memory errors is to surround memory objects with inaccessible redzones. The sanitizer ensures that memory accesses within the redzone will fault, thereby detecting spatial memory errors. Deallocated memory can be similarly guarded to detect temporal memory errors by marking the freed memory as a redzone. To clarify, when performing malloc(), the resulting memory layout with redzones looks like this:

Figure 1: Example memory layout with redzones guarding a memory object.
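The layout in Figure 1 can be approximated in a few lines of C. This is a minimal sketch, not the allocator of any real sanitizer: the names `guarded_malloc`, `guarded_free`, and `is_poisoned`, the redzone size, and the poison byte are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative values, chosen for this sketch only. */
#define REDZONE_SIZE 16
#define POISON_BYTE  0xfa

/* Hypothetical wrapper: returns a pointer to `size` usable bytes that are
 * preceded and followed by REDZONE_SIZE bytes of poison. */
void *guarded_malloc(size_t size) {
    assert(size <= SIZE_MAX - 2 * REDZONE_SIZE);
    uint8_t *raw = malloc(size + 2 * REDZONE_SIZE);
    if (raw == NULL) {
        return NULL;
    }
    memset(raw, POISON_BYTE, REDZONE_SIZE);                        /* left redzone  */
    memset(raw + REDZONE_SIZE + size, POISON_BYTE, REDZONE_SIZE);  /* right redzone */
    return raw + REDZONE_SIZE;
}

/* Hypothetical counterpart: releases the underlying allocation. */
void guarded_free(void *obj) {
    if (obj != NULL) {
        free((uint8_t *)obj - REDZONE_SIZE);
    }
}

/* Returns 1 if the byte at `p` holds the poison value. In this toy model a
 * poisoned byte is assumed to belong to a redzone. */
int is_poisoned(const void *p) {
    return *(const uint8_t *)p == POISON_BYTE;
}
```

With this layout, the bytes directly before and after an object returned by `guarded_malloc` are poisoned, so an off-by-one access on either side lands in a redzone.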

In order to detect erroneous memory accesses, the sanitizer must distinguish between accessing valid and invalid (redzone) memory. To achieve this, sanitizers, such as AddressSanitizer [4] (ASan), commonly accompany every memory access with a runtime check for validity. These checks have the following form:

    if(lookup(ptr) == REDZONE) {   // check
        error_and_exit();
    }
    *ptr = 123;                    // original code

Listing 1: The common structure of sanitizer checks.

While sanitizers are crucial for identifying and debugging potentially exploitable bugs, these memory error detection capabilities come with a steep cost. ASan slows down target programs by roughly 2x and therefore typically does not see production deployment [8, 9]. The high overhead also reduces the number of executions that fit in an automated software testing campaign (e.g., fuzz testing). The main component of this slowdown originates from the pervasive checks for validity. In fact, a recent analysis attributes approximately 80% of ASan’s overhead to the checks [5].

Checks under the microscope

When we put the checking logic under a microscope, we observe that the resulting operations can be expressed as a “lookup, compare, and branch” paradigm.

    int res = lookup(ptr);   // Lookup: gather redzone metadata
    if(res == REDZONE) {     // Compare: check if redzone or not
        error_and_exit();    // Branch: abort or continue
    }

Listing 2: The traditional three-step checking paradigm: lookup-compare-branch.

Since the sanitizer must apply these checks to every potentially unsafe load and store operation, speed is of the essence. For this reason we have to carefully inspect the cost of each component to understand the performance of such checks. Depending on how the sanitizer maintains its redzone metadata, the lookup(ptr) operation can be as little as a single load, but it can also involve more complex logic like pointer arithmetic followed by a load (for lookup tables). On top of this, the subsequent compare-and-branch steps induce a branch-heavy control flow, forcing the CPU to waste precious clock cycles on jumping around instead of performing the original workload.
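To make the cost of each step concrete, the following sketch implements lookup-compare-branch over a toy lookup table. It is an assumption-laden illustration rather than the code of any real sanitizer: the fixed `arena`, the one-metadata-byte-per-8-bytes granularity, and the names `poison_granule` and `checked_store` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE    1024   /* toy address space covered by the table */
#define GRANULE_SHIFT 3      /* one metadata byte per 8 bytes of arena */
#define REDZONE       0xff   /* metadata value marking invalid memory  */

static uint8_t arena[ARENA_SIZE];
static uint8_t shadow[ARENA_SIZE >> GRANULE_SHIFT];

/* Mark the 8-byte granule containing arena offset `off` as a redzone. */
void poison_granule(size_t off) {
    assert(off < ARENA_SIZE);
    shadow[off >> GRANULE_SHIFT] = REDZONE;
}

/* Lookup: pointer arithmetic followed by a single load from the table. */
uint8_t lookup(const uint8_t *ptr) {
    size_t off = (size_t)(ptr - arena);
    assert(off < ARENA_SIZE);
    return shadow[off >> GRANULE_SHIFT];
}

/* Compare and branch: returns 0 on success, -1 if the store was blocked.
 * A real sanitizer would report the error and abort instead. */
int checked_store(uint8_t *ptr, uint8_t value) {
    if (lookup(ptr) == REDZONE) {
        return -1;
    }
    *ptr = value;
    return 0;
}
```

Every store now pays for a shift, a table load, a comparison, and a conditional branch before the original write happens.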

With all these challenges in mind, we ask the question: can we accelerate these checks to achieve fast memory error detection?

Exceptions are Comparisons!

In our research, we find that a floating point addition can be made to generate an exception if it processes redzone data. We achieve this by configuring a floating point addition to result in an exception if and only if one of the operands is equal to our redzone poison value. By instrumenting load and store operations with the addition, we ensure that redzone accesses raise an alarm, as visualized in Figure 2:

Figure 2: Verifying memory accesses using floating point additions.

By expressing checks using floating point exceptions, we can gain great benefits:

The cumbersome compare and branch steps are replaced by the existing CPU hardware logic that implicitly detects exceptions.
Keeping the control flow of the program as-is (no additional branching) promotes efficient use of the CPU frontend and branch prediction.
The addition is performed on an execution unit that is underutilized in most programs (the FPU), allowing for high instruction-level parallelism.

We found the specific operands of the floating point addition by considering various constraints. First, we need to avoid collisions with other numbers as much as possible to avoid false positives. In essence, we look for a fixed value x such that we can find only one (or few) y value(s) where x + y generates an exception. Second, we require that y follows a byte-wise repetitive pattern (e.g., 0x4a4a4a4a) for redzone alignment reasons.

With all these constraints in mind, by searching the floating point number space in a brute-force manner, we discovered a suitable configuration. Specifically, with x = 5.375081 · 10⁻³² (x=0x0b8b8b8a), x + y causes an underflow exception only with y = −5.3750813 · 10⁻³² or y = −5.37508 · 10⁻³² (y=0x8b8b8b8b or y=0x8b8b8b89). These exceptions occur because the result of the addition is so small that it falls below the smallest normalized single-precision (32-bit) floating point number. Also, note that y=0x8b8b8b8b is a byte-wise repetitive pattern.

The specific combination of numbers that we discovered allows us to express the comparison in Listing 3 by performing float(y) + float(0x0b8b8b8a). Note that the exception is only raised if y is equal to one of the two identified values.

    // When performing this FP addition
    float(y) + float(0x0b8b8b8a)
    // The CPU implicitly checks for this exception condition
    if( y == 0x8b8b8b8b || y == 0x8b8b8b89 ) {
        goto fp_exception_handler;
    }

Listing 3: Pseudocode view of the implicit check as a result of performing float(y) + float(0x0b8b8b8a).
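The arithmetic behind Listing 3 can be checked without enabling any exception. The sketch below adds x to a candidate bit pattern and reports whether the sum is a tiny nonzero value, i.e. nonzero but below the smallest normalized float. It only inspects the result and does not trap; the helper names `bits_to_float`, `float_to_bits`, and `sum_underflows` are hypothetical, and default rounding without flush-to-zero is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOATZONE_X 0x0b8b8b8au   /* the fixed operand x from the text */

/* Reinterpret a 32-bit pattern as a single-precision float. */
float bits_to_float(uint32_t bits) {
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a single-precision float as its 32-bit pattern. */
uint32_t float_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Returns 1 if x + float(y_bits) is nonzero but below the smallest
 * normalized float (exponent field 0, mantissa nonzero). */
int sum_underflows(uint32_t y_bits) {
    volatile float x = bits_to_float(FLOATZONE_X);
    volatile float y = bits_to_float(y_bits);
    volatile float sum = x + y;   /* volatile keeps the add at runtime */
    uint32_t bits = float_to_bits(sum);
    uint32_t exponent = (bits >> 23) & 0xffu;
    uint32_t mantissa = bits & 0x7fffffu;
    return exponent == 0 && mantissa != 0;
}
```

For the two poison patterns the sum differs from zero by a single unit in the last place of x, which lands just below the normalized range. Neighbouring patterns such as 0x8b8b8b8a (exact zero) or 0x8b8b8b8c (smallest normalized magnitude) do not qualify.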

FloatZone the Sanitizer

Now that we have an efficient means to evaluate if a four-byte memory location holds 0x8b8b8b8b or 0x8b8b8b89, we briefly visualize how we built FloatZone: a sanitizer for spatial and temporal memory issues. See Figure 3 for an overview of how FloatZone inserts redzones for spatial and temporal memory error detection on the stack and the heap.

We place our four-byte float constant (0x8b8b8b8b) around each memory object, repeating it as necessary (for example to create 16-byte redzones). We employ a repetitive poison pattern (0x8b8b8b8b), allowing us to read four bytes from the starting point of any memory access without further requirements for alignment. This concept is visualized in Figure 2 (above), where it is clear that any four-byte access within the redzone results in the same poison pattern being read.

When discovering our repetitive poison value, we noticed that there is one additional colliding value: 0x8b8b8b89. Since the 0x89 byte is stored as the first byte in little endian representation, we can use it as a start marker for our redzone (see Figure 3), which ensures that we can separate objects ending in 0x8b from the start of the redzone.

Figure 3: Redzone management on the heap (top) and stack (bottom). When heap memory is freed it is marked as a redzone to detect use-after-free. In contrast, stack memory must be marked as valid (zeroed out) when the lifetime of the variable ends to avoid false positives caused by phantom redzones.
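The effect of the repetitive pattern and the start marker can be sketched as follows. This is an illustration under assumptions, not FloatZone's runtime: `fill_redzone`, `load_le32`, and `is_poison_word` are hypothetical helpers, and the 16-byte size is taken from the example in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_BYTE  0x8bu
#define MARKER_BYTE  0x89u
#define REDZONE_SIZE 16

/* Fill a redzone: start marker first, poison bytes after it. */
void fill_redzone(uint8_t *rz, size_t len) {
    assert(len >= 4);
    memset(rz, POISON_BYTE, len);
    rz[0] = MARKER_BYTE;
}

/* Read four bytes starting at `p` as a little-endian 32-bit word. */
uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The two bit patterns that the floating point addition flags. */
int is_poison_word(uint32_t w) {
    return w == 0x8b8b8b8bu || w == 0x8b8b8b89u;
}
```

A four-byte read at the first redzone byte yields 0x8b8b8b89, and a read at any later offset inside the redzone yields 0x8b8b8b8b, so both are flagged regardless of alignment. A read that starts on the last bytes of a neighbouring object and straddles the marker picks up 0x89 in a higher byte position, which matches neither pattern even when the object bytes are themselves 0x8b.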

Evaluation: Float Arithmetic Checks

Our performance evaluation shows that floating point additions are significantly faster than compare-and-branch instructions, and newer CPU architectures widen the gap even further. We created two compile-time instrumentation passes that insert either a floating point addition or a compare and (non-taken) branch on every memory access, and applied them to the SPEC CPU2006 benchmarking suite. Figure 4 displays the runtime overhead results of this experiment, across various CPU generations.

Figure 4: SPEC CPU2006 geomean runtime overhead of instrumenting load and store operations with a compare+branch or a floating point addition across CPU generations.

For the compare-and-branch instrumentation, we observe that across all Intel microarchitectures the relative runtime overhead remains nearly identical at a geomean of 50%. In the most recent generation Intel CPU (i9-13900K), the geomean runtime overhead of inserting a float add is 24.9%, which is half the relative cost of a cmp+branch, and also 8 percentage points lower than the float add pass on the less recent i7-10700K. These results suggest that the FPU has become significantly faster in recent Intel generations. This hypothesis is supported by the fact that two additional FADD (Fast Add) units were added to the pipeline on the Golden Cove microarchitecture [6]. We additionally verify that the FP addition benefits are not Intel-specific by performing the same evaluation on an AMD Ryzen 9 5950X with the Zen 3 microarchitecture and an Apple M1 (Firestorm).

Evaluation: FloatZone

Next, we evaluate the runtime overhead of FloatZone using floating point exceptions to sanitize memory errors. Figure 5 displays the runtime overhead of FloatZone on each individual SPEC CPU2006 binary. Overall, FloatZone results in a geomean runtime overhead of 36.4%.

Figure 5: SPEC CPU2006 runtime overhead buildup of FloatZone.

Most of the overhead originates from the floating point checks (25%, as also seen above in Figure 4). After the float checks, the heap quarantine is the second largest factor in the overhead buildup. While quarantines are orthogonal to our design, they do introduce overhead to ensure heap memory is not immediately re-used to guarantee temporal memory error detection. The remaining overhead originates from managing redzones on the heap and the stack, combined with the small costs of enabling floating point exceptions.
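The idea behind a heap quarantine can be sketched as a small FIFO that delays the real free() call. This is a generic illustration, not FloatZone's implementation: the name `quarantine_free` and the four-slot capacity are hypothetical, and the step that marks the parked memory as a redzone is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define QUARANTINE_SLOTS 4   /* illustrative capacity */

static void *quarantine[QUARANTINE_SLOTS];
static size_t next_slot = 0;

/* Hypothetical deferred free: parks `ptr` in the quarantine and releases
 * the oldest parked pointer only once the ring is full. Returns 1 if an
 * older allocation was actually released, 0 otherwise. */
int quarantine_free(void *ptr) {
    assert(ptr != NULL);
    void *evicted = quarantine[next_slot];
    quarantine[next_slot] = ptr;
    next_slot = (next_slot + 1) % QUARANTINE_SLOTS;
    if (evicted == NULL) {
        return 0;
    }
    free(evicted);   /* only now can the allocator re-use this memory */
    return 1;
}
```

While a block sits in the quarantine the allocator cannot hand it out again, so a dangling pointer keeps pointing at memory that is still recognizable as freed. The price is the extra bookkeeping and the memory held back from re-use.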

To put the overhead of FloatZone into better perspective, we compare it to a state-of-the-art sanitizer: ASan (and ASan--, an optimized version of ASan), using both the SPEC CPU2006 and CPU2017 benchmarking suites. ASan and ASan-- report overheads that are significantly higher (see Figure 6): 77.8% and 65.9% on SPEC CPU2006, respectively. Here, FloatZone's overhead of 36.4% is less than half that of ASan. On the more recent SPEC CPU2017 benchmarking suite, FloatZone shows similar performance with a geomean runtime overhead of 37%, which is again significantly lower than that of ASan(--).

Figure 6: SPEC CPU2006 and CPU2017 geomean runtime overhead comparison with ASan(--).

Conclusion

Sanitizers for memory safety have become a standard in software testing, and despite recent optimizations, state-of-the-art bug detection tools still incur high runtime overhead. We improved performance by introducing a faster check for validity on commodity hardware. We showed that we can use floating point arithmetic to express the common compare-and-branch sanitizer paradigm. We used this primitive to build a memory sanitizer called FloatZone, which relies on carefully crafted floating point underflow exceptions to identify memory violations. We showed that these checks using floating point additions are indeed notably faster than comparison instructions. Moreover, we showed that our resulting sanitizer significantly outperforms the state-of-the-art solutions.

For more information on FloatZone, we refer to our publication: ‘FloatZone: Accelerating Memory Error Detection using the Floating Point Unit’, in USENIX Security 2023 [7].

Appendix

References:

[1] Matt Miller. "Trends, challenges, and strategic shifts in the software vulnerability mitigation landscape." https://github.com/Microsoft/MSRC-Security-Research/blob/master/presenta... 2019

[2] Chromium. "Memory safety". https://www.chromium.org/Home/chromium-security/memory-safety/

[3] Zakir Durumeric, Frank Li, James Kasten, Johanna Amann, Jethro Beekman, Mathias Payer, Nicolas Weaver et al. "The matter of heartbleed." In Proceedings of the 2014 conference on internet measurement conference, pp. 475-488. 2014.

[4] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitriy Vyukov. "AddressSanitizer: A fast address sanity checker." In 2012 USENIX annual technical conference (USENIX ATC 12), pp. 309-318. 2012.

[5] Yuchen Zhang, Chengbin Pang, Georgios Portokalidis, Nikos Triandopoulos, and Jun Xu. "Debloating address sanitizer." In 31st USENIX Security Symposium (USENIX Security 22), pp. 4345-4363. 2022.

[6] Intel. Intel® 64 and IA-32 Architectures Optimization Reference Manual. 248966-046, 2023.

[7] Floris Gorter, Enrico Barberis, Raphael Isemann, Erik van der Kouwe, Cristiano Giuffrida, and Herbert Bos. "FloatZone: Accelerating Memory Error Detection using the Floating Point Unit." In 32nd USENIX Security Symposium (USENIX Security 23), pp. 805-822. 2023.

[8] Dokyung Song, Julian Lettner, Prabhu Rajasekaran, Yeoul Na, Stijn Volckaert, Per Larsen, and Michael Franz. "SoK: Sanitizing for security." In 2019 IEEE Symposium on Security and Privacy (SP), pp. 1275-1295. IEEE, 2019.

[9] Vlad Tsyrklevich. "GWP-ASan: Sampling heap memory error detection in-the-wild". https://www.chromium.org/Home/chromium-security/articles/gwp-asan/ 2019

Last updated November 2, 2023

Authors:

Floris Gorter is a PhD student at VUSec, the Systems and Network Security group at Vrije Universiteit Amsterdam (the Netherlands). His research focuses on software security, memory safety, and malware analysis. Recently, he published research on efficient use-after-free detection (DangZero) and accelerating memory error detection using the FPU (FloatZone).


Enrico Barberis is a PhD candidate at VUSec. His current research focuses on microarchitectural attacks and all intrinsic threats introduced by hardware design flaws. In his recent works, he disclosed microarchitectural vulnerabilities such as Floating Point Value Injection and Branch History Injection.


Raphael Isemann is a PhD student at VUSec, the Systems and Network Security group at Vrije Universiteit Amsterdam (the Netherlands). His research is focused on the reliability and accuracy of error detection systems.


Erik van der Kouwe is an Assistant Professor in the Department of Computer Science at the Vrije Universiteit Amsterdam (VUA). Erik has a broad interest in computer systems and security, with a particular focus on techniques to mitigate vulnerabilities and benchmarking practices used when evaluating such techniques.


Cristiano Giuffrida is an Associate Professor in the Computer Science Department at the Vrije Universiteit Amsterdam. His research interests span across several aspects of Computer Systems, with a focus on systems security.


Herbert Bos is a full professor at Vrije Universiteit Amsterdam, where he co-leads the VUSec Systems Security group. He is very proud of his current and former students who are all much cleverer than he is. Also, he loves the Beatles.
