fTPM: A Software-only Implementation of a TPM Chip

Himanshu Raj, Stefan Saroiu, Alec Wolman, Ronald Aigner, Jeremiah Cox, Paul England, Chris Fenner, Kinshuman Kinshumann, Jork Loeser, Dennis Mattoon, Magnus Nystrom, David Robinson, Rob Spiger, Stefan Thom, David Wooten

Microsoft
Many systems in industry & research rely on TPMs
- Bitlocker, trusted sensors, Chrome OS, etc...

Challenge: Smartphones & tablets lack TPMs today
- TPM: never designed to meet space, cost, power constraints

Observation:
Big Problem

These CPU features omit several secure resources found on trusted hardware
Research Question

Can we overcome these limitations to build systems whose security is comparable to that of trusted hardware?

Answer: Yes

Contributions:
• 3 approaches to overcome TrustZone’s limitations (lessons relevant to SGX also)
• Security analysis of fTPM vs TPM chips
• fTPM shipped on millions of Microsoft Surface & Windows Phone (WP) devices
Outline

- Motivation
- Background on TPM
- ARM TrustZone and its shortcomings
- High-level architecture & threat model
- Overcoming TrustZone limitations: three approaches
- Performance evaluation
- Conclusions
What are TPMs?

- Hardware root of trust offering:
  - Strong machine identity
  - Software rollback prevention
  - Secure credentials store
  - Software attestation
What are TPMs good for?

- **Shipped Products by Industry:**
  - Protects “data-at-rest” (Google, Microsoft)
  - Prevents rollback (Google)
  - Virtual smart cards (Microsoft)
  - Early-Launch Anti-Malware (Microsoft)

- **Research:**
  - Secure VMs for the cloud [SOSP’11]
  - Secure offline data access [OSDI ‘12]
  - Trusted sensors for mobile devices [MobiSys ’11, SenSys ‘11]
  - Cloaking malware [Sec ‘11]
TPM: 1.0 → 1.1 → 1.2 → 2.0

- **Late 1999**: TCPA is formed (IBM, HP, Intel, Microsoft, ...)
- **2001**: TPM specification 1.0 is released
  - Never adopted by any hardware AFAIK
- **Late 2001**: TPM 1.1 is released
- **2002**: IBM Thinkpad T30 uses first discrete TPM chip
- **2003**: TCPA morphs into TCG
- **2007**: pin reset attack
- **2008**: TPM 1.2
  - Very popular, many hardware vendors built chips
- **2014**: TPM 2.0
New in TPM 2.0

- Newer cryptography
  - TPM 1.2: SHA-1, RSA
  - TPM 2.0: SHA-1, RSA, SHA-256, ECC

- TPM 2.0 provides a reference implementation
  - “the code is the spec”

- Much more flexible policy support
  - Read this as “more (useful) bells and whistles”
Outline

- Motivation
- Background on TPM
- ARM TrustZone and its shortcomings
- High-level architecture & threat model
- Overcoming TrustZone limitations: three approaches
- Performance evaluation
- Conclusions
[Diagram: Normal World (NW) and Secure World (SW) above the Secure Monitor Layer (software), which runs on the ARM hardware]
Booting Up

- Secure Monitor Layer (software) runs on the ARM hardware
- Allocates memory
- Restricts its access to Secure World-only
- More setup...
- Secure World (SW) is up
ARM TrustZone Properties

- Isolated runtime that boots first
- Curtained memory
- Ability to map interrupts delivered to Secure World
  - Secure monitor dispatches interrupts
ARM TrustZone Limitations

- Lack of virtualization
- Lack of accessibility
Outline

- Motivation
- Background on TPM
- ARM TrustZone and its shortcomings
- High-level architecture & threat model
- Overcoming TrustZone limitations: three approaches
- Performance evaluation
- Conclusions
High-Level Architecture

- TEE: trusted execution environment (small codebase)
  - Monitor, dispatcher, runtime
- Most hardware resources mapped to Normal World
  - For better perf.
Threat Model: What Threats are In-Scope?

<table>
<thead>
<tr>
<th>Threat</th>
<th>fTPM</th>
<th>TPM chip</th>
</tr>
</thead>
<tbody>
<tr>
<td>Malicious software (e.g., malware, compromised OS)</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td>Time-based side-channel</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td>Cache-based side-channel</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td>Denial-of-Service</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td>Power analysis-based side-channel</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td>Memory attacks (e.g., coldboot, bus sniffing, JTAG)</td>
<td>✔</td>
<td>❌</td>
</tr>
</tbody>
</table>

See “Memory Attacks” (ASPLOS 2015)
Outline

- Motivation
- Background on TPM
- ARM TrustZone and its shortcomings
- High-level architecture & threat model
- Overcoming TrustZone limitations: three approaches
- Performance evaluation
- Conclusions
Helpful observation: huge ARM eco-system out there

- eMMC controller present on many ARM SoCs
  - Has provisions for trusted storage
- Secure fuses: write-once, read-always registers
  - Can act as “seed” for deriving crypto keys
- Entropy for TrustZone can be added easily
ARM Eco-system Offers eMMC

- eMMC controllers can set up one partition as a Replay-Protected Memory Block (RPMB)

- RPMB primitives:
  - One-time programmable authentication keys:
    - fTPM uses “seed” from secure fuse to generate auth. keys
    - fTPM writes auth. keys to eMMC controller upon provisioning
  - Authenticated reads and writes (uses internal counters)
  - Nonces
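
A minimal sketch of how these RPMB primitives fit together, assuming a toy in-memory device rather than the real eMMC frame format. The names `derive_auth_key`, `write_mac`, and `ToyRpmb`, the key-derivation label, and the MAC layout are all hypothetical and chosen only for illustration; the slides do not specify how fTPM derives the key from the fuse seed.

```python
import hashlib
import hmac


def derive_auth_key(fuse_seed: bytes) -> bytes:
    # Assumed derivation: HMAC-SHA256 keyed by the fuse seed over a fixed label.
    return hmac.new(fuse_seed, b"rpmb-auth-key", hashlib.sha256).digest()


def write_mac(key: bytes, addr: int, data: bytes, counter: int) -> bytes:
    # Assumed MAC layout (address, write counter, data); not the eMMC frame format.
    msg = addr.to_bytes(4, "big") + counter.to_bytes(4, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()


class ToyRpmb:
    """Toy model of a replay-protected partition."""

    def __init__(self) -> None:
        self._key = None      # one-time programmable authentication key
        self._counter = 0     # internal write counter
        self._blocks = {}

    def program_key(self, key: bytes) -> None:
        if self._key is not None:
            raise PermissionError("authentication key is already programmed")
        self._key = key

    def read_counter(self) -> int:
        return self._counter

    def write(self, addr: int, data: bytes, counter: int, mac: bytes) -> None:
        if self._key is None:
            raise PermissionError("no authentication key programmed")
        if not hmac.compare_digest(mac, write_mac(self._key, addr, data, counter)):
            raise PermissionError("bad MAC")
        if counter != self._counter:
            raise PermissionError("stale write counter (possible replay)")
        self._blocks[addr] = data
        self._counter += 1

    def read(self, addr: int, nonce: bytes):
        if self._key is None:
            raise PermissionError("no authentication key programmed")
        data = self._blocks.get(addr, b"")
        # The caller's fresh nonce is bound into the MAC so old responses cannot be replayed.
        return data, hmac.new(self._key, nonce + data, hashlib.sha256).digest()
```

In this model, replaying a previously valid write fails because the device's counter has already advanced, and a read response is only accepted by the caller if its MAC covers the nonce the caller just chose.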
ARM TrustZone Limitations

eMMC & Secure fuses

Entropy

Timer & changed semantics of TPM commands
Three Approaches

1. Provision additional trusted hardware
2. Make design compromises
3. Change semantics of TPM commands

Do not affect TPM’s security!
Problem: Long-Running Commands

- Design requirements:
  - Code running in secure world must be minimal
    - e.g., TEE lacks pre-emptive scheduler
  - fTPM commands cannot be long-lived
    - Commodity OS “freezes” during fTPM command

- Creating RSA keys can take 10+ seconds on slow mobile devices!!!
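Solution: Cooperative Checkpointing

- Normal World issues a TPM command to the Secure World
- Secure World notices the command has been running for a long time (“Oops, it’s been a long time”) and checkpoints it
- Control returns to the Normal World, which later resumes the TPM command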
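
The checkpoint-and-resume idea can be sketched as follows. This is an illustrative model only: `CheckpointedCommand`, `normal_world_driver`, and the notion of a fixed budget of work units per call are assumptions, not the fTPM implementation (which would bound time rather than count abstract units).

```python
class CheckpointedCommand:
    """Hypothetical long-running TPM command that keeps its progress between calls."""

    def __init__(self, total_units: int) -> None:
        self.total_units = total_units
        self.done_units = 0   # checkpointed state, kept on the Secure World side

    def run(self, budget_units: int) -> bool:
        """Do at most budget_units of work, then return to the Normal World.

        Returns True once the command has completed.
        """
        end = min(self.total_units, self.done_units + budget_units)
        while self.done_units < end:
            self.done_units += 1   # one bounded unit of work
        return self.done_units == self.total_units


def normal_world_driver(cmd: CheckpointedCommand, budget_units: int) -> int:
    """Re-issue the command until it completes; return how many resumes were needed."""
    resumes = 0
    while not cmd.run(budget_units):
        resumes += 1   # the commodity OS is free to run other work here
    return resumes
```

The scheme is cooperative because the Secure World has no pre-emptive scheduler: the command itself gives up control, and it relies on the Normal World to call back in so that it can finish.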
Three Approaches

1. Provision additional trusted hardware
2. Make design compromises
3. Change semantics of TPM commands

Do not affect TPM’s security!
Background: TPM Unseal

- TPM w/ storage:
  - Guess PIN 1st time → Failed Attempts++
  - Guess PIN 2nd time → Failed Attempts++
  - Guess PIN 3rd time → Failed Attempts++
  - Lockout Period
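
A minimal sketch of this lockout behavior, assuming a threshold of three failed attempts (taken from the three guesses shown above) and a dictionary standing in for persistent storage. `ToyTpmWithStorage` and `LockedOut` are hypothetical names, and the lockout is modeled as a state rather than a timed period.

```python
import hmac
from typing import Optional


class LockedOut(Exception):
    """Raised while the toy TPM is in its lockout period."""


class ToyTpmWithStorage:
    MAX_FAILED_ATTEMPTS = 3   # assumed threshold

    def __init__(self, pin: bytes, secret: bytes) -> None:
        self._pin = pin
        self._secret = secret
        # Stand-in for persistent storage: the counter must survive reboots.
        self.storage = {"failed_attempts": 0}

    def unseal(self, guess: bytes) -> Optional[bytes]:
        if self.storage["failed_attempts"] >= self.MAX_FAILED_ATTEMPTS:
            raise LockedOut("too many failed attempts")
        if hmac.compare_digest(guess, self._pin):
            self.storage["failed_attempts"] = 0
            return self._secret
        # Persist the failure before answering, so a reboot cannot erase it.
        self.storage["failed_attempts"] += 1
        return None
```

The protection depends entirely on the counter being written to storage on every failed guess, which is exactly what becomes impossible when storage is unavailable.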
Problem: Dark Periods

- During dark periods:
  - Problem: storage unavailable
  - Danger: TPM Unseal commands not safe

- Example of dark period: During boot:
  - Firmware (UEFI) finished running and unloaded
  - OS loader is running (OS not fully loaded)
Possible Attack during Dark Period

<table>
<thead>
<tr>
<th>PIN Attempt</th>
<th>Result</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>1st time</td>
<td>Failed</td>
<td>TPM without storage</td>
</tr>
<tr>
<td>2nd time</td>
<td>Failed</td>
<td>Attempts++</td>
</tr>
<tr>
<td>3rd time</td>
<td>Failed</td>
<td>Attempts++</td>
</tr>
<tr>
<td>4th time</td>
<td>Reboot</td>
<td>Dark period entered here</td>
</tr>
</tbody>
</table>
Solution: Dirty Bit

- Write the dirty bit to storage before entering the dark period
- When the dark period is exited, the dirty bit is cleared

- If the machine reboots during the dark period, the bit remains dirty
  - Possibility #1: Legitimate user reboots machine
  - Possibility #2: Attacker attempts to guess PIN

- Solution: Upon fTPM bootup, if the bit is dirty, enter lockout
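
The dirty-bit logic can be sketched as below. This is a toy model under stated assumptions: `ToyFtpm`, `reboot`, and the dictionary used as persistent storage are hypothetical, and the sketch does not model how the lockout later expires.

```python
class ToyFtpm:
    """Toy model of the dirty-bit check; storage is a dict that survives reboots."""

    def __init__(self, storage: dict) -> None:
        self.storage = storage
        # Upon bootup: a dirty bit left over from the previous boot means lockout.
        self.locked_out = bool(storage.get("dirty", False))
        self.in_dark_period = False

    def enter_dark_period(self) -> None:
        # Written while storage is still available, before it goes away.
        self.storage["dirty"] = True
        self.in_dark_period = True

    def exit_dark_period(self) -> None:
        # Storage is available again, so the bit can be cleared.
        self.in_dark_period = False
        self.storage["dirty"] = False


def reboot(storage: dict) -> ToyFtpm:
    """Simulate a reboot: volatile state is lost, persistent storage is kept."""
    return ToyFtpm(storage)
```

The check cannot distinguish a legitimate reboot from an attack, so both cases pay the lockout; the design accepts that cost so that PIN guesses made during a dark period are never free.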
Dirty Bit Stops Attack

- Dark period entered here: Set Dirty Bit
- Guess PIN 1st time: Failed Attempts++
- Guess PIN 2nd time: Failed Attempts++
- Guess PIN 3rd time: Failed Attempts++
- Reboot
- Lockout Period
Outline

- Motivation
- Background on TPM
- ARM TrustZone and its shortcomings
- High-level architecture & threat model
- Overcoming TrustZone limitations: three approaches
- Performance evaluation
- Conclusions
Methodology

- Instrumented and measured various TPM commands
  - Create RSA keys, seal, unseal, sign, verify, encrypt, decrypt

<table>
<thead>
<tr>
<th>TPM</th>
<th>CPU</th>
</tr>
</thead>
<tbody>
<tr>
<td>fTPM1</td>
<td>1.2 GHz Cortex-A7</td>
</tr>
<tr>
<td>fTPM2</td>
<td>1.3 GHz Cortex-A9</td>
</tr>
<tr>
<td>fTPM3</td>
<td>2.0 GHz Cortex-A57</td>
</tr>
<tr>
<td>fTPM4</td>
<td>2.2 GHz Cortex-A57</td>
</tr>
<tr>
<td>dTPM1</td>
<td></td>
</tr>
<tr>
<td>dTPM2</td>
<td></td>
</tr>
<tr>
<td>dTPM3</td>
<td></td>
</tr>
</tbody>
</table>
Result: fTPMs much faster than dTPMs for RSA-2048 (w/ OAEP & SHA-256).
Conclusions

- fTPM leverages ARM TrustZone to build a TPM 2.0 that runs in firmware

- Three approaches to build fTPM:
  - Additional hardware requirements
  - Design compromises
  - Modify TPM semantics

- fTPMs offer much better performance than dTPMs
Discussion of SGX Limitations

- Lack of trusted storage, secure counters, and clock
  - Due to fundamental process limitations
- Lack of Intel eco-system (unlike ARM):
  - Intel needs to decide to equip their devices with eMMC
- One plus: SGX encrypts memory
  - No need to worry about memory attacks
- One minus: SGX can only run ring-3 code
  - No secure interrupts available
  - More concerns about side-channel attacks
Questions?

- ssaroiu@microsoft.com