## Persistent Memory Preview Session

Ian Neal, University of Michigan, CSE

iangneal@umich.edu (about.iangneal.io)



### What is Persistent Memory (PM)?

- Like memory, but persistent!
- Retains data without active power



#### Abstraction



#### **Powered Backups**

Source: Viking Technology



#### **Non-volatile Media**

Source: Intel Corporation



### Timeline



### **Phase Change Memory**

- Phase-change RAM (PCM)
- 3D XPoint (*high density* PCM)
- 2015: joint Intel+Micron announcement
- 2017: Optane SSDs available
- 2019: DIMMs (Series 100) available

3D XPoint 2 layer diagram.

A PCM cell in low-resistance state (left) and high-resistance state (right).





### **Modern Production**

- PM, NVM, SCM ≅ Intel Optane DC Persistent Memory Module



Series 100

- 2-3x slower than DRAM (~300ns)
- Requires explicit cache flushing (CPU cache is volatile, PM isn't)



Series 200

- eADR (extended ADR) auto-persists updates in CPU cache
- ~32% higher bandwidth than Series 100
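
The difference between the two generations above can be illustrated with a toy model. This is a minimal sketch, not real PM code: `ToyPM`, its `store`/`flush`/`crash` methods, and the `eadr` flag are hypothetical names invented for illustration. On real hardware the flush step would be an instruction such as `clwb` followed by `sfence`. The model also assumes the worst case, where no dirty line happens to be evicted before the crash.

```python
class ToyPM:
    """Toy model of a volatile CPU cache in front of persistent media."""

    def __init__(self, eadr=False):
        # eadr=True models Series 200 (assumption: dirty cache lines are
        # drained to PM on power loss); eadr=False models Series 100.
        self.eadr = eadr
        self.cache = {}   # volatile: dirty lines not yet written back
        self.media = {}   # persistent: survives a crash

    def store(self, addr, value):
        # A store lands in the CPU cache first.
        self.cache[addr] = value

    def flush(self, addr):
        # Models an explicit cache-line write-back followed by a fence.
        if addr in self.cache:
            self.media[addr] = self.cache.pop(addr)

    def crash(self):
        # With eADR the dirty lines are persisted; without it they are lost.
        if self.eadr:
            self.media.update(self.cache)
        self.cache.clear()
        return dict(self.media)


def run(eadr, do_flush):
    """Store one value, optionally flush it, then crash; return what survived."""
    pm = ToyPM(eadr=eadr)
    pm.store("x", 1)
    if do_flush:
        pm.flush("x")
    return pm.crash()


if __name__ == "__main__":
    print("Series 100, no flush:", run(eadr=False, do_flush=False))
    print("Series 100, flush:   ", run(eadr=False, do_flush=True))
    print("Series 200, no flush:", run(eadr=True, do_flush=False))
```

In this model only the unflushed store on the non-eADR device is lost, which is why Series 100 code needs explicit flushes and Series 200 code can omit them.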



### **Modern PM Research**

1. Persistent Data Structures and Storage Systems
2. Volatile Use Cases
3. Developer Tools



## **PM Storage Systems (Area 1)**

- Only 2-3x slower than DRAM
- ~8x denser (128-512GB devices)
- Different read/write, seq/rand perf
- Software overhead easily exposed
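
Why software overhead is "easily exposed" can be seen with some back-of-the-envelope arithmetic. The sketch below uses the ~300 ns PM latency quoted earlier. The SSD latency and the per-operation software cost are hypothetical values chosen only for illustration, not measurements.

```python
def software_share(device_ns, software_ns):
    """Fraction of the total time per operation that is spent in software."""
    return software_ns / (device_ns + software_ns)


PM_LATENCY_NS = 300       # from the slides: roughly 300 ns
SSD_LATENCY_NS = 10_000   # hypothetical fast SSD, for comparison only
SOFTWARE_NS = 1_000       # hypothetical software-stack cost per operation

pm_share = software_share(PM_LATENCY_NS, SOFTWARE_NS)
ssd_share = software_share(SSD_LATENCY_NS, SOFTWARE_NS)

if __name__ == "__main__":
    print(f"software share on PM:  {pm_share:.0%}")
    print(f"software share on SSD: {ssd_share:.0%}")
```

Under these assumed numbers, the same software stack that is a minor cost on the SSD accounts for most of the time per operation on PM.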



Read latency of Optane DC memory on a cache miss.





#### Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes [OSDI]

Qing Wang, Youyou Lu, Junru Li, and Jiwu Shu

#### Location of PM accesses must be accounted for!

2.3x and 1.6x higher throughput for write-intensive and read-intensive workloads, respectively!



#### **Characterizing and Optimizing Remote Persistent Memory with RDMA and NVM** [ATC]

Xingda Wei, Xiating Xie, Rong Chen, Haibo Chen, Binyu Zang

#### Systems built on *emulated* NVM perform poorly on *real* NVM!

| Advice                    | Hint                                                                                       |   |   |
|---------------------------|--------------------------------------------------------------------------------------------|---|---|
| A2. Access pattern (§4.2) | H4. Use ntstore instead of store for large writes                                          | - | ✓ |
|                           | H5. Use XPLine granularity for writes                                                      | ✓ | ✓ |
|                           | H6. Use PCIe DW granularity (64B) for small writes (i.e., less than XPLine)                | ✓ | - |
|                           | H7. Use cacheline granularity (64B) with ntstore for small writes (i.e., less than XPLine) | - | ✓ |
|                           | H8. Use less atomic operations on NVM                                                      | ✓ | ✓ |
| A3. RDMA-aware (§4.3)     | H9. Enable outstanding request with doorbell batching for one-sided persistent WRITE       | ✓ | - |
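
Hint H5 in the table above can be motivated with a simple write-amplification estimate. The sketch assumes the 256-byte XPLine, Optane's internal access granularity. The helper names are hypothetical, and the model is deliberately simplified: it ignores any write combining inside the device, so it gives an upper bound for isolated writes.

```python
XPLINE_BYTES = 256     # Optane's internal access granularity
CACHELINE_BYTES = 64   # CPU cache-line size


def xplines_touched(offset, size, line=XPLINE_BYTES):
    """Number of XPLines covered by a write of `size` bytes at `offset`."""
    first = offset // line
    last = (offset + size - 1) // line
    return last - first + 1


def write_amplification(offset, size, line=XPLINE_BYTES):
    """Bytes written internally divided by bytes requested."""
    return xplines_touched(offset, size, line) * line / size


if __name__ == "__main__":
    # An isolated cache-line write still rewrites a whole XPLine.
    print(write_amplification(0, CACHELINE_BYTES))
    # An aligned XPLine-sized write has no amplification.
    print(write_amplification(0, XPLINE_BYTES))
    # A misaligned XPLine-sized write straddles two XPLines.
    print(write_amplification(128, XPLINE_BYTES))
```

In this model, writes that are both XPLine-sized and XPLine-aligned are the only ones that avoid amplification.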






#### First Responder: Persistent Memory Simultaneously as High Performance Buffer Cache and Storage [ATC]

Hyunsub Song, Shean Kim, J. Hyun Kim, Ethan JH Park, Sam H. Noh

#### PM can easily accelerate existing systems!





## PM as Volatile Memory (Area 2)

+ Larger pools of memory per node
+ Lower energy costs
+ Faster than swap

- Higher latency than DRAM
- Lower memory bandwidth



Throughput of memory-caching services using DRAM (**MM-LDRAM**), NVM with DRAM as a cache (**MM-Optane-Cached**), and NVM directly (**MM-Optane-Uncached**).
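The gap between the cached and uncached configurations can be reasoned about with a weighted-average latency model. This is a rough sketch under stated assumptions: the ~300 ns PM latency comes from the earlier slide, the 100 ns DRAM latency is a hypothetical value consistent with "2-3x slower than DRAM", and the model ignores the extra work a real DRAM-cache miss incurs.

```python
def effective_latency_ns(hit_ratio, dram_ns, pm_ns):
    """Average access latency when DRAM acts as a cache in front of PM."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * dram_ns + (1.0 - hit_ratio) * pm_ns


DRAM_NS = 100   # hypothetical DRAM latency
PM_NS = 300     # from the slides: roughly 300 ns

if __name__ == "__main__":
    for hit_ratio in (0.0, 0.5, 0.9, 1.0):
        latency = effective_latency_ns(hit_ratio, DRAM_NS, PM_NS)
        print(f"hit ratio {hit_ratio:.1f}: {latency:.0f} ns")
```

A hit ratio of 0 corresponds to using PM directly, and a hit ratio of 1 to a working set that fits entirely in DRAM. Workloads with good locality land near the DRAM end.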



## **PM as Volatile Memory Paper**

#### Improving Performance of Flash Based Key-Value Stores Using Storage Class Memory as a Volatile Memory Extension [ATC]

Hiwot Tadese Kassa, Jason Akers, Mrinmoy Ghosh, Zhichao Cao, Vaibhav Gogte, Ronald Dreslinski

#### PM can reduce operating costs!





Up to 80% higher throughput at 43-48% lower cost!

### **PM Developer Tools (Area 3)**

- Many crash states to reason about
- Adding cache flushes can be tedious/error prone (Series 100)
- Even more challenging for legacy code
  - Reason about entire code base at once
  - May restructure to remove FS calls
  - May add durability to volatile structures

| #  | Operation          |
|----|--------------------|
| 1. | store 1 into $X_1$ |
| 2. | store 2 into $X_2$ |
| ⋮  | ⋮                  |
| N. | store N into $X_N$ |

Updating a PM data structure.
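
To see why there are "many crash states to reason about", consider N stores like the ones above. The sketch assumes each store targets a different cache line and that, without flushes, the hardware may write dirty lines back in any order. The function names are hypothetical.

```python
from itertools import combinations


def crash_states_unordered(n):
    """Without flushes, any subset of the n stores may have reached PM."""
    stores = range(1, n + 1)
    return [frozenset(c) for k in range(n + 1) for c in combinations(stores, k)]


def crash_states_ordered(n):
    """With a flush and fence after every store, only prefixes can persist."""
    return [frozenset(range(1, k + 1)) for k in range(n + 1)]


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        unordered = len(crash_states_unordered(n))
        ordered = len(crash_states_ordered(n))
        print(f"N={n}: {unordered} states without flushes, {ordered} with")
```

Without ordering, the number of possible post-crash states grows as 2^N. Flushing and fencing after every store reduces it to N + 1, at the cost of the manual effort noted in the list above.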



### **PM Developer Tool Papers**

#### Ayudante: A Deep Reinforcement Learning Approach to Assist Persistent Memory Programming [ATC]

Hanxian Huang, Zixuan Wang, Juno Kim, Steven Swanson, Jishen Zhao

#### Automatic conversion tools can do just as well as developers!



#### **TIPS: Making Volatile Index Structures Persistent with DRAM-NVMM Tiering** [ATC]

R. Madhava Krishnan, Wook-Hee Kim, Xinwei Fu, Sumit Kumar Monga, Hee Won Lee, Minsung Jang, Ajit Mathew, Changwoo Min

Reusing highly optimized volatile indices is possible and *preferable*!



3-10x higher performance!

## **Session Information**

### OSDI 2021

#### Storage

- Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes

### ATC 2021

#### Peeking over the Fence: RDMA

- Characterizing and Optimizing Remote Persistent Memory with RDMA and NVM

#### Friends Fur-Ever: Persistent Memory and In-Memory Computing

- Ayudante: A Deep Reinforcement Learning Approach to Assist Persistent Memory Programming
- TIPS: Making Volatile Index Structures Persistent with DRAM-NVMM Tiering
- Improving Performance of Flash Based Key-Value Stores Using Storage Class Memory as a Volatile Memory Extension
- First Responder: Persistent Memory Simultaneously as High Performance Buffer Cache and Storage

