# Sifter: An Inversion-Free and Large-Capacity Programmable Packet Scheduler

Peixuan Gao, Anthony Dalleggio, Jiajin Liu, Chen Peng, Yang Xu, H. Jonathan Chao







- Background and Motivation
- Observation and Insights
- Sifter: An Inversion-Free and Large-Capacity Programmable Packet Scheduler
- Evaluation and Implementation
- Conclusion



# Background and Motivation



#### Motivation: Latency-Sensitive Application

Latency-sensitive application have hard deadlines for packets





Remote surgery

These applications rely on accurate schedulers that guarantee the deadline of each packet



### Motivation: Latency-Sensitive Application



#### Motivation: Latency-Sensitive Application

Latency-sensitive application have hard deadlines for packets



Such extra delay caused by the packet inversion can be critical for delay sensitive applications



# Observations and Insights





#### Packet Inversions

**Packet Inversions:** When a packet with rank r departs from the scheduler, there exists packet(s) with a smaller rank r' (where smaller rank has higher priority, r' < r) in the scheduler.

Packet Inversions in the FIFO-based Schedulers





#### Packet Inversions

**Packet Inversions:** When a packet with rank r departs from the scheduler, there exists packet(s) with a smaller rank r' (where smaller rank has higher priority, r' < r) in the scheduler.





#### Packet Inversions

**Packet Inversions:** When a packet with rank r departs from the scheduler, there exists packet(s) with a smaller rank r' (where smaller rank has higher priority, r' < r) in the scheduler.

Packet Inversions in the FIFO-based Schedulers



3

FIFO-based schedulers are subject to packet inversion within each queue



#### Eliminating Packet Inversions

We can eliminate packet inversion if we can sort packets within each queue



### Eliminating Packet Inversions

We can eliminate packet inversion if we can sort packets within each queue

But sorting all the packets in a queue takes time





### Speed-up factor

Speed-up Factor: The number of packet metadata that can be processed during the time that transmit a packet

Assume we know the packet with the smallest rank that is ready to transmit





the time interval of transmitting each packet

# Sifter: An Inversion-Free and Large-Capacity Programmable Packet Scheduler



### Sifter Architecture



#### (2) Rotating Calendar Queue (RCQ)



**Mini-PIFO:** A small PIFO stores the packets with the smallest ranks in the scheduler

Use a mini-PIFO that **ALWAYS** store the packets with the smallest ranks in the scheduler

Set a 'Sifting Threshold  $Th_{S}$ ' to keep the Mini-PIFO occupied by **at least**  $Th_{S}$  metadata

**Rotating Calendar Queue (RCQ):** A FIFO-based Calendar Queue stores the packets with the larger ranks in the scheduler

#### Sifter Architecture

### Sifter - Enqueue



**Sentinel:** The rank threshold between mini-PIFO and RCQ

۲

SCHOOL OF COMPUTER SCIENCE

TANDON SCHOOL

🧳 NYU

Sifter Architecture

### Sifter - Enqueue



#### Sifter - Dequeue





🧳 NYU

#### Sifter - Dequeue













🧳 NYU

۲

SCHOOL OF COMPUTER SCIENCE

TANDON SCHOOL OF ENGINEERING

🧳 NYU



Sifter Operation 3: Sifting **Th**<sub>S</sub> (1) Mini-PIFO 15 14 12 11 9 (2) Rotating Calendar Queue (RCQ) Sentinel s = 160-9 10-19 16 10 25 29 22 28 20-29 21 20 33 32 31 36 30-39 30 • 90-99 99

**Sentinel:** The rank threshold between mini-PIFO and RCQ

TANDON SCHOOL OF ENGINEERING

🧳 NYU

۲

SCHOOL OF COMPUTER SCIENCE Sifter Architecture

Sifter Operation 3: Sifting **Th**<sub>S</sub> (1) Mini-PIFO 14 12 11 10 9 7 (2) Rotating Calendar Queue (RCQ) Sentinel s = 150-9 10-19 16 15 25 29 22 28 20-29 21 20 33 32 31 36 30-39 30 • 90-99 99

Sentinel: The rank threshold between mini-PIFO and RCQ

TANDON SCHOOL OF ENGINEERING

🧳 NYU

۲

SCHOOL OF COMPUTER SCIENCE Sifter Architecture

## Sifter – Inversion Free Condition



#### (2) Rotating Calendar Queue (RCQ)

TANDON SCHOOL OF ENGINEERING



SCHOOL OF

COMPUTER SCIENCE

#### **Condition for Inversion-free Scheduling**

Metadata in the Mini PIFO should buy enough time to traverse a FIFO

(1) Time to drain Mini-PIFO  $\geq$  Time to traverse a FIFO

 $Th_S * K \ge S_F$ 

Mini PIFO should occupied by enough metadata by the end of last sifting

(2) Reloaded metadata  $\geq$  Minimal PIFO occupancy

 $S_p - Th_S \ge Th_S \implies S_P \ge 2 * Th_S$ 

| <i>Th<sub>S</sub></i> : Sifting threshold | $S_F$ : FIFO size          |
|-------------------------------------------|----------------------------|
| $S_P$ : PIFO size                         | <b>K</b> : Speed-up factor |

# Evaluation and Implementation



## Evaluation: NS3 – Single Node Flow Convergence

We use a single node topology to evaluate flow convergence on one bottleneck link:

- 8 src hosts and 1 dst host are connected by a switch with Sifter.
- All links have a bandwidth of 10 Gbps and a delay of  $3\mu s$



We ran STFQ on Sifter in these evaluations



#### Evaluation: NS3 – Fat-tree Topology

COMPUTER SCIENCE

OF ENGINEERING

We ran STFQ on Sifter in these evaluations



#### **Evaluation: Hardware Testbed**

We implemented Sifter in VHDL on a Xilinx Alveo U250 FPGA card with a mid-speed grade XCVU13P FPGA.



Xilinx Alveo U250 FPGA



Hardware Testbed





#### Metadata Scheduling Throughputs

#### Metadata Scheduling Order



(a) Scheduling Order, fixed packet size (b) Scheduling Order, varying packet size

Sifter hardware prototype supports **100Gbps line rate** 

Scheduled packet ranks are monotonically increasing: Inversion free

Max frequency: 322 MHz Minimal packet size: 370 Bytes

We ran STFQ on Sifter in these evaluations



# Conclusion





#### Conclusion

Sifter: An Inversion-Free and Large-Capacity Programmable Packet Scheduler

- Generic inversion-free programmable packet scheduler
- Combines the advantages of PIFO and FIFO-based schedulers
- Two-step `Sifting Sorting` with a mini-PIFO
- Support large capacity of buffer space
- Fast scheduling process makes it able to operate at line rate

# Thank you



