When to Hedge in Interactive Services


Mia Primorac, Oracle Labs; Katerina Argyraki and Edouard Bugnion, EPFL


In online data-intensive (OLDI) services, each client request typically executes on multiple servers in parallel; as a result, "system hiccups", although rare within a single server, can interfere with many client requests and cause violations of service-level objectives. Service providers have long been fighting this "tail at scale" problem through "hedging", i.e., issuing redundant queries to mask system hiccups. This, however, can potentially cause congestion that is more detrimental to tail latency than the hiccups themselves.

This paper asks: when does it make sense to hedge in OLDI services, and how can we hedge enough to mask system hiccups but not as much as to cause congestion? First, we show that there are many realistic scenarios where hedging can have no benefit—where any hedging-based scheduling policy, including the state-of-the-art, yields no latency reduction compared to optimal load balancing without hedging. Second, we propose LÆDGE, a scheduling policy that combines optimal load balancing with work-conserving hedging, and evaluate it in an AWS cloud deployment. We show that LÆDGE strikes the right balance: first, unlike the state of the art, it never causes unnecessary congestion; second, it performs close to an ideal scheduling policy, improving the $99 percentile latency by as much as $49, measured on $60 system utilization — without any difficult parameter training as found in the state of the art.

NSDI '21 Open Access Sponsored by NetApp

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

@inproceedings {262032,
author = {Mia Primorac and Katerina Argyraki and Edouard Bugnion},
title = {When to Hedge in Interactive Services},
booktitle = {18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)},
year = {2021},
isbn = {978-1-939133-21-2},
pages = {373--387},
url = {https://www.usenix.org/conference/nsdi21/presentation/primorac},
publisher = {USENIX Association},
month = apr

Presentation Video