Mastering Chaos: Achieving Fault Tolerance with Observability-Driven Prioritized Load Shedding

Friday, June 16, 2023 - 10:55 am–11:50 am

Harjot Gill and Hardik Shingala, FluxNinja


Microservices-based applications are complex, and metastable failures such as cascading failures and retry storms pose significant challenges. In this talk, we will explore these types of failures and the shortcomings of current state-of-the-art approaches, and introduce Aperture, a unique open-source tool for observability-driven prioritized load shedding.
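Retry storms are dangerous because retries multiply at every hop. As a worst-case back-of-the-envelope calculation, assume that every hop in a call chain makes 1 + r attempts for each incoming request and that every attempt fails. The load reaching the deepest service then grows geometrically with the number of retrying hops. The function name `amplified_load` below is hypothetical and only illustrates the arithmetic.

```python
def amplified_load(base_rps: float, retries_per_hop: int, retrying_hops: int) -> float:
    """Worst-case request rate reaching the deepest service when every attempt
    fails and each of `retrying_hops` hops retries independently."""
    attempts_per_hop = 1 + retries_per_hop  # the original call plus its retries
    return base_rps * attempts_per_hop ** retrying_hops


# 100 req/s at the edge, 3 retries per hop, 3 retrying hops in the chain:
print(amplified_load(100, 3, 3))  # 6400
```

A 64x amplification at the bottom of the stack is the kind of self-sustaining overload that can keep a system degraded even after the original trigger has gone away.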

Aperture enables graceful degradation of non-critical services, ensuring system stability. We'll delve into Aperture's innovative architecture, covering its control and data planes, and discuss how it employs token buckets, weighted fair queuing, and concurrency limiting to prioritize workloads effectively.
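Token buckets and weighted fair queuing are standard building blocks. The Python sketch below illustrates both under simplifying assumptions; it is not Aperture's implementation, and the names `TokenBucket`, `WeightedFairQueue`, and `try_acquire` are hypothetical. Time is passed in explicitly so the behavior is deterministic, and the queue uses a simplified self-clocked variant of WFQ in which the virtual clock jumps to the finish tag of the most recently dequeued request.

```python
from __future__ import annotations

import heapq


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0  # timestamp (seconds) of the previous refill

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to the time elapsed since the last call.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject or delay the request


class WeightedFairQueue:
    """Self-clocked weighted fair queuing: each request gets a virtual finish
    tag start + cost / weight, and the smallest tag is served first."""

    def __init__(self) -> None:
        self.heap = []  # entries: (finish_tag, seq, flow, item)
        self.last_finish = {}  # flow -> finish tag of its latest request
        self.virtual_time = 0.0
        self.seq = 0  # tie-breaker that keeps FIFO order among equal tags

    def enqueue(self, flow: str, weight: float, cost: float, item: object) -> None:
        start = max(self.virtual_time, self.last_finish.get(flow, 0.0))
        finish = start + cost / weight  # heavier weight -> earlier finish tag
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, item))
        self.seq += 1

    def dequeue(self) -> tuple[str, object]:
        # Raises IndexError if the queue is empty.
        finish, _, flow, item = heapq.heappop(self.heap)
        self.virtual_time = finish
        return flow, item
```

With weights 3 and 1 and equal-cost requests, a backlogged high-weight flow receives roughly three dispatch slots for every one given to the low-weight flow, so lower-priority work is delayed rather than starved outright.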

We will also share real-world results from implementing Aperture in cloud products, demonstrating its ability to protect multi-tenant databases from overloads through prioritized load shedding of gRPC and GraphQL traffic.
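Prioritized load shedding in front of a shared database can be approximated by combining a concurrency limit with a priority-ordered wait queue: requests beyond the limit wait, the highest-priority waiter receives the next free slot, and when the queue overflows the lowest-priority waiter is rejected. The sketch below assumes integer priorities (higher is more important) and fixed limits; `PrioritizedLimiter`, `submit`, and `complete` are hypothetical names, not Aperture's API. In an observability-driven design the limit would likely be adjusted from live signals such as latency rather than fixed as it is here.

```python
import heapq
import itertools
from typing import Optional


class PrioritizedLimiter:
    """Fixed concurrency limit with a bounded, priority-ordered wait queue."""

    def __init__(self, max_in_flight: int, max_queue: int) -> None:
        self.max_in_flight = max_in_flight
        self.max_queue = max_queue
        self.in_flight = 0
        # Min-heap of (-priority, arrival_seq, request_id): the most important,
        # earliest-arriving waiter sits at the top.
        self.waiting = []
        self.arrivals = itertools.count()
        self.shed = []  # ids of requests rejected due to overload

    def submit(self, request_id: str, priority: int) -> str:
        """Return 'admitted', 'queued', or 'shed' for the submitted request."""
        if self.in_flight < self.max_in_flight:
            self.in_flight += 1
            return "admitted"
        heapq.heappush(self.waiting, (-priority, next(self.arrivals), request_id))
        if len(self.waiting) > self.max_queue:
            # The largest tuple is the least important, most recent waiter.
            victim = max(self.waiting)
            self.waiting.remove(victim)
            heapq.heapify(self.waiting)
            self.shed.append(victim[2])
            if victim[2] == request_id:
                return "shed"
        return "queued"

    def complete(self) -> Optional[str]:
        """Finish one in-flight request. If anyone is waiting, the freed slot
        goes straight to the highest-priority waiter, whose id is returned."""
        if self.waiting:
            _, _, request_id = heapq.heappop(self.waiting)
            return request_id
        self.in_flight -= 1
        return None


limiter = PrioritizedLimiter(max_in_flight=2, max_queue=2)
arrivals = [("a", 1), ("b", 1), ("c", 0), ("d", 5), ("e", 0), ("f", 9)]
print([limiter.submit(rid, p) for rid, p in arrivals])
# ['admitted', 'admitted', 'queued', 'queued', 'shed', 'queued']
print(limiter.shed)  # ['e', 'c']
```

Request `c` is queued at first but is later evicted when the high-priority request `f` arrives, which is the intended behavior: low-priority work yields to more important work instead of letting the backend overload.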

Join us on this journey as we unveil a powerful solution that addresses the limitations of current approaches, ensuring the reliability and resilience of your microservices-based applications.

Harjot Gill, FluxNinja, Inc.

Harjot Gill is Co-founder & CEO of FluxNinja, an early-stage startup enabling reliability automation. He is co-creator of the Aperture open source project and an active contributor to the open source community. Previously, he was Co-founder & CEO of observability startup Netsil, which was acquired by Nutanix. He holds advanced degrees in Computer Science & Networking Systems and has published several highly cited papers on declarative programming, mesh networks, and scalable packet processing.

Hardik Shingala, FluxNinja, Inc.

Hardik Shingala is an experienced IT professional with over 5 years of experience in the industry. He has worked on a variety of projects related to cloud computing, security, and finance, among other areas. He is skilled in multiple programming languages, including Golang, Java, and Python, and has experience with technologies such as Kubernetes and Docker. At FluxNinja, he specializes in backend development and DevOps.

@conference {288317,
author = {Harjot Gill and Hardik Shingala},
title = {Mastering Chaos: Achieving Fault Tolerance with {Observability-Driven} Prioritized Load Shedding},
year = {2023},
address = {Singapore},
publisher = {USENIX Association},
month = jun
}
