Zero-Downtime Rebalancing and Data Migration of a Mature Multi-Shard Platform

Wednesday, 2 October, 2019 - 16:45–17:30

Justin Li and Florian Weingarten, Shopify


Application-level sharding is a common pattern for scaling multi-tenant architectures. However, once it has been put into production, you inevitably run into follow-up problems that aren't as widely discussed. In this talk, we will share years' worth of experience and connect the dots to outline a full sharding solution that goes beyond the initial implementation and deployment. At the core of our toolkit is the "binlog", an event stream used by the MySQL replication protocol. The tooling we've built on top of this idea is used in production at Shopify to balance hundreds of MySQL shards for uniform load distribution and to isolate heavy tenants from each other, and it has in the past been used to safely transfer the entire dataset of our more than 800,000 tenants from physical datacenters to a cloud environment. All of this happens online, without downtime, and is practically invisible to the tenants.

Justin Li, Shopify

Justin is a production engineer at Shopify. He likes performance problems, parsers, and distributed systems, and has worked on many aspects of Shopify’s production system, notably resiliency, sharding, flash sale preparations, scriptable load balancing and routing, and optimizing Shopify’s storefront rendering engine.

Florian Weingarten, Shopify

Florian is a production engineer at Shopify. For the past 5 years, he has been working on all aspects of Shopify's sharding and multi-tenancy stack, including resiliency, region failovers, load distribution and isolation, shard rebalancing, as well as Shopify's migration to Google Cloud.


@conference {239518,
author = {Justin Li and Florian Weingarten},
title = {{Zero-Downtime} Rebalancing and Data Migration of a Mature {Multi-Shard} Platform},
year = {2019},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}
