Avoiding and Breaking Out of Capacity Prison

Friday, September 1, 2017 - 10:00-10:30

Jake Welch, Microsoft

Abstract: 

Capacity management at any scale has many moving pieces and requires a range of activities from capacity forecasting to emergency response. Capacity issues can directly impact your service scalability, performance and availability. Lead time to acquire new capacity can make a capacity management plan as important as your service monitoring. Being prepared can help ensure a great customer experience even during difficult times.

In this talk, we will present a comprehensive set of activities necessary to execute a capacity management plan for a storage service of any size. We will discuss lessons learned from Microsoft Azure Storage, one of the largest and fastest-growing storage systems on the planet, and how SREs used code to scale proactively and to remove complex manual effort and toil through automation. This work has resulted in an improved customer experience, better work/life balance, and reduced cost.

Jake Welch, Microsoft

Jake Welch is a Site Reliability Engineer on the Microsoft Azure team in NYC. He has worked on large-scale services for a decade, in both dev and operational roles. At Microsoft, he primarily works on infrastructure services with a focus on Storage and Security.

BibTeX
@conference {205566,
author = {Jake Welch},
title = {Avoiding and Breaking Out of Capacity Prison},
year = {2017},
address = {Dublin},
publisher = {{USENIX} Association},
}