Pro Tip: Save Money on Outages by Having a Bot Do the Heavy Lifting

Thursday, June 07, 2018 - 2:20 pm to 2:45 pm

Cezar Guimaraes, Microsoft

Abstract: 

Humans are slow, unreliable and hard to train. By using a knowledgeable and intelligent Bot, Azure has saved many millions of minutes of downtime. This Bot enhances and automates impact assessment, mitigation and problem management as part of the incident management process.

You will learn how to run your outages effectively, and about the strategy we used to ensure that our solution was what our users wanted and would in fact deliver the immense time and cost savings we predicted. We will share our guiding principles and the lessons we learned along the way.

Cezar Guimaraes is a Site Reliability Engineer Lead on the Microsoft Azure team. He has more than 15 years of experience and has worked at Microsoft for 12 years as a Software Engineer. Currently, he is working on Azure to identify and resolve problems that stand in the way of service uptime through engineering solutions such as bots and intelligence/autonomous engines.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX
@conference {214965,
author = {Cezar Guimaraes},
title = {Pro Tip: Save Money on Outages by Having a Bot Do the Heavy Lifting},
year = {2018},
publisher = {USENIX Association},
month = jun
}