Rick Boone, Uber
At Uber, the majority of our services are in the critical path of customer-facing features (matching drivers and riders, handling ongoing trips, determining prices or ETAs, etc.). Each of these services consumes resources (CPU, MEM, NET, DISK) in a manner that is "driven" by the behavior of one or more of a few key business metrics ("Trips Occurring", "Drivers Online", "App Opens", etc.). For instance, a CPU-bound, "trips-driven" service will see its CPU utilization increase when trip demand increases. With this in mind, and with historical data and machine learning algorithms in hand, we can statistically model the relationship between these key business metrics and the resource utilization of each individual service. This allows us to predict, with stunning accuracy, how many hardware resources any service will need at any arbitrary point in the future. This talk will walk you through the method of gathering the right data and applying machine learning to it, so that you can revolutionize how you approach and perform capacity planning.
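As a rough illustration of the idea above, the sketch below fits a straight line between one business metric and one service's CPU usage, then extrapolates to a forecast demand level. It is not the model presented in the talk: ordinary least squares stands in for whatever machine learning method the speaker uses, and the data, the function names (`fit_line`, `predict_cores`, `hosts_needed`), the 30% headroom, and the 16-core host size are all hypothetical.

```python
import math
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict_cores(slope, intercept, metric_value):
    """CPU cores the fitted line expects at a given business-metric level."""
    return slope * metric_value + intercept


def hosts_needed(cores, headroom=0.30, cores_per_host=16):
    """Convert a core estimate into whole hosts, with assumed safety headroom."""
    return math.ceil(cores * (1 + headroom) / cores_per_host)


# Hypothetical history for one "trips-driven" service:
# trips occurring per minute vs. CPU cores consumed.
trips_per_min = [1000, 2000, 3000, 4000, 5000]
cpu_cores_used = [12.0, 22.0, 32.0, 42.0, 52.0]

slope, intercept = fit_line(trips_per_min, cpu_cores_used)

# Hypothetical demand forecast for a future peak.
forecast_trips = 8000
forecast_cores = predict_cores(slope, intercept, forecast_trips)
forecast_hosts = hosts_needed(forecast_cores)

print(f"cores per trip/min: {slope:.4f}, baseline cores: {intercept:.2f}")
print(f"forecast: {forecast_cores:.1f} cores, {forecast_hosts} hosts")
```

In practice a service may be driven by more than one metric, and the relationship need not be linear, so a single-variable line like this is only a starting point for checking whether a metric actually drives a resource.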
I've worked in reliability engineering for over 12 years, most recently at Uber as both an SRE (2 years) and a Capacity Engineer (1 year) and, prior to that, at Facebook as a Production Engineer (3 years). At both companies, I've ensured reliability for large-scale, complex applications and platforms with both high criticality and high-performance requirements.
SREcon18 Americas Open Access Videos Sponsored by