Automating Performance Tuning with Machine Learning

Note: Presentation times are in Coordinated Universal Time (UTC).

Thursday, 14 October, 2021 - 02:00–02:30

Stefano Doni, Akamas


The main goal of SREs is to achieve optimal application performance, stability, and availability. Configurations (e.g., container resource limits and replicas, runtime settings, etc.) play a crucial role: wrong settings are among the top causes of poor performance, low efficiency, and incidents. But tuning configurations is a complex, manual task, as there are hundreds of settings across the stack. We present a novel approach that uses machine learning to find optimal configurations of the tech stack in an automated fashion. The approach applies reinforcement learning techniques to find the best configurations for an optimization goal that SREs can define (e.g., minimize service latency or cloud costs). We show an example of optimizing the cost and latency of a Kubernetes microservice by tuning container resources and JVM options. We analyze the optimal configurations that were found, the most impactful parameters, and the lessons learned for tuning microservices.
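To make the idea of goal-driven configuration search concrete, here is a minimal sketch. It is not the algorithm presented in the talk: it swaps in an epsilon-greedy bandit, a very simple relative of reinforcement learning, and replaces real load tests with a synthetic `measure` function. The parameter names (`cpu_limit`, `jvm_max_heap_mb`), the latency and cost formulas, and the 250 ms constraint are all assumptions made for illustration.

```python
import itertools
import random

# Hypothetical search space: container CPU limit (cores) and JVM max heap (MiB).
SEARCH_SPACE = {
    "cpu_limit": [0.5, 1.0, 2.0],
    "jvm_max_heap_mb": [256, 512, 1024],
}


def measure(config, rng):
    """Synthetic stand-in for a load test; returns (latency_ms, cost)."""
    cpu, heap = config["cpu_limit"], config["jvm_max_heap_mb"]
    latency_ms = 100 / cpu + 40_000 / heap + rng.gauss(0, 2)  # made-up model
    cost = 10 * cpu + heap / 100                              # made-up pricing
    return latency_ms, cost


def score(latency_ms, cost, max_latency_ms=250):
    """Optimization goal: minimize cost, heavily penalizing latency violations."""
    penalty = 1000 if latency_ms > max_latency_ms else 0
    return cost + penalty


def tune(trials=200, epsilon=0.2, seed=0):
    """Epsilon-greedy search over all configurations; lower score is better."""
    rng = random.Random(seed)
    keys = list(SEARCH_SPACE)
    arms = [dict(zip(keys, values))
            for values in itertools.product(*SEARCH_SPACE.values())]
    totals = [0.0] * len(arms)
    counts = [0] * len(arms)

    def mean(j):
        return totals[j] / counts[j]

    for _ in range(trials):
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            i = untried[0]                         # try every configuration once
        elif rng.random() < epsilon:
            i = rng.randrange(len(arms))           # explore a random configuration
        else:
            i = min(range(len(arms)), key=mean)    # exploit the best one so far
        totals[i] += score(*measure(arms[i], rng))
        counts[i] += 1

    return arms[min(range(len(arms)), key=mean)]


if __name__ == "__main__":
    print(tune())
```

Under this synthetic model, the cheapest configuration that stays within the latency constraint is 1.0 CPU with a 512 MiB heap, so that is the one the search should settle on. A real tuner would replace `measure` with actual experiments against the service and would need a smarter search strategy, since real stacks have far too many parameter combinations to enumerate.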

Stefano Doni, Akamas

Stefano is obsessed with performance optimization and leads the Akamas vision for Autonomous Performance Optimization powered by AI. With more than 15 years of experience in the performance industry, he has worked on projects for major national and international enterprises. He has presented several talks at the Computer Measurement Group international conference and in 2015, he won the Best Paper award for his contributions to capacity planning and performance optimization of Java applications.

SREcon21 Open Access Sponsored by Indeed

@conference {276745,
author = {Stefano Doni},
title = {Automating Performance Tuning with Machine Learning},
year = {2021},
publisher = {USENIX Association},
month = oct
}
