Signatures, Patterns, and Trends: Timeseries Data Mining at Etsy
Andrew Clegg, Etsy
Etsy loves metrics. Everything that happens in our data centres gets recorded, graphed, and stored. But with over a million metrics flowing in constantly, it’s hard for any team to keep on top of all that information. Graphing everything doesn’t scale, and traditional alerting methods based on thresholds become very prone to false positives.
That’s why we started Kale, an open-source software suite for pattern mining and anomaly detection in operational data streams. These are big topics with decades of research, but many of the methods in the literature are ineffective on terabytes of noisy data with unusual statistical characteristics, and techniques that require extensive manual analysis are unsuitable when your ops teams have service levels to maintain.
In this talk I’ll briefly cover the main challenges that traditional statistical methods face in this environment, and introduce some pragmatic alternatives that scale well and are easy to implement (and automate) on Elasticsearch and similar platforms. I’ll talk about the stumbling blocks we encountered with the first release of Kale, and the resulting architectural changes coming in version 2.0. And I’ll go into a little technical detail on the fingerprinting and anomaly detection algorithms we apply to metrics and their associated statistical metadata. These techniques have applications in clustering, outlier detection, similarity search, and supervised learning, and they are not limited to the data centre but can be applied to any high-volume timeseries data.
Andrew joined Etsy in 2014, becoming their first data scientist outside the USA. In the past he has worked on visualization, information retrieval, and data mining techniques for streaming data, so he naturally gravitated towards the Kale project and the broader issues of operational data mining, areas ripe for fruitful collaboration between devops and data science specialists. He has an MSc in bioinformatics and a PhD in natural language processing, both from the University of London.