general information

Venue:
Google
Gordon House
Barrow Street, Dublin 4
Ireland

Signatures, Patterns, and Trends: Timeseries Data Mining at Etsy

Friday, May 15, 2015 - 3:00pm-3:30pm

Andrew Clegg, Etsy

Etsy loves metrics. Everything that happens in our data centres gets recorded, graphed, and stored. But with over a million metrics flowing in constantly, it’s hard for any team to keep on top of all that information. Graphing everything doesn’t scale, and traditional alerting methods based on thresholds become very prone to false positives.

That’s why we started Kale, an open-source software suite for pattern mining and anomaly detection in operational data streams. These are big topics with decades of research, but many of the methods in the literature are ineffective on terabytes of noisy data with unusual statistical characteristics, and techniques that require extensive manual analysis are unsuitable when your ops teams have service levels to maintain.

In this talk I’ll briefly cover the main challenges that traditional statistical methods face in this environment, and introduce some pragmatic alternatives that scale well and are easy to implement (and automate) on Elasticsearch and similar platforms. I’ll talk about the stumbling blocks we encountered with the first release of Kale, and the resulting architectural changes coming in version 2.0. And I’ll go into a little technical detail on the fingerprinting and anomaly detection algorithms we apply to metrics and their associated statistical metadata. These techniques have applications in clustering, outlier detection, similarity search, and supervised learning, and they are not limited to the data centre but can be applied to any high-volume timeseries data.

Andrew joined Etsy in 2014, becoming their first data scientist outside the USA. In the past he has worked on visualization, information retrieval, and data mining techniques for streaming data, so he naturally gravitated towards the Kale project and the broader issues of operational data mining, areas ripe for fruitful collaboration between devops and data science specialists. He has an MSc in bioinformatics and a PhD in natural language processing, both from the University of London.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference {208842,
author = {Andrew Clegg},
title = {Signatures, Patterns, and Trends: Timeseries Data Mining at Etsy},
year = {2015},
address = {Dublin},
publisher = {USENIX Association},
month = may
}