Designing Resilient Data Pipelines

Wednesday, March 27, 2019, 2:55 pm to 3:25 pm

Andrew Bolin, Two Sigma Investments, LP

Abstract: 

There are a number of questions that plague any operator of a complex data pipeline. How do I quickly recover from failures in my pipeline? How do I know that the data I generate is accurate? How do I minimize the risk associated with updating my pipeline? Designing your data pipeline with resiliency and observability in mind will help to answer these questions. In this talk, I will present several strategies that my team has adopted for reducing operational complexity, lowering the risk associated with updates, and addressing concerns about the accuracy of the data our pipelines produce.

Andrew Bolin, Two Sigma Investments, LP

Andrew Bolin is a Reliability Engineer at Two Sigma Investments, where he is responsible for the design and operation of data pipelines critical to the firm's research environment. Before his current role, Andrew worked on the team responsible for developing Cook, Two Sigma's open-source fair-share scheduler. Andrew is equally passionate about spreading Reliability Engineering (RE) best practices at Two Sigma and exploring the diverse food offerings of NYC.

SREcon19 Americas Open Access Videos sponsored by Salesforce

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX
@conference {229507,
author = {Andrew Bolin},
title = {Designing Resilient Data Pipelines},
year = {2019},
address = {Brooklyn, NY},
publisher = {USENIX Association},
month = mar
}
