Designing Resilient Data Pipelines

Wednesday, March 27, 2019 - 2:55 pm–3:25 pm

Andrew Bolin, Two Sigma Investments, LP


There are a number of questions that plague any operator of a complex data pipeline. How do I quickly recover from failures in my pipeline? How do I know that the data I generate is accurate? How do I minimize the risk associated with updating my pipeline? Designing your data pipeline with resiliency and observability in mind will help to answer these questions. In this talk, I will present several strategies that my team has adopted for reducing operational complexity, risk associated with updates, and concerns about accuracy of data pipelines.

Andrew Bolin is a Reliability Engineer at Two Sigma Investments where he is responsible for the design and operation of data pipelines critical to the firm's research environment. Before his current role, Andrew worked on the team responsible for the development of Two Sigma's open source fair-share scheduler, Cook. Andrew has an equal passion for spreading RE best practices at Two Sigma and exploring the diverse food offerings of NYC.

@conference {229507,
author = {Andrew Bolin},
title = {Designing Resilient Data Pipelines},
year = {2019},
address = {Brooklyn, NY},
publisher = {{USENIX} Association},