Care and Feeding of Data Processing Pipelines

Thursday, 30 August, 2018 - 14:00-14:45

Rita Sodt, Google


Data processing pipelines serve important use cases, including business analytics, machine learning, eliminating spam and abuse, delivering billing invoices, and transforming data for many user-facing serving jobs. These pipelines are often composed of multiple steps, where the output of one step is the input of the next, and they depend on external systems and storage, any of which can break. When something does break and a pipeline misses its SLOs, fixes are often expensive and time-consuming, especially if a large data set must be reprocessed or repaired. It is best to focus on preventing issues and on detecting and responding to them quickly, which is where SRE can help.

Part of the difficulty of managing pipelines lies in how they differ from serving jobs. Without RPC latency and errors to monitor directly as a proxy for customer happiness, it is necessary to gain visibility into the age of the oldest unprocessed data and to measure data correctness, since corrupt output data may be customer-visible and persisted even when serving jobs report no errors. To prevent issues and minimize their impact, techniques such as canarying, incremental rollout, automatic failover, and autoscaling can be used, each of which has specific considerations for pipelines.
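To make the two pipeline signals above more concrete, here is a minimal sketch in Python of how the age of the oldest unprocessed data and a simple correctness ratio might be computed. It assumes the pipeline can enumerate its pending inputs together with the time each one arrived. The names `WorkItem`, `oldest_unprocessed_age`, `freshness_slo_met`, and `fraction_valid`, the 30-minute freshness target, and the "invoice total must be non-negative" validity rule are all hypothetical illustrations, not part of any Google system or monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable


@dataclass
class WorkItem:
    """Hypothetical unit of pipeline input, e.g. one log file or partition."""
    name: str
    arrived_at: datetime  # when the input became available to the pipeline
    processed: bool = False


def oldest_unprocessed_age(items: Iterable[WorkItem], now: datetime) -> timedelta:
    """Age of the oldest input that has arrived but has not been processed yet.

    Returns zero when nothing is pending, i.e. the pipeline is caught up.
    """
    pending = [item.arrived_at for item in items if not item.processed]
    if not pending:
        return timedelta(0)
    return now - min(pending)


def freshness_slo_met(items: Iterable[WorkItem], now: datetime, slo: timedelta) -> bool:
    """True if no pending input is older than the (hypothetical) freshness target."""
    return oldest_unprocessed_age(items, now) <= slo


def fraction_valid(records: Iterable[dict], is_valid: Callable[[dict], bool]) -> float:
    """Share of output records passing a correctness check; 1.0 if there are none."""
    materialized = list(records)
    if not materialized:
        return 1.0
    return sum(1 for r in materialized if is_valid(r)) / len(materialized)


# Illustrative usage with made-up inputs and an assumed 30-minute freshness target.
now = datetime(2018, 8, 30, 14, 0, tzinfo=timezone.utc)
items = [
    WorkItem("logs-1200", now - timedelta(hours=2), processed=True),
    WorkItem("logs-1310", now - timedelta(minutes=50)),
    WorkItem("logs-1340", now - timedelta(minutes=20)),
]
print(oldest_unprocessed_age(items, now))                        # 0:50:00
print(freshness_slo_met(items, now, slo=timedelta(minutes=30)))  # False

invoices = [{"invoice_total": 10.0}, {"invoice_total": -3.0},
            {"invoice_total": 5.0}, {"invoice_total": 1.0}]
print(fraction_valid(invoices, lambda r: r["invoice_total"] >= 0))  # 0.75
```

Tracking the oldest pending input rather than an average lag is a deliberate choice in this sketch: a single stuck partition dominates the maximum age, whereas it could be hidden in an average across many healthy partitions.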

Rita Sodt, Google

Rita is an SRE at Google with experience managing data processing pipelines, including Google Analytics. She has worked with other pipeline groups at Google on automation and, in particular, on monitoring products that meet the needs of pipelines as well as serving jobs. She started her career as a software developer on Google Cloud and comes from an interdisciplinary research background at the University of Washington, including projects to predict and model brain tumor growth and to interface sensors with mobile devices for applications in the developing world.

@inproceedings {218855,
author = {Rita Sodt},
title = {Care and Feeding of Data Processing Pipelines},
booktitle = {SREcon18 Europe/Middle East/Africa (SREcon18 Europe)},
year = {2018},
address = {Dusseldorf},
url = {},
publisher = {USENIX Association},
month = aug
}