Uncertainty in Aggregate Estimates from Sampled Distributed Traces

Authors: 

Nate Coehlo, Arif Merchant, and Murray Stokely, Google, Inc.

Abstract: 

Tracing mechanisms in distributed systems give important insight into system properties and are usually sampled to control overhead. At Google, Dapper [8] is the always-on system for distributed tracing and performance analysis, and it samples fractions of all RPC traffic. Due to difficult implementation, excessive data volume, or a lack of perfect foresight, there are times when system quantities of interest have not been measured directly, and Dapper samples can be aggregated to estimate those quantities in the short or long term. Here we find unbiased variance estimates of linear statistics over RPCs, taking into account all layers of sampling that occur in Dapper, and allowing us to quantify the sampling uncertainty in the aggregate estimates. We apply this methodology to the problem of assigning jobs and data to Google datacenters, using estimates of the resulting cross-datacenter traffic as an optimization criterion, and also to the detection of change points in access patterns to certain data partitions.
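As a rough illustration of what an unbiased variance estimate of a linear statistic looks like under sampling, the sketch below uses the textbook Horvitz-Thompson estimator for a single layer of independent sampling. It is an assumption-laden simplification and not the paper's method, which accounts for all layers of sampling in Dapper. The function name `ht_total_and_variance` and the (value, inclusion probability) input format are hypothetical.

```python
from typing import Iterable, Tuple


def ht_total_and_variance(
    samples: Iterable[Tuple[float, float]],
) -> Tuple[float, float]:
    """Estimate a population total and the variance of that estimate.

    Each element of `samples` is (y, p): the measured quantity for one
    sampled RPC (for example, bytes transferred) and the probability with
    which that RPC was sampled. Assumes every RPC is sampled independently
    (one layer of Bernoulli sampling), which is a simplification.
    """
    total = 0.0
    variance = 0.0
    for y, p in samples:
        if not 0.0 < p <= 1.0:
            raise ValueError("inclusion probability must be in (0, 1]")
        # Each sampled RPC stands in for 1/p RPCs of the full population.
        total += y / p
        # Unbiased estimate of this RPC's contribution to the variance.
        variance += y * y * (1.0 - p) / (p * p)
    return total, variance


if __name__ == "__main__":
    # Two sampled RPCs: one sampled with probability 0.5, one always kept.
    est_total, est_var = ht_total_and_variance([(10.0, 0.5), (4.0, 1.0)])
    print(est_total, est_var)
```

An RPC that is always recorded (p = 1) contributes nothing to the variance, while rarely sampled RPCs with large values dominate it, which is why the sampling uncertainty of an aggregate has to be quantified rather than assumed small.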

BibTeX
@inproceedings {179444,
author = {Nate Coehlo and Arif Merchant and Murray Stokely},
title = {Uncertainty in Aggregate Estimates from Sampled Distributed Traces},
year = {2012},
url = {https://www.usenix.org/conference/mad12/workshop-program/presentation/Coehlo},
publisher = {{USENIX} Association},
}
