Astronomical Pipeline Provenance: A Use Case Evaluation

Authors: 

Michael A. C. Johnson, Institute of Data Science (DLR) and Max Planck Institute for Radio Astronomy; Marcus Paradies and Marta Dembska, Institute of Data Science (DLR); Kristen Lackeos, Hans-Rainer Klöckner, and David J. Champion, Max Planck Institute for Radio Astronomy; Sirko Schindler, Institute of Data Science (DLR)

Abstract: 

This decade, astronomy is undergoing a paradigm shift to handle data from next-generation observatories such as the Square Kilometre Array (SKA) and the Vera C. Rubin Observatory (LSST). Producing real-time data streams of up to 10 TB/s and data products on the order of 600 PB/year, the SKA will be the biggest civil data-producing machine in the world, demanding novel solutions for storing and analysing these data volumes. Because this real-time processing relies on complex, automated pipelines, its provenance is key to establishing confidence in the system, its final data products, and ultimately its scientific results.

This paper aims to lay the foundation for an automated provenance generation tool for astronomical/data-processing pipelines. We therefore present a use case analysis, specific to the needs of astronomy, that addresses the issues of trust and reproducibility as well as further use cases of interest to astronomers. This analysis is subsequently used as the basis to discuss the requirements, challenges, and opportunities involved in designing both the tool and the associated provenance model.

BibTeX
@inproceedings {274857,
author = {Michael A. C. Johnson and Marcus Paradies and Marta Dembska and Kristen Lackeos and Hans-Rainer Kl{\"o}ckner and David J. Champion and Sirko Schindler},
title = {Astronomical Pipeline Provenance: A Use Case Evaluation},
booktitle = {13th International Workshop on Theory and Practice of Provenance (TaPP 2021)},
year = {2021},
url = {https://www.usenix.org/conference/tapp2021/presentation/johnson},
publisher = {USENIX Association},
month = jul
}