eKNOW 2015, The Seventh International Conference on Information, Process, and Knowledge Management


Supporting Provenance in Climate Science Research

Authors:
Brett Yasutake
Niko Simonson
Jason Woodring
Nathan Duncan
William Pfeffer
Hazeline Asuncion
Munehiro Fukuda
Eric Salathe

Keywords: data provenance; climate science; parallelization; big data

Abstract:
While the data produced by climate models grows exponentially in size and complexity, researchers' ability to analyze that data lags behind. Existing climate analysis tools that capture provenance are generally implemented on supercomputing clusters. Provenance is also often difficult for researchers to analyze because of its sheer volume. In contrast, our Pacific Northwest Climate Analysis (PNCA) Tracker is a lightweight, provenance-aware parallel system that allows researchers at smaller facilities to quickly develop custom analysis tools while enabling them to easily verify their datasets. Our approach modularizes the captured provenance, allows researchers to customize provenance collection, and collects provenance efficiently within a parallel and distributed environment, made possible by the Multi-Agent Spatial Simulation (MASS) library. The system is designed to be highly extensible by minimizing dependencies within its architecture. We demonstrate that our tool is potentially accessible to a wider range of researchers and is highly efficient compared to the commonly used climate analysis tool, the Network Common Data Form (NetCDF) Operators (NCO). Finally, we discuss how provenance concepts in PNCA Tracker map to the W3C PROV model.
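The abstract does not describe PNCA Tracker's provenance API or how it uses MASS. Purely as an illustration of the ideas it names (modular, per-worker provenance that the researcher can switch on or off, expressed in W3C PROV terms), here is a minimal sketch. Everything in it is hypothetical: `ProvLog`, `analyze_chunk`, `to_provn`, the `ex:` identifiers, the "mean of a spatial chunk" analysis, and the use of Python's `ThreadPoolExecutor` as a stand-in for MASS. Only the PROV-N statement forms (`entity`, `activity`, `agent`, `used`, `wasGeneratedBy`, `wasAssociatedWith`, `document`/`endDocument`) come from the W3C PROV specifications.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ProvLog:
    """Hypothetical per-worker provenance log storing W3C PROV-N statements."""
    worker: str
    enabled: bool = True  # lets a researcher switch capture off for this worker
    statements: list = field(default_factory=list)

    def _add(self, stmt: str) -> None:
        # Record the statement only when capture is enabled.
        if self.enabled:
            self.statements.append(stmt)

    def entity(self, eid: str) -> None:
        self._add(f"entity({eid})")

    def activity(self, aid: str) -> None:
        self._add(f"activity({aid})")

    def agent(self, agid: str) -> None:
        self._add(f"agent({agid})")

    def used(self, aid: str, eid: str) -> None:
        self._add(f"used({aid}, {eid})")

    def was_generated_by(self, eid: str, aid: str) -> None:
        self._add(f"wasGeneratedBy({eid}, {aid})")

    def was_associated_with(self, aid: str, agid: str) -> None:
        self._add(f"wasAssociatedWith({aid}, {agid})")


def analyze_chunk(chunk_id: int, values: list, capture: bool = True) -> tuple:
    """Hypothetical analysis step: average one data chunk and log its provenance."""
    log = ProvLog(worker=f"ex:worker{chunk_id}", enabled=capture)
    src, act, out = f"ex:chunk{chunk_id}", f"ex:mean{chunk_id}", f"ex:result{chunk_id}"
    result = sum(values) / len(values)
    log.entity(src)
    log.activity(act)
    log.agent(log.worker)
    log.used(act, src)                      # the activity read the input chunk
    log.entity(out)
    log.was_generated_by(out, act)          # the result came from the activity
    log.was_associated_with(act, log.worker)  # the worker performed the activity
    return result, log


def to_provn(logs) -> str:
    """Merge per-worker logs, in the order given, into one PROV-N document."""
    lines = ["document", "  prefix ex <http://example.org/>"]
    for log in logs:
        lines.extend(f"  {stmt}" for stmt in log.statements)
    lines.append("endDocument")
    return "\n".join(lines)


if __name__ == "__main__":
    chunks = [[10.0, 12.0], [8.0, 9.0, 10.0]]
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, so the merge is deterministic.
        results = list(pool.map(analyze_chunk, range(len(chunks)), chunks))
    print([mean for mean, _ in results])  # [11.0, 9.0]
    print(to_provn(log for _, log in results))
```

Keeping one log per worker is one simple way to modularize provenance: workers never contend for a shared log, and each log can be enabled, inspected, or discarded independently before the merge.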

Pages: 84 to 91

Copyright: (c) IARIA, 2015

Publication date: February 22, 2015

Published in: conference

ISSN: 2308-4375

ISBN: 978-1-61208-386-5

Location: Lisbon, Portugal

Dates: from February 22, 2015 to February 27, 2015