- Contact: Yang Cao, Peter Slaughter (DataONE)
- License: Apache 2
- Package source code and results: see here
The Matlab DataONE Toolbox (version 2) is provenance management software. It can capture, store, query, visualize, and publish the provenance of a single Matlab script run or of multiple Matlab script runs. matlab-dataone supports three types of provenance: prospective, retrospective, and hybrid. A workflow project therefore has multiple provenance graphs: a prospective provenance graph, a hybrid provenance graph, and a retrospective multi-run provenance graph. The DataONE data package can be archived in the file system so that past versions of files can be retrieved for a run. This makes it possible to investigate previous versions of processing or analysis, supports reproducibility, and provides an easy way to publish data products, together with all files that contributed to them, to a data repository such as the DataONE network.
- Author: Christopher Jones, Yang Cao, Peter Slaughter, Matthew B. Jones (DataONE)
- License: Apache 2
- Package source code on Github
- Submit Bugs and feature requests
The Matlab DataONE Toolbox (version 1) provides an automated way to capture, store, and publish data provenance for Matlab scripts and console commands without the need to modify existing Matlab code. The retrospective provenance captured during a Matlab script execution includes information about the script that was run, the files that were read or written, and details about the execution environment at the time of execution. Our toolbox uses YesWorkflow, a prospective provenance tool, to render the prospective provenance embedded in the script code as figures. A package containing the script itself, a list of input files, and a list of generated files associated with the run can be easily published to a repository within the DataONE network. The DataONE data package can be archived in the file system so that past versions of files can be retrieved for a run. This makes it possible to investigate previous versions of processing or analysis, supports reproducibility, and provides an easy way to publish data products, together with all files that contributed to them, to a data repository such as the DataONE network.
Matlab R2015b or later for Mac, Windows, or Linux is required to use the toolbox. To install the toolbox:
- Download the zip file: Matlab DataONE Toolbox 1.0.0
- Unpack the zip file into an installation directory of your choosing
- Open Matlab and change directories to your unpacked matlab-dataone directory
- Run the install_matlab_dataone script in that directory
- Restart Matlab
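The installation steps above can also be sketched from the Matlab command line. This is only an illustration: the archive name, the target directory, and the name of the unpacked directory are assumptions and should be adjusted to match your download. `verLessThan`, `unzip`, `fullfile`, and `cd` are standard Matlab functions; `install_matlab_dataone` is the installer script named in the steps.

```matlab
% Hypothetical locations: adjust both to match your download and install directory.
zipFile    = fullfile(pwd, 'matlab-dataone-1.0.0.zip');  % assumed archive file name
installDir = fullfile(pwd, 'toolboxes');                  % assumed installation directory

% R2015b corresponds to Matlab version 8.6; stop early on older releases.
if verLessThan('matlab', '8.6')
    error('Matlab R2015b or later is required to use the toolbox.');
end

% Unpack the archive, then change into the unpacked directory.
unzip(zipFile, installDir);
cd(fullfile(installDir, 'matlab-dataone'));  % assumed name of the unpacked directory

% Run the installer script that ships in that directory.
install_matlab_dataone;

% Restart Matlab afterwards, as the final installation step.
```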
The Matlab DataONE Toolbox is licensed as open source software under the Apache 2.0 license.
The Matlab DataONE package can be used to track code execution in Matlab, the data inputs and outputs of those executions, and the software environment during the execution (e.g. Matlab and operating system versions). As a quick start, here is an example that starts the toolbox RunManager, executes a precanned script, and then views the details of that script run.
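% Import the RunManager class and get the RunManager instance
import org.dataone.client.run.RunManager;
mgr = RunManager.getInstance();
% Execute the script and record the run, tagged 'First toolbox run'
mgr.record('/Users/cjones/projects/intertidal_temps/process_temperatures.m', 'First toolbox run');
% List the recorded runs, then view the details of run number 1
mgr.listRuns();
mgr.view('runNumber', 1);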
The classes provided in the toolbox have built-in documentation. Use the help() function or the doc() function to view the help for a given class. For instance, to view the help on the RunManager class, use:
doc org.dataone.client.run.RunManager
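The same built-in documentation can be printed in the Command Window with help(), and Matlab's standard `methods` function gives a quick overview of what a class exposes. A short sketch, assuming the toolbox is installed and on the Matlab path:

```matlab
% Print the class help text in the Command Window instead of the doc browser.
help org.dataone.client.run.RunManager

% List the methods of the class using Matlab's built-in methods function.
methods('org.dataone.client.run.RunManager')
```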
A User Guide is in the works, and will walk through the various toolbox functions.
- The toolbox captures provenance for only a subset of the load() function syntaxes. See Issue #196
- The toolbox captures provenance for the save() function, but requires the filename to be the first argument. See Issue #198
- Debugging log output for some function calls is not suppressed completely. See Issue #200