Workflow for new data addition

brandon whitehead edited this page Sep 10, 2024 · 12 revisions
| Step | Lead | Do | Measure |
| --- | --- | --- | --- |
| **Input** | | | |
| Find data | Identify what kinds of data we are interested in and where other people interested in the same kind of data gather. | Talk with people, search data archives, and/or read the literature to find relevant datasets. | Do people follow up with conversations? Is there engagement? Are we finding data that matches our needs? |
| Open ticket | Collect all available information on the dataset. | Document the dataset in a new ticket (Issue). | Can other people find the dataset? Do they know where to go for information about the data? |
| Evaluate | Place the data contribution in the context of the desired product. | Extend the ticket to describe how the contribution overlaps with that product. | Can we make a decision on prioritizing this specific data contribution over other candidates? |
| **Transformation** | | | |
| Annotations | Read through the documentation. | Annotate the dataset with id-variable-type-entry tuples. | Do the annotations match the data? Is the level of method description clear? |
| Read script | Understand the data model. | Transform the dataset into standardized id-variable-type-entry tuples. | Are the transformations that the data went through clear (good comments)? |
| Integration | Identify comparable variable-methods. | Integrate the new data into the collection. | Is it clear how the variables are connected across the data, and why? Could someone else reasonably agree with your decisions? |
| **Output** | | | |
| QA/QC | Identify the visuals, min/max checks, and controlled vocabulary checks needed for the new data contribution. | QA/QC the data collection with this new data contribution. | How does the data contribution compare with the larger collection? |
| Merge to master | Create a pull request and identify a reviewer. | Merge into the main branch. | Did the main branch break? Does the collection still work? Can folks assemble the collection from scripts? |
| Publish | Identify who is interested in the new updates. | Announce new data availability. | Did we get new downloads of the repository? Were there questions we missed that need to be added to the documentation? |
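
The Transformation and QA/QC steps can be illustrated with a minimal Python sketch. It assumes that an id-variable-type-entry tuple means a row identifier, a standardized variable name, a type saying what the entry describes (for example `value` or `unit`), and the entry itself stored as text. The column names, variable names, range limits, and vocabulary below are all hypothetical and are not taken from the project's scripts.

```python
from typing import NamedTuple


class Record(NamedTuple):
    id: str        # row or observation identifier
    variable: str  # standardized variable name
    type: str      # what the entry describes, e.g. "value" or "unit"
    entry: str     # the entry itself, kept as text


# Hypothetical annotations: source column -> (variable, type)
ANNOTATIONS = {
    "soc_pct": ("soil_organic_carbon", "value"),
    "soc_unit": ("soil_organic_carbon", "unit"),
    "land_use": ("land_cover", "value"),
}


def to_long(rows, id_column, annotations):
    """Turn wide rows (dicts) into id-variable-type-entry records."""
    records = []
    for row in rows:
        for column, (variable, type_) in annotations.items():
            entry = row.get(column)
            if entry is None or entry == "":
                continue  # skip missing cells
            records.append(Record(str(row[id_column]), variable, type_, str(entry)))
    return records


def check_range(records, variable, minimum, maximum):
    """Return value records for `variable` that are non-numeric or outside [minimum, maximum]."""
    flagged = []
    for record in records:
        if record.variable != variable or record.type != "value":
            continue
        try:
            in_range = minimum <= float(record.entry) <= maximum
        except ValueError:
            in_range = False  # non-numeric entries are flagged too
        if not in_range:
            flagged.append(record)
    return flagged


def check_vocabulary(records, variable, allowed):
    """Return value records for `variable` whose entry is not in the allowed vocabulary."""
    return [
        record
        for record in records
        if record.variable == variable
        and record.type == "value"
        and record.entry not in allowed
    ]


# Made-up example rows for illustration only
rows = [
    {"site": "A1", "soc_pct": 2.4, "soc_unit": "percent", "land_use": "forest"},
    {"site": "B7", "soc_pct": 140.0, "soc_unit": "percent", "land_use": "forrest"},
]

records = to_long(rows, "site", ANNOTATIONS)
out_of_range = check_range(records, "soil_organic_carbon", 0.0, 100.0)
bad_vocabulary = check_vocabulary(records, "land_cover", {"forest", "grassland", "cropland"})
```

In this sketch, `out_of_range` and `bad_vocabulary` each flag the record from site `B7`. Lists like these are the kind of output a reviewer could attach to the ticket when answering the Measure questions for QA/QC.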