Skip to content

Commit

Permalink
Create retire-unpublished.md
Browse files Browse the repository at this point in the history
I have not added this file to the toc until it's been reviewed!!! Please add on merge if approved @paolap or @chloemackallah 

I am not happy with the number rendering here where I've done 'custom' things, if you wish to rework the numbered lists I think it'd help?

This commit attempt to address most of Sharon's issues raised in #10
  • Loading branch information
hot007 authored Apr 29, 2022
1 parent 98b1299 commit 8ab5eb9
Showing 1 changed file with 29 additions and 0 deletions.
29 changes: 29 additions & 0 deletions Governance/retire/retire-unpublished.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Use case: unpublished data

This use case addresses the plethora of data that is associated with published data creation. That is, storage use including, but not limited to:

- Model configuration data
- Failed model run output
- Successful model run output
- Data prepared for collaborative sharing but not publication/DOI
- Intermediate data products

How each of these scenarios is handled will typically be determined on a project basis, with a view to the importance of **reproducibility** and considering relative **compute or storage costs**.

## Suggested procedures

**If compute is readily available but storage is limited**

1. Maintain a database or wiki of model runs
2. Create zip archives of model configurationsand move to slow access tape storage if they are required to be kept for reproducibility
3. If model run failed, remove data immediately
4. If model run was successful and post-processing has been completed (and if bit-reproducibility across systems is not a concern), then data can be removed, perhaps after an initial quarantine period for data validation
5. Intermediate data products and collaborative data can be retired at the end of their active projects, following a quaratine period

5a. Some intermediate data may not have a logical project end, such as regridded CMIP data - such data might follow a similar approach as the replciated data use case.

**If compute is limited but deep storage is readily available**

Repeat steps 1-3 as above.

4. Following post-processing and validation, or at project close, data should be tarred and trasnferred to a deep storage system

0 comments on commit 8ab5eb9

Please sign in to comment.