From 8ab5eb9bb33ee90fb6e2e5bb18d6bbd223ab8185 Mon Sep 17 00:00:00 2001
From: Claire Trenham
Date: Fri, 29 Apr 2022 17:59:40 +1000
Subject: [PATCH] Create retire-unpublished.md

I have not added this file to the toc until it's been reviewed!!! Please add on merge if approved @paolap or @chloemackallah
I am not happy with the number rendering here where I've done 'custom' things, if you wish to rework the numbered lists I think it'd help?
This commit attempts to address most of Sharon's issues raised in #10
---
 Governance/retire/retire-unpublished.md | 29 +++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 Governance/retire/retire-unpublished.md

diff --git a/Governance/retire/retire-unpublished.md b/Governance/retire/retire-unpublished.md
new file mode 100644
index 0000000..0131921
--- /dev/null
+++ b/Governance/retire/retire-unpublished.md
@@ -0,0 +1,29 @@
+# Use case: unpublished data
+
+This use case addresses the plethora of data associated with creating published data. This includes, but is not limited to, storage used for:
+
+- Model configuration data
+- Failed model run output
+- Successful model run output
+- Data prepared for collaborative sharing but not publication/DOI
+- Intermediate data products
+
+How each of these scenarios is handled will typically be determined on a project basis, with a view to the importance of **reproducibility** and considering relative **compute or storage costs**.
+
+## Suggested procedures
+
+**If compute is readily available but storage is limited**
+
+1. Maintain a database or wiki of model runs
+2. Create zip archives of model configurations and move them to slow-access tape storage if they are required to be kept for reproducibility
+3. If a model run failed, remove its data immediately
+4. If a model run was successful and post-processing has been completed (and if bit-reproducibility across systems is not a concern), then the data can be removed, perhaps after an initial quarantine period for data validation
+5. Intermediate data products and collaborative data can be retired at the end of their active projects, following a quarantine period
+
+5a. Some intermediate data may not have a logical project end, such as regridded CMIP data; such data might follow a similar approach to the replicated data use case.
+
+**If compute is limited but deep storage is readily available**
+
+Repeat steps 1-3 as above.
+
+4. Following post-processing and validation, or at project close, data should be tarred and transferred to a deep storage system
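
The suggested procedures above amount to a small decision table, which can be sketched in Python. This is only an illustration: the `Scenario` and `Category` enums, the action strings, and the `suggested_action` function are hypothetical names that appear nowhere in the document. Two readings are also assumed: that the deep-storage step 4 covers every category not handled by steps 1-3, and that a bit-reproducibility concern is referred back to the project, since the document does not say what to do in that case.

```python
from enum import Enum


class Scenario(Enum):
    STORAGE_LIMITED = "compute readily available, storage limited"
    COMPUTE_LIMITED = "compute limited, deep storage readily available"


class Category(Enum):
    MODEL_CONFIG = "model configuration data"
    FAILED_RUN = "failed model run output"
    SUCCESSFUL_RUN = "successful model run output"
    COLLABORATIVE = "data prepared for collaborative sharing"
    INTERMEDIATE = "intermediate data products"


# Hypothetical action labels summarising the numbered steps.
ARCHIVE_CONFIG = "zip and move to tape if needed for reproducibility"
REMOVE_NOW = "remove immediately"
REMOVE_AFTER_QUARANTINE = "remove after post-processing and a quarantine period"
RETIRE_AT_PROJECT_END = "retire at project end after a quarantine period"
DEEP_STORAGE = "tar and transfer to deep storage"
PROJECT_DECISION = "decide on a project basis"


def suggested_action(scenario, category, bit_reproducibility_concern=False):
    """Return the suggested handling for one category of unpublished data."""
    # Steps 2 and 3 are shared by both scenarios ("Repeat steps 1-3").
    if category is Category.MODEL_CONFIG:
        return ARCHIVE_CONFIG
    if category is Category.FAILED_RUN:
        return REMOVE_NOW

    # Assumption: with ample deep storage, everything else is archived.
    if scenario is Scenario.COMPUTE_LIMITED:
        return DEEP_STORAGE

    # Storage-limited scenario, steps 4 and 5.
    if category is Category.SUCCESSFUL_RUN:
        if bit_reproducibility_concern:
            # Assumption: not covered by the document, so defer to the project.
            return PROJECT_DECISION
        return REMOVE_AFTER_QUARANTINE
    return RETIRE_AT_PROJECT_END


if __name__ == "__main__":
    for scenario in Scenario:
        for category in Category:
            print(f"{scenario.name:16} {category.name:15} -> "
                  f"{suggested_action(scenario, category)}")
```

Step 1 (maintaining a database or wiki of model runs) and the open-ended intermediate data of step 5a are deliberately left out, since neither maps to a single action per data category.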