Commit

deploy: d183ba9
chloemackallah committed Oct 26, 2023
1 parent d6a3147 commit b0028d7
Showing 89 changed files with 902 additions and 2,538 deletions.
2 changes: 1 addition & 1 deletion _sources/concepts/license-qa.md
@@ -11,7 +11,7 @@ The license is enforceable in court, but clearly that's an extreme step. Usually
* <ins>How can my license be valid if a project or I act as licensor when the copyright belongs to my institution?</ins><br>
If you are the creator of the data/code then you can apply a license on behalf of your institution. They won't mind as long as the license you are using is in line with their recommendations. Most Australian universities and the ARC, which funds most projects, require open access for any research product (unless there is a valid reason not to).<br>

*<ins>How can I license data partly derived from a "commercial" product?</ins><br>
* <ins>How can I license data partly derived from a "commercial" product?</ins><br>
You should first check if there is an agreement allowing you to use the data and if this agreement covers publishing derived data. If this is not in place, a way around it could be to leave out the commercial data used in the project and substitute it with a derived quantity.
In this [example](https://zenodo.org/record/4448518#.Y322MuxBz0o) the authors removed the wind speed measurements they used to identify a “severe wind event” and introduced a variable indicating whether such an event occurred, to ensure at least partial reproducibility.<br>

164 changes: 112 additions & 52 deletions _sources/create/create-basics.md

Large diffs are not rendered by default.

24 changes: 10 additions & 14 deletions _sources/create/create-intro.md
@@ -1,24 +1,20 @@
# Guidelines to create a climate dataset

## UNDER DEVELOPMENT

## Scope of the guidelines

These guidelines cover the various aspects of creating robust and well-described climate data for reuse, analysis, sharing, and publication.

We have identified five primary use cases that guide the recommendations and requirements you should follow when creating your climate datasets:
1. for your own reuse and analysis (basic dataset needs)
2. sharing with colleagues for collaboration (minimum sharing recommendations, no citation necessary)
3. for publication alongside a research paper (journal requirements apply)
4. for publication into a large multi-institutional intercomparison project like CMIP (strict standards apply)
5. for productisation, including market-readiness and commercialisation (standards to be defined)
We have identified five primary use cases that guide the recommendations and requirements to follow when creating climate datasets:
1. Own reuse and analysis: basic dataset needs.
2. Sharing with colleagues for collaboration: minimum sharing recommendations, no citation necessary.
3. Publication alongside a research paper: journal requirements apply.
4. Publication into a specific project: project standards apply.
5. Productisation, including market-readiness and commercialisation: standards depend on audience and intended use.

Additionally, we have identified two main situations you may find yourself in: i) preparing your datasets from scratch (i.e. you have 'raw' data that is currently undescribed, and in a format that is not analysis-ready); or ii) deriving metrics or indices from a reference dataset (e.g. performing an analysis on CMIP data for a research publication). We will mostly be discussing the first situation where you are creating climate data from scratch, with specific recommendatations for the second situation later in the section.
We will mostly discuss creating datasets from scratch, starting from 'raw' data that is currently undescribed and in a format that is not analysis-ready. Datasets can also be derived from existing data, as the result of an analysis or by deriving metrics and indices from a reference dataset. We provide specific recommendations for this second situation later in the section.


## Index
* [Dataset creation basics & sharing recommendations](create-basics.md)
This is an overview of the landscape of climate datasets, including the various components of netCDF files and their storage in POSIX systems, and some best practice recommendations for the back up of data and management of the creation process.
* [Dataset creation basics](create-basics.md)
An overview of the landscape of climate datasets, including the various components of netCDF files and their storage in POSIX systems, and best practice recommendations for the backup of data and management of the creation process.

* File formats, metadata & coordinates
* File & directory organisation
@@ -36,7 +32,7 @@ This is the more practical description of how to create climate datasets (genera

&nbsp;
* [Requirements for publication & productisation](create-publishing.md)
This chapter outlines the standards for publication data that either accompanies a journal article or is submitted to an intercomparison project (e.g. CMIP), and some recommendations for tools to aid this process.
This chapter outlines the standards for publication data that either accompanies a journal article or is submitted to an intercomparison project (e.g., CMIP), and some recommendations for tools to aid this process.

* Publishing in a journal
* Submitting to an intercomparison project
8 changes: 7 additions & 1 deletion _sources/create/create-new-derived.md
@@ -1,6 +1,12 @@
# New, modified, and derived datasets

## Creating new datasets from raw data
Paola (new comments following meeting Sep23):

We discussed mentioning here the tools to generate/modify a netCDF file (ncdump/ncgen, NCO to modify attributes, how xarray/MATLAB "create" a netCDF file)
rather than trying to re-create every possible workflow.
As well as things a user should check to make sure they're following the recommendations listed in create-basics. For example, are the attributes still relevant at both the global and variable level?


Paola:
however rare, we could cover starting from a template, as for a cdl file (i.e. a ncdump output style file)
@@ -31,4 +37,4 @@ Make sure original attributes/documentation are still relevant
be careful particularly with units, cell_methods and coordinates that might have changed

Chloe:
Provenance: https://acdguide.github.io/Governance/concepts/provenance.html
Provenance: https://acdguide.github.io/Governance/concepts/provenance.html
34 changes: 31 additions & 3 deletions _sources/tech/drs.md → _sources/tech/drs-names.md
@@ -1,6 +1,6 @@
# Choosing a directory structure (DRS) and filenames
# Choosing a directory structure and filenames

The names you choose for files and directories and generally the way you organise your data, i.e. your directory structure, can help navigating the data, provide extra information, avoid confusion and avoid the user ending up accessing the wrong data. In many cases the best file organisation will depend on the specific research project and the actual server where the data is stored. The global climate modelling intercomparison project (CMIP) has adopted a **Data Reference Syntax (DRS)**, based on the **controlled vocabularies (CVs)** used in model metadata, to define their file names and directory structures.
The names you choose for files and directories and, more generally, the way you organise your data, i.e. your directory structure, can help users navigate the data, provide extra information, and prevent confusion or users accessing the wrong data. In many cases the best file organisation will depend on the specific research project and the actual server where the data is stored. The Coupled Model Intercomparison Project (CMIP), the global climate modelling intercomparison effort, has adopted a **Data Reference Syntax (DRS)**, based on the **controlled vocabularies (CVs)** used in model metadata, to define its file names and directory structures.
Here we list a few guidelines and tips to help you decide.

## General considerations
@@ -9,7 +9,7 @@ Here we list a few guidelines and tips to help you decide.
* Be consistent in both the organisation and the naming: consistency is essential for the data to be machine-readable, i.e. easy to access programmatically. Use community standards and/or controlled vocabularies wherever possible.
* Consider adding a `readme` file in the main directory, including an explanation of the DRS and the naming conventions, abbreviations and/or codes you used. If you used standards and controlled vocabularies, all you have to do is include a link to them.

## DRS
## Directory structure

![Example of directory structure](../images/example_drs.png)

@@ -29,3 +29,31 @@ for the final output. Also consider how others might use them: are they going to

* The CMIP6 DRS is defined in the [CMIP6 Controlled Vocabularies document](https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit), starting on p.13.
* The [CORDEX DRS](http://is-enes-data.github.io/CORDEX_adjust_drs.pdf) builds on the CMIP DRS to apply to regional climate models.

## File naming
You can use filenames to include information such as:

* project, simulation and/or experiment acronyms, you might have to use a combination of them
* spatial coverage: the region or coordinates range covered by the data, could also be a specific domain for climate model data, e.g., ocean, land etc.
* grid: could be either a grid label or spatial resolution
* temporal coverage: a specific year/date or a temporal range
* temporal frequency: monthly, daily etc.
* type of data: again this depends on context. If the same directory contains data from different instruments, it is important to specify the instrument in the name. For coupled model output this could be the model component; if you are using one file per variable, the variable name
* version: this is really important if you are sharing the data even if only 1 version exists at the time
* correct file extension
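
The components listed above can be combined programmatically so that every file follows the same pattern. The following is a minimal Python sketch, not a prescribed convention: the function `build_filename`, the order of the components, the underscore separator and the example values (`myproj`, `tas`, `mon`, `gr1`) are all assumptions made for illustration.

```python
from datetime import date


def build_filename(project, variable, frequency, grid, start, end, version, ext="nc"):
    """Join filename components with underscores (hypothetical pattern).

    Dates are written as YYYYMMDD and the version is prefixed with "v".
    """
    period = f"{start:%Y%m%d}-{end:%Y%m%d}"
    parts = [project, variable, frequency, grid, period, f"v{version}"]
    return "_".join(parts) + f".{ext}"


# Example values are made up for illustration only.
name = build_filename(
    "myproj", "tas", "mon", "gr1", date(1990, 1, 1), date(1999, 12, 31), 1
)
print(name)  # myproj_tas_mon_gr1_19900101-19991231_v1.nc
```

Building names from a single function, rather than typing them by hand, is one way to keep the naming consistent across a dataset.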

## Tips for machine-readable files
* avoid special characters: ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ‘ “
* do not use spaces to separate words; use underscores "_" or dashes "-" or CamelCase
* use YYYYMMDD for dates so that your files sort in chronological order; avoid "Jan, Feb, .." for months, as they are much harder to handle in code.
* for number sequences, use leading zeros: so 001, 002,.. 020,.. 103 rather than 1, 2,.. 20, .. 103
* try to avoid overly long names: keep a single file or directory name under 255 characters, and full paths under 30000.
* avoid having a large number of files in a single directory, but also an excessive number of directories with one file each
* always include file extension, some software can recognise files from their header, but this is not always the case
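
Some of the tips above can be turned into simple automated checks. The sketch below uses only the Python standard library; the helper names `check_filename` and `sequence_name`, the set of allowed characters and the wording of the messages are assumptions for illustration, not part of any standard.

```python
import os
import re

# Assumed "safe" set: letters, digits, underscore, dash and dot.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")
MAX_NAME_LENGTH = 255  # limit for a single name, taken from the tips above


def check_filename(name):
    """Return a list of problems found in a single file name (hypothetical helper)."""
    problems = []
    if not SAFE_NAME.match(name):
        problems.append("contains spaces or special characters")
    if len(name) >= MAX_NAME_LENGTH:
        problems.append("name too long")
    if os.path.splitext(name)[1] == "":
        problems.append("missing file extension")
    return problems


def sequence_name(prefix, index, width=3, ext="nc"):
    """Build a numbered name with leading zeros so that names sort correctly."""
    return f"{prefix}_{index:0{width}d}.{ext}"


print(check_filename("rainfall_daily_20200101.nc"))  # []
print(check_filename("my data (final).nc"))  # ['contains spaces or special characters']
print(sequence_name("run", 20))  # run_020.nc
```

A check like this could be run over a directory listing before sharing the data, but the exact rules should be adapted to the project.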

## Online Resources
We partially based this page on the resources listed below, and recommend checking them for more insight and advice.

* [Best practice to organise your data](https://www.earthdatascience.org/courses/intro-to-earth-data-science/open-reproducible-science/get-started-open-reproducible-science/best-practices-for-organizing-open-reproducible-science/) - part of an Open reproducible science course from the University of Colorado
* [Software Carpentry video covering DRS best practices](https://youtu.be/3MEJ38BO6Mo)
* [Best file naming practice handout (pdf) from Stanford University](https://stanford.box.com/shared/static/yl5a04udc7hff6a61rc0egmed8xol5yd.pdf)
27 changes: 0 additions & 27 deletions _sources/tech/filenames.md

This file was deleted.

Binary file modified _static/__pycache__/__init__.cpython-38.pyc
Binary file not shown.
24 changes: 6 additions & 18 deletions about-us/goals.html
@@ -111,7 +111,7 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../create/create-basics.html">
Dataset creation basics &amp; sharing recommendations
Dataset creation basics
</a>
</li>
<li class="toctree-l1">
@@ -500,22 +500,10 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
Contributor roles
</a>
</li>
<li class="toctree-l2 has-children">
<a class="reference internal" href="../tech/drs.html">
Choosing a directory structure (DRS) and filenames
<li class="toctree-l2">
<a class="reference internal" href="../tech/drs-names.html">
Choosing a directory structure and filenames
</a>
<input class="toctree-checkbox" id="toctree-checkbox-8" name="toctree-checkbox-8" type="checkbox"/>
<label for="toctree-checkbox-8">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l3">
<a class="reference internal" href="../tech/filenames.html">
Naming
</a>
</li>
</ul>
</li>
<li class="toctree-l2">
<a class="reference internal" href="../tech/keywords.html">
@@ -585,8 +573,8 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
<a class="reference internal" href="../appendix/appendix-intro.html">
Supplemental material
</a>
<input class="toctree-checkbox" id="toctree-checkbox-9" name="toctree-checkbox-9" type="checkbox"/>
<label for="toctree-checkbox-9">
<input class="toctree-checkbox" id="toctree-checkbox-8" name="toctree-checkbox-8" type="checkbox"/>
<label for="toctree-checkbox-8">
<i class="fas fa-chevron-down">
</i>
</label>
24 changes: 6 additions & 18 deletions about-us/how-to-contribute.html
@@ -111,7 +111,7 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../create/create-basics.html">
Dataset creation basics &amp; sharing recommendations
Dataset creation basics
</a>
</li>
<li class="toctree-l1">
@@ -500,22 +500,10 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
Contributor roles
</a>
</li>
<li class="toctree-l2 has-children">
<a class="reference internal" href="../tech/drs.html">
Choosing a directory structure (DRS) and filenames
<li class="toctree-l2">
<a class="reference internal" href="../tech/drs-names.html">
Choosing a directory structure and filenames
</a>
<input class="toctree-checkbox" id="toctree-checkbox-8" name="toctree-checkbox-8" type="checkbox"/>
<label for="toctree-checkbox-8">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l3">
<a class="reference internal" href="../tech/filenames.html">
Naming
</a>
</li>
</ul>
</li>
<li class="toctree-l2">
<a class="reference internal" href="../tech/keywords.html">
@@ -585,8 +573,8 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
<a class="reference internal" href="../appendix/appendix-intro.html">
Supplemental material
</a>
<input class="toctree-checkbox" id="toctree-checkbox-9" name="toctree-checkbox-9" type="checkbox"/>
<label for="toctree-checkbox-9">
<input class="toctree-checkbox" id="toctree-checkbox-8" name="toctree-checkbox-8" type="checkbox"/>
<label for="toctree-checkbox-8">
<i class="fas fa-chevron-down">
</i>
</label>
24 changes: 6 additions & 18 deletions about-us/working-group-governance.html
@@ -111,7 +111,7 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../create/create-basics.html">
Dataset creation basics &amp; sharing recommendations
Dataset creation basics
</a>
</li>
<li class="toctree-l1">
@@ -500,22 +500,10 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
Contributor roles
</a>
</li>
<li class="toctree-l2 has-children">
<a class="reference internal" href="../tech/drs.html">
Choosing a directory structure (DRS) and filenames
<li class="toctree-l2">
<a class="reference internal" href="../tech/drs-names.html">
Choosing a directory structure and filenames
</a>
<input class="toctree-checkbox" id="toctree-checkbox-8" name="toctree-checkbox-8" type="checkbox"/>
<label for="toctree-checkbox-8">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l3">
<a class="reference internal" href="../tech/filenames.html">
Naming
</a>
</li>
</ul>
</li>
<li class="toctree-l2">
<a class="reference internal" href="../tech/keywords.html">
@@ -585,8 +573,8 @@ <h1 class="site-logo" id="site-title">Climate Data Guidelines</h1>
<a class="reference internal" href="../appendix/appendix-intro.html">
Supplemental material
</a>
<input class="toctree-checkbox" id="toctree-checkbox-9" name="toctree-checkbox-9" type="checkbox"/>
<label for="toctree-checkbox-9">
<input class="toctree-checkbox" id="toctree-checkbox-8" name="toctree-checkbox-8" type="checkbox"/>
<label for="toctree-checkbox-8">
<i class="fas fa-chevron-down">
</i>
</label>
0 comments on commit b0028d7