This code is focused on a specific part of the workflow folks may need to do -- but we also provide tools and utilities for other bits. So I think it's helpful to document the suggested workflow, which will also help us determine where to put code.
My first draft:
Goal:
Starting Point:
User has a set of data that can be loaded into xarray: could be files on disk, files on AWS, a Kerchunked zarr dataset, or ....
User needs a subset of that data, restricted to:
- a polygon in space
- a particular time frame
- either a single vertical layer or all vertical layers (proper vertical subsetting can wait ...)
- only the variables they need
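As a sketch, the restrictions above could be captured in a small request spec. Every name here is a placeholder for illustration, not a proposed API:

```python
# A hypothetical subset specification covering the restrictions above.
# All key names and values are placeholders, not part of any real API.
subset_request = {
    # polygon in space, as (lon, lat) vertices
    "polygon": [(-70.0, 41.0), (-69.0, 41.0), (-69.0, 42.0), (-70.0, 42.0)],
    # particular time frame
    "time_range": ("2023-06-01", "2023-06-02"),
    # a single vertical layer ("surface") or all layers ("all")
    "vertical": "surface",
    # only the variables they need
    "variables": ["water_temp", "salinity"],
}
```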
Outcome:
An xarray Dataset, all ready to save to netcdf, or ....
That Dataset contains only what the user wants, and is as similar to the original as possible: e.g. the same names for all variables, maybe some additional metadata.
Workflow:
Step 1:
User does any pre-processing required to get their data into a single, conforming dataset.
In many cases there's nothing to be done, but in some cases there may be work required:
- The grid and data variables are in multiple files; they need to be combined into one dataset.
- There are "troublesome" variables -- e.g. time coordinates that aren't correct, etc.
As a rule, this will be model specific, maybe even implementation-of-model specific.
This package can't provide all of that, but it can (and should) provide a few examples for common cases.
e.g. SCHISM (STOFS); maybe FVCOM, fixing the time variable (some use single-precision float days :-()
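A hedged sketch of that kind of pre-processing, assuming a dataset whose time coordinate is stored as float days since an epoch (as noted above for some FVCOM outputs). The epoch default and variable names are assumptions for illustration, not what any particular model uses:

```python
import numpy as np
import pandas as pd
import xarray as xr

def fix_float_days_time(ds: xr.Dataset, epoch: str = "1858-11-17") -> xr.Dataset:
    """Replace a float-days 'time' coordinate with real datetimes.

    The Modified Julian Date epoch is just an example default.
    """
    days = ds["time"].values.astype("float64")
    times = pd.to_datetime(epoch) + pd.to_timedelta(days, unit="D")
    return ds.assign_coords(time=times)

# Synthetic stand-in for a model output with a float32 time coordinate:
ds = xr.Dataset(
    {"temp": ("time", [10.0, 11.0])},
    coords={"time": np.array([0.0, 0.5], dtype="float32")},
)
ds = fix_float_days_time(ds)

# Combining separate grid and data files would be a similarly small step,
# e.g. xr.merge([grid_ds, data_ds]) once both are open.
```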
Step 2:
The user processes the Dataset to make it CF compliant (or compliant enough that the subsetting code can work).
This package will contain utilities to do that, e.g.
ugrid.assign_ugrid_topology()
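Without pinning down the exact signature of `ugrid.assign_ugrid_topology()`, here is a minimal sketch of what such a utility does: attach a UGRID-conventions `mesh_topology` variable so downstream code can find the grid. Variable names are assumptions:

```python
import numpy as np
import xarray as xr

def add_minimal_ugrid_topology(ds: xr.Dataset, face_node_var: str = "nv") -> xr.Dataset:
    """Attach a minimal UGRID mesh_topology variable (illustration only).

    The real ugrid.assign_ugrid_topology() presumably handles more cases
    and may have a different signature.
    """
    mesh = xr.DataArray(
        0,  # dummy scalar; the UGRID mesh variable carries only attributes
        attrs={
            "cf_role": "mesh_topology",
            "topology_dimension": 2,
            "node_coordinates": "lon lat",
            "face_node_connectivity": face_node_var,
        },
    )
    return ds.assign(mesh=mesh)

# A toy dataset with a face-node connectivity array:
ds = xr.Dataset({"nv": (("face", "vertex"), np.zeros((2, 3), dtype="int32"))})
ds = add_minimal_ugrid_topology(ds)
```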
Step 3:
The Dataset can be queried by the user to find out what they need to know in order to specify a subset:
- what variables are in the dataset
- what timespan is covered
- what region is covered (maybe?)
- whether it's 2D or 3D?
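Plain xarray already answers most of these questions; a sketch against a toy dataset (all names assumed):

```python
import pandas as pd
import xarray as xr

# Toy dataset standing in for a loaded, conforming model output:
ds = xr.Dataset(
    {"temp": ("time", [10.0, 11.0]), "salt": ("time", [35.0, 35.1])},
    coords={"time": pd.date_range("2023-06-01", periods=2, freq="D")},
)

variables = list(ds.data_vars)   # what variables are in the dataset
tmin = ds["time"].min().values   # timespan covered ...
tmax = ds["time"].max().values
is_3d = "depth" in ds.dims       # 2D vs 3D? ("depth" is an assumed dim name)
```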
Step 4:
The user makes a request for a subset.
Result -- a subset Dataset.
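To make Step 4 concrete, a hedged sketch using a bounding box as a stand-in for true polygon subsetting; the function and all names are illustrative, not the package's API:

```python
import numpy as np
import pandas as pd
import xarray as xr

def subset(ds, variables, time_range, lon_bounds, lat_bounds):
    """Select variables, slice time, and drop nodes outside a bounding box.

    A real implementation would test nodes against a polygon instead.
    """
    out = ds[variables].sel(time=slice(*time_range))
    mask = (
        (ds["lon"] >= lon_bounds[0]) & (ds["lon"] <= lon_bounds[1])
        & (ds["lat"] >= lat_bounds[0]) & (ds["lat"] <= lat_bounds[1])
    )
    return out.where(mask, drop=True)

# Toy unstructured-style dataset: one variable on (time, node):
ds = xr.Dataset(
    {"temp": (("time", "node"), np.arange(6.0).reshape(2, 3))},
    coords={
        "time": pd.date_range("2023-06-01", periods=2),
        "lon": ("node", [-70.5, -69.5, -68.5]),
        "lat": ("node", [41.5, 41.5, 41.5]),
    },
)
result = subset(ds, ["temp"], ("2023-06-01", "2023-06-01"), (-70.0, -69.0), (41.0, 42.0))
```

The result is itself an xarray Dataset, which keeps it as similar to the original as possible, per the Outcome above.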