Example notebooks exhibiting the JuliaClimate stack #4

Open
hdrake opened this issue Jan 25, 2020 · 4 comments

@hdrake

hdrake commented Jan 25, 2020

It would be nice to have some notebook examples of the existing JuliaClimate stack in action (and, if there are redundancies, maybe even feature or performance comparisons).

My personal goal for the next few weeks is to basically implement a Julia translation of my "big data" Python tutorial [binder link] which uses the following stack:

  1. xarray and dask as the building blocks for reading and analyzing out-of-memory labelled NetCDF arrays
  2. intake (generic organizational tool leveraging xarray) to read in NetCDF data stored in the cloud-optimized Zarr format in Google Cloud Storage as xarray.Dataset instances
  3. xgcm for doing grid-aware operations (e.g. differentiation) on datasets
  4. xmitgcm (model-specific package) to process the dataset's non-rectangular native grid into something more rectangular
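
To make the "grid-aware" part of item 3 a bit more concrete, here is a minimal pure-Python sketch of the underlying idea: a difference of cell-center values lives on cell faces, and the boundary treatment decides how many faces you get back. This is not xgcm's API; the function names `diff_center_to_left` and `derivative_on_faces`, the uniform spacing, and the periodic default are all assumptions made for illustration.

```python
from typing import List


def diff_center_to_left(values: List[float], periodic: bool = True) -> List[float]:
    """Difference cell-center values onto the left cell faces.

    Face i sits between center i-1 and center i, so the result at
    index i is values[i] - values[i-1]. With periodic=True, face 0
    wraps around and uses the last center as its left neighbour.
    """
    n = len(values)
    if periodic:
        # values[-1] is the last element, which gives the wrap-around for i == 0.
        return [values[i] - values[i - 1] for i in range(n)]
    # Non-periodic: face 0 has no left neighbour, so it is dropped.
    return [values[i] - values[i - 1] for i in range(1, n)]


def derivative_on_faces(values: List[float], spacing: float,
                        periodic: bool = True) -> List[float]:
    """Finite-difference derivative on the faces of a uniformly spaced axis."""
    return [d / spacing for d in diff_center_to_left(values, periodic)]


temperature = [1.0, 2.0, 4.0, 7.0]            # values at four cell centers
faces_periodic = diff_center_to_left(temperature)
faces_open = diff_center_to_left(temperature, periodic=False)
```

The point of the sketch is only that the output does not sit at the same positions as the input, which is the bookkeeping a grid-aware package takes off the user's hands.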

These are basically the four categories of packages that I see as necessary to replicate the kinds of workflows that I am interested in (and which are extremely straightforward using the existing Pangeo Python stack, as in my example above):

  1. Basic data types that make handling large NetCDF-like datasets efficient (low overhead), effortless (intuitive and compact syntax), scalable (distributable, out-of-memory), and extendable (flexible and simple data structure types).
  2. Organizational packages that simplify the workflow (e.g. organizing model ensembles, models vs. observations, downloading scripts)
  3. Generic utility packages that extend the functionality of 1-type packages (e.g. for problem-specific, dimension-specific, or operation-specific uses).
  4. Model-specific utility packages that rely on model-specific metadata (e.g. some of what @natgeo-wong and @gaelforget are working on).
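
As a rough illustration of what "scalable (out-of-memory)" asks of a category 1 data type, here is a minimal pure-Python sketch of a labelled, chunked series whose reduction only ever holds one chunk in memory. The class `ChunkedSeries` and the loader `fake_loader` are hypothetical names invented for this sketch; a real package would read the chunks from NetCDF or Zarr storage instead.

```python
from typing import Callable, Iterator, List


class ChunkedSeries:
    """Toy 1-D labelled array whose data is loaded one chunk at a time."""

    def __init__(self, dim: str, length: int, chunk_size: int,
                 load_chunk: Callable[[int, int], List[float]]):
        if length <= 0 or chunk_size <= 0:
            raise ValueError("length and chunk_size must be positive")
        self.dim = dim                # dimension label, e.g. "time"
        self.length = length
        self.chunk_size = chunk_size
        self.load_chunk = load_chunk  # called lazily with (start, stop)

    def chunks(self) -> Iterator[List[float]]:
        """Yield chunks in order; nothing is loaded until iteration."""
        for start in range(0, self.length, self.chunk_size):
            stop = min(start + self.chunk_size, self.length)
            yield self.load_chunk(start, stop)

    def mean(self, dim: str) -> float:
        """Mean along the named dimension, accumulating chunk by chunk."""
        if dim != self.dim:
            raise ValueError(f"unknown dimension {dim!r}")
        total, count = 0.0, 0
        for chunk in self.chunks():
            total += sum(chunk)
            count += len(chunk)
        return total / count


def fake_loader(start: int, stop: int) -> List[float]:
    """Stand-in for reading a slice from disk or cloud storage."""
    return [float(i) for i in range(start, stop)]


series = ChunkedSeries("time", length=10, chunk_size=4, load_chunk=fake_loader)
time_mean = series.mean("time")
```

Referring to the dimension by name rather than by axis number is the "labelled" part; the generator in `chunks` is the "out-of-memory" part.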

I don't really know how much sense it makes to put effort into developing 2, 3, and 4 if we haven't yet settled on a stable 1.

@Balinus
Member

Balinus commented Jan 26, 2020

Awesome!

> 1. Basic data types that make handling large NetCDF-like datasets efficient (low overhead), effortless (intuitive and compact syntax), scalable (distributable, out-of-memory), and extendable (flexible and simple data structure types).

I'm wondering if we can use NCDatasets.jl directly for that point. It is certainly efficient, effortless, and extendable. The part about scalability is less clear, though. There is support for larger-than-RAM datasets, and some work on using Dagger has been done, but I must admit I still haven't had the time to test those features.

That said, ESDL is perhaps a more generic candidate (with support for other formats). From my tests, it seems to check all the boxes. However, I'm not certain how to configure everything for a distributed approach, i.e. exposing (and using) the cluster in ESDL.

> 2. Organizational packages that simplify the workflow (e.g. organizing model ensembles, models vs. observations, downloading scripts)
> 3. Generic utility packages that extend the functionality of 1-type packages (e.g. for problem-specific, dimension-specific, or operation-specific uses).

This was my initial aim with ClimateTools. I'm not there yet, and my work focus at the time was to implement a quantile-quantile bias correction technique. That said, a lot of time went into implementing extraction and utility functions. Ideally, those shouldn't be necessary now that we have newer packages for that.
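
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of generic empirical quantile mapping: each model value is located within the sorted model reference sample, and the observed value at the same quantile is returned. This is an assumption-laden illustration, not the ClimateTools implementation; the names `quantile_map`, `_fractional_rank`, and `_quantile` are hypothetical, and values outside the reference range are simply clamped to its ends.

```python
from bisect import bisect_right
from typing import List, Sequence


def _fractional_rank(sorted_ref: Sequence[float], x: float) -> float:
    """Quantile q in [0, 1] of x within sorted_ref, linearly interpolated."""
    if x <= sorted_ref[0]:
        return 0.0
    if x >= sorted_ref[-1]:
        return 1.0
    # Here sorted_ref[i] <= x < sorted_ref[i + 1], so the gap is non-zero.
    i = bisect_right(sorted_ref, x) - 1
    frac = (x - sorted_ref[i]) / (sorted_ref[i + 1] - sorted_ref[i])
    return (i + frac) / (len(sorted_ref) - 1)


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Value at quantile q, interpolating between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_map(obs_ref: Sequence[float], model_ref: Sequence[float],
                 model_values: Sequence[float]) -> List[float]:
    """Map model values onto the observed distribution, quantile by quantile."""
    if not obs_ref or not model_ref:
        raise ValueError("reference samples must be non-empty")
    obs_sorted = sorted(obs_ref)
    model_sorted = sorted(model_ref)
    return [_quantile(obs_sorted, _fractional_rank(model_sorted, x))
            for x in model_values]


# Toy data: the model reference runs one unit warmer than the observations.
corrected = quantile_map(obs_ref=[1.0, 3.0, 5.0, 7.0],
                         model_ref=[2.0, 4.0, 6.0, 8.0],
                         model_values=[2.0, 5.0, 10.0])
```

In the toy data the in-range values come back one unit cooler, while the out-of-range value 10.0 is clamped to the largest observation.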

> I don't really know how much sense it makes to put effort into developing 2, 3, and 4 if we haven't yet settled on a stable 1.

Totally agree!

@Balinus
Member

Balinus commented Jan 30, 2020

See https://github.com/esa-esdl/ESDL.jl/issues/170

It covers a lot of material for point #1.

@gaelforget
Member

> See esa-esdl/ESDL.jl#170
>
> It covers a lot of material for point #1.

Great! Also, please see #2, where I added a somewhat related post.

@gaelforget
Member

See #3 (comment) about a notebook stack that we have started putting together.
