Example notebooks exhibiting the JuliaClimate stack #4
Awesome!
I'm wondering if we can use NCDatasets.jl directly for that point. It is certainly efficient, effortless, and extendable. The part about scalability is less clear, though: there is support for larger-than-RAM datasets, and some work on using Dagger has been done, but I must admit I still haven't had time to test those features. That being said, ESDL is perhaps a more generic candidate (with support for other formats). From my tests, it seems to check all the boxes. However, I'm not certain how to configure everything for a distributed approach: exposing (and using) the cluster in ESDL.
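For reference, a minimal sketch of reading a NetCDF file with NCDatasets.jl; the file and variable names below are placeholders for illustration, not from an actual dataset:

```julia
using NCDatasets

# Open a NetCDF file; "example.nc" and "temperature" are placeholder names.
ds = NCDataset("example.nc")
println(keys(ds))          # names of the variables in the file
v = ds["temperature"]      # lazy variable handle; no data loaded yet
slice = v[:, :, 1]         # read only the first time step into memory
close(ds)
```

Because the variable handle is lazy, only the indexed slice is ever read, which is what makes the larger-than-RAM question mostly one of chunking and scheduling rather than of the file API itself.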
This was my initial aim with ClimateTools. I'm not there yet, and my focus at the time was on implementing a quantile-quantile bias correction technique. A lot of time was nonetheless spent implementing extraction and utility functions; ideally, those shouldn't be necessary now that we have newer packages for that.
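As a point of reference only (this is not ClimateTools' actual implementation), an empirical quantile-quantile mapping can be sketched in a few lines; the function name and the use of plain vectors are assumptions for illustration:

```julia
using Statistics

# Empirical quantile mapping: locate each model value's quantile in the
# historical model distribution, then substitute the observed value at
# that same quantile.
function quantile_map(obs::AbstractVector, mod_hist::AbstractVector, mod_vals::AbstractVector)
    sorted_hist = sort(mod_hist)
    n = length(sorted_hist)
    probs = [clamp(searchsortedlast(sorted_hist, x) / n, 0.0, 1.0) for x in mod_vals]
    return quantile(obs, probs)    # observed quantiles at those probabilities
end

# Example: correct a biased model series against observations
obs       = randn(1_000) .+ 0.5
mod_hist  = randn(1_000)
corrected = quantile_map(obs, mod_hist, randn(100))
```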
Totally agree!
See https://github.com/esa-esdl/ESDL.jl/issues/170; it covers a lot of material for point #1.
See #3 (comment) about a notebook stack that we started putting together.
It would be nice to have some examples of the existing JuliaClimate stack in action in notebooks (if there are redundancies, then maybe even feature / performance comparisons?).
My personal goal for the next few weeks is to basically implement a Julia translation of my "big data" Python tutorial [binder link], which uses the following stack:

1. xarray and dask as the building blocks for reading and analyzing out-of-memory labelled NetCDF arrays
2. intake (a generic organizational tool leveraging xarray) to read in NetCDF files stored in a cloud-optimized zarr format in Google Cloud Storage as xarray.Dataset instances
3. xgcm for doing grid-aware operations (e.g. differentiation) on datasets
4. xmitgcm (a model-specific package) to process the dataset's non-rectangular native grid into something more rectangular

These are basically the four categories of packages that I see as necessary to replicate the kinds of workflows I am interested in (and which are extremely straightforward using the existing Pangeo Python stack, as in my example above).
I don't really know how much sense it makes to put effort into developing 2, 3, and 4 if we haven't yet settled on a stable 1 (see the sketch below for what 1 might look like).
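To make category 1 concrete, here is a rough, non-authoritative sketch of what the out-of-memory building block could look like in Julia today, using NCDatasets.jl's lazy variables; the file and variable names are hypothetical, and missing values are ignored for brevity:

```julia
using NCDatasets

# Time-mean of a large variable computed one slice at a time, so the full
# array never has to fit in memory. Assumes a 3-D (x, y, time) variable
# with no missing values.
NCDataset("large_model_output.nc") do ds
    v = ds["temperature"]          # lazy variable handle; nothing read yet
    nx, ny, nt = size(v)
    acc = zeros(nx, ny)
    for t in 1:nt
        acc .+= v[:, :, t]         # read a single time step per iteration
    end
    time_mean = acc ./ nt
    println("time mean over $nt steps, size = ", size(time_mean))
end
```

Labelled dimensions, cloud/zarr storage, and grid-aware operators (categories 2, 3, and 4) would still need dedicated packages on top of a building block like this.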