Describe the functionality you would like to see.
For 4-D STEM there are some operations that would work significantly better if the dataset were chunked equally in all dimensions, for example computing a virtual image or applying a Gaussian filter in real space. This has traditionally been a pain because zarr favors large chunks, which translate into fast parallel operations, while hyperspy prefers no chunking in the signal dimensions for the `map` function and for plotting.

With the V3 spec for zarr and its sharding implementation we might be able to rethink how we handle things. For example, we could store the data in a format like the sketch below:
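A minimal sketch of what that layout could look like, assuming the zarr-python 3 `create_array` API with shard support; the store path, array shape, dtype, and the particular shard/chunk sizes are all made-up values for illustration:

```python
import zarr

# 4-D STEM dataset: (scan_y, scan_x, detector_y, detector_x)
# Each shard is one object on disk holding a block of probe positions
# with full detector frames, so bulk/parallel reads stay fast.
# Inside each shard, small chunks also split the detector dimensions,
# so a detector-space slice (e.g. a virtual aperture) can be read
# without decoding entire frames.
arr = zarr.create_array(
    store="data_4dstem.zarr",      # hypothetical store path
    shape=(256, 256, 128, 128),    # hypothetical dataset size
    shards=(32, 32, 128, 128),     # large units for parallel I/O
    chunks=(32, 32, 16, 16),       # small units within each shard
    dtype="uint16",
)
```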
Here the shards essentially act like the current ideal data structure, but within each shard there are small chunks that make reads fast along certain dimensions. This would allow us to create virtual images without loading the entire dataset into memory, and would reduce the memory footprint of operations like rechunking.
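For instance, a virtual bright-field image could then be computed by reading only the chunks that overlap the detector region of interest. A sketch using dask (the file name, axis layout, and slice bounds are arbitrary placeholders):

```python
import dask.array as da

# Lazily wrap the zarr array; dask maps tasks onto the zarr chunks.
data = da.from_zarr("data_4dstem.zarr")  # (scan_y, scan_x, det_y, det_x)

# Virtual bright-field image: integrate a small detector region.
# Only the chunks covering detector pixels 56:72 ever get read,
# instead of streaming every full frame through memory.
virtual_image = data[..., 56:72, 56:72].sum(axis=(-2, -1)).compute()
```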
This might not be ready quite yet, as there are still some issues to solve regarding the speed of the sharding implementation: zarr-developers/zarr-python#1338
It is worth a discussion about whether this is something worth pursuing.