Describe the functionality you would like to see.
For 4-D STEM there are some operations that would work significantly better if the dataset were chunked equally in all dimensions, for example computing a virtual image or applying a Gaussian filter in real space. This has traditionally been a pain because zarr favors large chunks, which translate into fast parallel operations, while hyperspy prefers no chunking in the signal dimensions for the `map` function and for plotting.

With the V3 spec for zarr and its sharding implementation we might be able to rethink how we handle things. For example, we could store the data in a format like the sketch below:
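A minimal sketch of what that layout could look like, assuming the zarr-python 3 `create_array` API with shard support; the store path, array shape, dtype, and the particular shard/chunk sizes are all made-up values for illustration:

```python
import zarr

# 4-D STEM dataset: (scan_y, scan_x, detector_y, detector_x)
# Each shard is one object on disk holding a block of probe positions
# with full detector frames, so bulk/parallel reads stay fast.
# Inside each shard, small chunks also split the detector dimensions,
# so a detector-space slice (e.g. a virtual aperture) can be read
# without decoding entire frames.
arr = zarr.create_array(
    store="data_4dstem.zarr",      # hypothetical store path
    shape=(256, 256, 128, 128),    # hypothetical dataset size
    shards=(32, 32, 128, 128),     # large units for parallel I/O
    chunks=(32, 32, 16, 16),       # small units within each shard
    dtype="uint16",
)
```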
Here the shards essentially act like the current ideal data structure, but within each shard there are small chunks that make reads fast along certain dimensions. This would allow us to create virtual images without loading the entire dataset into memory, and would reduce the memory footprint of operations like rechunking.
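For instance, a virtual bright-field image could then be computed by reading only the chunks that overlap the detector region of interest. A sketch using dask (the file name, axis layout, and slice bounds are arbitrary placeholders):

```python
import dask.array as da

# Lazily wrap the zarr array; dask maps tasks onto the zarr chunks.
data = da.from_zarr("data_4dstem.zarr")  # (scan_y, scan_x, det_y, det_x)

# Virtual bright-field image: integrate a small detector region.
# Only the chunks covering detector pixels 56:72 ever get read,
# instead of streaming every full frame through memory.
virtual_image = data[..., 56:72, 56:72].sum(axis=(-2, -1)).compute()
```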
This might not be ready quite yet, as there are still some issues to solve regarding the speed of the sharding implementation: zarr-developers/zarr-python#1338
It is worth a discussion about whether this is something worth pursuing.