-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX(psoct): avoid loading all slices when files are "old matlab format" #27
Conversation
…nto oct_mat_to_zarr
…amid): do not crash if last chunk in a row only has a single voxel
…nto oct_mat_to_zarr
dat = omz[str(level - 1)][tuple(slicer)] | ||
|
||
# Discard the last voxel along odd dimensions | ||
crop = [0 if x == 1 else x % 2 for x in dat.shape[-ndim:]] | ||
crop = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain what is this for? Why do we need to use this full shape instead? Since this changes breaks another test from another modality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We divide the previous resolution by 2. Thinking in 1D, to do this we reshape a dimension [N] into a dimension [N//2, 2] (essentialy you get a stack of the odd and even voxels) and then average across the new small axis.
However, we can only do this if the original dimension is even. So if it's odd I crop the last voxel (by doing something like array[:-1]).
That said, I think that the current code assumes that the data is exactly 3D (no channel dimension). I might have fixed it in the other PR (to be merged).
What error do you get, and on what kind of data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The error from another modality is basically dimension not matching which makes sense to me now. Could you please point me to the other PR if it is not the one we just merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the one you just merged
The previous code was memory efficient when files where in h5 format ("new mat files"), but was loading all slices at once
when they were "old mat files". I now wrap the files in H5/MAT wrapper that maps or load the data on demand, and loop across slices one at a time (rather than reading chunks of slices as before). It's a bit less efficient when h5 files are used but much more memory efficient when old files are used. We could think of having two different fallabacks based on file version in the future, but I am not sure it's necessary.
I've also made the code a bit more general so that it does not break if the slices have a channel dimension (altough I don't have such files to test, and assume that the channel dimension is the first dimension, which is unlikely).
I've also made guesing the key a bit more robust in old mat files by skipping "private" keys (that start with an underscore).