Different Objects Pointing at the Same store_path can have different metadata
#2716
Comments
if the name of the root group is '', and |
@d-v-b I think that's the expected behavior, but that does not appear to be what is happening in v3. The final two asserts fail, whereas I would expect them to succeed under your interpretation (and they do indeed pass under v2).
Ok the issue here is actually quite deep, and may have tentacles elsewhere unforeseen. In short, because But this could happen with any two accesses:
import zarr
z = zarr.open("foo.zarr")
location_group = z.create_group("location")
same_location_group_but_different_object = z["location"]
location_group.attrs.setdefault("foo", "bar")
assert location_group.attrs.asdict() == { "foo": "bar" }
assert same_location_group_but_different_object.attrs.asdict() == { "foo": "bar" } # fails in v3
assert same_location_group_but_different_object == location_group # fails, probably for the best
The failing final checks here and above make sense in that the attributes really are different, but the attributes should not be able to diverge in the first place.
I could see the way forward here as checking if the store is accepting alterations (write, append etc.) and then force re-reads of the metadata in this case. That would preserve the performance boost for read-only copies. |
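To make the idea above concrete, here is a minimal sketch of "re-read metadata whenever the store is writable, cache only when it is read-only". It does not use zarr at all: `ToyStore`, `RefreshingGroup`, the `read_only` flag and the `reads` counter are all hypothetical names invented for illustration, and the real store and group classes may be structured quite differently.

```python
import json


class ToyStore:
    """Hypothetical in-memory store with a read_only flag (not a zarr class)."""

    def __init__(self, read_only=False):
        self.data = {}
        self.read_only = read_only
        self.reads = 0  # counts metadata reads so the IO cost is visible

    def get(self, key):
        self.reads += 1
        return self.data.get(key, "{}")

    def set(self, key, value):
        if self.read_only:
            raise PermissionError("store is read-only")
        self.data[key] = value


class RefreshingGroup:
    """Hypothetical group handle that trusts its cache only on read-only stores."""

    def __init__(self, store, path):
        self.store = store
        self.path = path
        self._cached = json.loads(store.get(path))

    def attrs(self):
        # A writable store may have changed since the last read, so re-read.
        if not self.store.read_only:
            self._cached = json.loads(self.store.get(self.path))
        return dict(self._cached)

    def set_attr(self, key, value):
        current = self.attrs()
        current[key] = value
        self.store.set(self.path, json.dumps(current))
        self._cached = current


# Writable store: the second handle sees the update made through the first.
store = ToyStore()
a = RefreshingGroup(store, "location")
b = RefreshingGroup(store, "location")
b.attrs()  # access before the update, as in the failing example
a.set_attr("foo", "bar")
seen_by_b = b.attrs()

# Read-only store: repeated attribute access does not touch the store again.
ro_store = ToyStore(read_only=True)
ro_store.data["location"] = json.dumps({"foo": "bar"})
ro_group = RefreshingGroup(ro_store, "location")
reads_before = ro_store.reads
ro_group.attrs()
ro_group.attrs()
reads_after = ro_store.reads
```

In this toy version the cost is one extra store read per attribute access on writable stores, which is the trade-off being discussed.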
Another option would be ensuring metadata objects are always shared, but this could maybe get hairy with consistency...Although I suppose the other option could as well. |
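A rough sketch of the "always share metadata objects" option, again without zarr: every handle for the same store and path gets the same mutable attributes dict from a registry. `SharedMetadataRegistry`, `SharedGroup` and the `(store_id, path)` key are hypothetical, and the sketch deliberately ignores persistence and anything outside a single process, which is probably where the hairy consistency questions would start.

```python
class SharedMetadataRegistry:
    """Hypothetical registry: one mutable attrs dict per (store id, path)."""

    def __init__(self):
        self._entries = {}

    def attrs_for(self, store_id, path):
        # setdefault returns the existing dict if one is already registered.
        return self._entries.setdefault((store_id, path), {})


class SharedGroup:
    """Hypothetical group handle whose attrs come from the shared registry."""

    def __init__(self, registry, store_id, path):
        self.attrs = registry.attrs_for(store_id, path)


registry = SharedMetadataRegistry()
a = SharedGroup(registry, "foo.zarr", "location")
b = SharedGroup(registry, "foo.zarr", "location")
other = SharedGroup(registry, "foo.zarr", "elsewhere")

# An update through one handle is visible through the other immediately.
a.attrs.setdefault("foo", "bar")
```

Here `a.attrs` and `b.attrs` are the same object, so they cannot diverge, while `other` keeps its own dict.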
zarr-python 2.x had a correction: |
What was the default value for that though? I would think that we would want it to be |
for arrays the default values for both was Lines 125 to 126 in b1480d7
And for that reason this kind of thing was possible in v2 -- if There are situations where you can be sure that the attributes won't change, and in those cases caching saves IO, but there are many other situations, and for them we should definitely expose a way to not cache the attributes. |
Perhaps the difference is that this example is a group? I checked and this issue does not occur in v2. Maybe I am wrong though? |
I've tried my hand at this for a few hours and it is quite difficult because of how the |
thanks for working on this @ilan-gold, if I have some time soon I will try running your example and I will see if I can explain what's going on. |
Ok @d-v-b, I found out why it worked previously in v2. The first access to So here I can get it to fail:
import zarr
z = zarr.open("foo.zarr")
location_group = z.create_group("location")
same_location_group_but_different_object = z["location"]
same_location_group_but_different_object.attrs.asdict() # access first from the different location before the update
location_group.attrs.setdefault("foo", "bar")
assert location_group.attrs.asdict() == { "foo": "bar" }
assert same_location_group_but_different_object.attrs.asdict() == { "foo": "bar" } # fails in v3
assert same_location_group_but_different_object == location_group # fails, probably for the best |
Given this fact, I am tempted to close the issue as no-fix, pending (hopefully) a note added to the zarr documentation about this fact (all metadata is declaration-time-only valid). I think in general, it could be good to add something about the consistency guarantees of a zarr object. I wonder if other folks in this project are aware of other thorny sides around read-write consistency perhaps (maybe the icechunk folks?). |
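For anyone reading along, the "declaration-time-only valid" behaviour can be modelled without zarr in a few lines. This is only a toy: `ToyGroup` and its dict-backed store are hypothetical and are not how zarr implements groups. It just shows a handle that snapshots metadata once at creation and never looks at the store again.

```python
import json


class ToyGroup:
    """Hypothetical group handle that reads metadata once, at creation."""

    def __init__(self, store, path):
        self.store = store
        self.path = path
        # Snapshot taken here; later writes by other handles are not seen.
        self._attrs = json.loads(store.get(path, "{}"))

    def set_attr(self, key, value):
        self._attrs[key] = value
        self.store[self.path] = json.dumps(self._attrs)

    def attrs(self):
        return dict(self._attrs)


store = {}
a = ToyGroup(store, "location")
b = ToyGroup(store, "location")  # second handle, snapshot is still empty

a.set_attr("foo", "bar")

stale = b.attrs()                            # snapshot from before the write
fresh = ToyGroup(store, "location").attrs()  # a new handle re-reads the store
```

The second handle keeps returning its old snapshot, while a handle created after the write sees the new attributes.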
Zarr version
v3.0.0
Numcodecs version
0.14.1
Python Version
3.12
Operating System
Mac
Installation
uv
Description
Something very subtle appears to be going on here when doing something like my_group["/"]. Previously, in v2, this operation was idempotent in that my_group == my_group["/"]. I am not sure how this sort of thing is actually supposed to work, or whether the old behavior itself was buggy.
Steps to reproduce
v2/v3 code:
Additional output
No response