zarr.array from an existing zarr.Array #2622

Open
wants to merge 37 commits into
base: main

Conversation

brokkoli71
Member

@brokkoli71 brokkoli71 commented Jan 2, 2025

Added concurrent streaming of the source array into the new array.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@brokkoli71 brokkoli71 marked this pull request as draft January 2, 2025 16:54
@brokkoli71
Member Author

Do we also want concurrency for different chunk sizes?

@normanrz
Member

normanrz commented Jan 8, 2025

> Do we also want concurrency for different chunk sizes?

That would be nice, if the chunk sizes are somewhat compatible, i.e. one is a multiple of the other.
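As a rough illustration of the "one is a multiple of the other" condition, here is a minimal sketch. The function name `chunks_compatible` is hypothetical and not part of zarr-python, and checking the condition independently per axis is an assumption about what "compatible" means here.

```python
def chunks_compatible(src_chunks, dst_chunks):
    """Return True if, along every axis, one chunk length divides the other.

    Hypothetical helper: the per-axis interpretation is an assumption,
    not something defined by zarr-python.
    """
    if len(src_chunks) != len(dst_chunks):
        return False
    return all(
        s % d == 0 or d % s == 0
        for s, d in zip(src_chunks, dst_chunks)
    )
```

For example, source chunks of `(4, 4)` and destination chunks of `(2, 8)` would pass this check, while `(3, 4)` and `(2, 4)` would not.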

@d-v-b
Contributor

d-v-b commented Jan 8, 2025

  • (Is there some measure to prevent this that I am not aware of?)

if you are trying to write K input chunks into M output chunks, you can partition your K chunks into sets, where within each set elements can be written independently from all the other elements. then you write each set one after another. in the worst case scenario there will be 1 set per chunk, but you are guaranteed to avoid write collisions this way.
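The partitioning idea above can be sketched with a simple greedy pass. This is only one possible way to build the sets, not the approach taken in this PR; the function names are hypothetical, and regular chunk grids aligned to the array origin are assumed.

```python
from itertools import product


def touched_output_chunks(index, src_chunks, dst_chunks, shape):
    """Grid coordinates of the output chunks overlapped by one input chunk."""
    ranges = []
    for i, s, d, n in zip(index, src_chunks, dst_chunks, shape):
        start = i * s
        stop = min(start + s, n)  # the last chunk may be clipped by the array bounds
        ranges.append(range(start // d, (stop - 1) // d + 1))
    return set(product(*ranges))


def partition_into_independent_sets(shape, src_chunks, dst_chunks):
    """Greedily group input chunks so that no two chunks in a group
    write to the same output chunk."""
    grid = [range(-(-n // s)) for n, s in zip(shape, src_chunks)]  # ceil division
    groups = []  # each entry: (input chunk indices, output chunks already claimed)
    for index in product(*grid):
        touched = touched_output_chunks(index, src_chunks, dst_chunks, shape)
        for members, claimed in groups:
            if claimed.isdisjoint(touched):
                members.append(index)
                claimed |= touched
                break
        else:
            # No existing group is free of collisions, so start a new one.
            groups.append(([index], set(touched)))
    return [members for members, _ in groups]
```

The chunks within each returned group could then be written concurrently, with the groups processed one after another. When a single output chunk covers the whole array, every input chunk ends up in its own group, which is the worst case described above.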

@dstansby dstansby added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 9, 2025
@d-v-b
Contributor

d-v-b commented Jan 14, 2025

one question to answer here is what "auto" means for chunks if the user passes in a chunked array, but they want to use zarr-python's auto-chunking instead of the chunks that came with the array.

We might want to use a separate value that means "copy the chunks this object already has", which is distinct from "generate some chunks using the chunking heuristics". maybe something like ChunksLike: Literal['auto'] | Literal['keep'] | ShapeLike?
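A minimal sketch of how the proposed `'keep'` / `'auto'` distinction might be resolved. The `ShapeLike` alias below is a local stand-in rather than zarr-python's own type, and `resolve_chunks` with its `auto_chunks` argument (the precomputed result of whatever chunking heuristic is used) is hypothetical.

```python
from typing import Literal, Optional, Union

ShapeLike = tuple[int, ...]  # local stand-in, not zarr-python's ShapeLike
ChunksLike = Union[Literal["auto"], Literal["keep"], ShapeLike]


def resolve_chunks(
    chunks: ChunksLike,
    source_chunks: Optional[tuple[int, ...]],
    auto_chunks: tuple[int, ...],
) -> tuple[int, ...]:
    """Pick the chunk shape for the new array (hypothetical helper)."""
    if chunks == "keep":
        # Copy the chunks the source object already has.
        if source_chunks is None:
            raise ValueError("'keep' requires a source that is already chunked")
        return tuple(source_chunks)
    if chunks == "auto":
        # Use chunks generated by the chunking heuristics.
        return tuple(auto_chunks)
    # Otherwise the user supplied an explicit chunk shape.
    return tuple(chunks)
```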

@brokkoli71
Member Author

brokkoli71 commented Jan 15, 2025

> one question to answer here is what "auto" means for chunks if the user passes in a chunked array, but they want to use zarr-python's auto-chunking instead of the chunks that came with the array.

Good point! I like the idea of distinguishing between keep and auto.

@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jan 15, 2025

codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 96.82540% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.43%. Comparing base (eaf5d7a) to head (58f05fe).
Report is 431 commits behind head on main.

Files with missing lines    Patch %    Lines
src/zarr/core/array.py      96.42%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2622      +/-   ##
==========================================
- Coverage   99.98%   90.43%   -9.56%     
==========================================
  Files          38       59      +21     
  Lines       14718     6531    -8187     
==========================================
- Hits        14716     5906    -8810     
- Misses          2      625     +623     
Files with missing lines        Coverage Δ
src/zarr/__init__.py            100.00% <ø> (ø)
src/zarr/api/asynchronous.py    85.57% <100.00%> (ø)
src/zarr/api/synchronous.py     86.48% <100.00%> (ø)
src/zarr/core/array.py          93.82% <96.42%> (ø)

... and 93 files with indirect coverage changes

@brokkoli71 brokkoli71 marked this pull request as ready for review January 15, 2025 19:55
# Conflicts:
#	src/zarr/core/array.py
@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 30, 2025
@d-v-b
Contributor

d-v-b commented Jan 30, 2025

now that #2761 is in, could we use from_array inside create_array (after the data / dtype / shape validation)?

@brokkoli71
Member Author

brokkoli71 commented Jan 30, 2025

> now that #2761 is in, could we use from_array inside create_array (after the data / dtype / shape validation)?

@d-v-b Currently, in this PR from_array calls create_array. Is it redundant to have both from_array and create_array to create an array from another array? Or do you see a benefit in having both?

@d-v-b
Contributor

d-v-b commented Jan 30, 2025

> > now that #2761 is in, could we use from_array inside create_array (after the data / dtype / shape validation)?
>
> @d-v-b Currently, in this PR from_array calls create_array. Is it redundant to have both from_array and create_array to create an array from another array? Or do you see a benefit in having both?

create_array should call from_array if the user provided data; from_array should call the newly added init_array to persist the array metadata, and then store the array data.
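The layering described here can be sketched with toy stand-ins. These functions only share names with the ones discussed; their signatures, the dict-based store, and the 1-D data are all assumptions made for illustration and do not reflect zarr-python's actual API.

```python
def init_array(store, name, shape, chunks):
    """Toy stand-in: persist only the array metadata, no chunk data."""
    store[name] = {"shape": shape, "chunks": chunks, "data": None}
    return store[name]


def from_array(store, name, data, chunks):
    """Toy stand-in: write the metadata first, then store the data."""
    array = init_array(store, name, shape=(len(data),), chunks=chunks)
    array["data"] = list(data)
    return array


def create_array(store, name, shape=None, chunks=(1,), data=None):
    """Toy stand-in: validate, then delegate to from_array if data was given."""
    if data is not None:
        if shape is not None and shape != (len(data),):
            raise ValueError("shape does not match data")
        return from_array(store, name, data, chunks)
    if shape is None:
        raise ValueError("either shape or data is required")
    return init_array(store, name, shape, chunks)
```

With this arrangement the metadata is written in exactly one place (`init_array`), and `create_array` and `from_array` no longer depend on each other in a circle.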

Labels
needs release notes Automatically applied to PRs which haven't added release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v3] zarr.array from an existing zarr.Array
4 participants