CI Failing with Dask 2024.12.0 #345
Failure from commit dask/dask@511b8af
@ericpre do you have any idea if this loads things into memory? It seems like it doesn't, but the change in the array flags is slightly concerning.

```python
import numpy as np

# Create a memmap array backed by a file
filename = 'my_array.dat'
dtype = np.float32
shape = (100, 100)
# mode='r' requires an existing file, so create (zero-filled) it first
np.memmap(filename, dtype=dtype, mode='w+', shape=shape).flush()
# Create the memmap
mmap_array = np.memmap(filename, dtype=dtype, mode='r', shape=shape)
m2 = mmap_array.copy()
print(type(m2))  # still an np.memmap
print(mmap_array.flags)
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
# OWNDATA : False
# WRITEABLE : False
# ALIGNED : True
# WRITEBACKIFCOPY : False
print(m2.flags)
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
# OWNDATA : True
# WRITEABLE : True
# ALIGNED : True
# WRITEBACKIFCOPY : False
```

It doesn't seem to from some tests, although copying does appear to make the array writeable....
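A minimal, self-contained sketch of the flag change discussed above, assuming a throwaway file in a temporary directory (the path is illustrative only). It shows that `.copy()` produces an array that owns its data and is writeable, and that, if read-only semantics are still wanted after removing the check, the copy can be re-marked read-only via `flags.writeable`. This is one possible approach, not a description of the existing reader code.

```python
import os
import tempfile

import numpy as np

shape, dtype = (100, 100), np.float32
# Hypothetical scratch location; any writable path would do.
filename = os.path.join(tempfile.mkdtemp(), "my_array.dat")

# mode="r" needs an existing file, so create a zero-filled one first.
np.memmap(filename, dtype=dtype, mode="w+", shape=shape).flush()

mmap_array = np.memmap(filename, dtype=dtype, mode="r", shape=shape)
m2 = mmap_array.copy()

# The original is a read-only view of the file; the copy owns a new buffer.
print(mmap_array.flags.writeable, mmap_array.flags.owndata)  # False False
print(m2.flags.writeable, m2.flags.owndata)                  # True True

# If downstream code relies on read-only data, re-mark the copy explicitly.
m2.flags.writeable = False
print(m2.flags.writeable)  # False
```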
Sorry, no idea...
Okay, from my testing the behaviour is different, but it shouldn't be detrimental. You won't be able to write to the original array, but that is already the preferred behavior. There isn't a huge overhead for the copy, so we can probably just remove the check.
Sounds good. Is there a simple way to test the behaviour (that writing is not possible), e.g. by checking that it raises an error? I am also wondering whether it is still the case that the data can't be written. Could it be related to the fact that dask arrays used to be immutable, but this has changed since the blockfile code was written?
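One way to test the read-only behaviour is to assert that assignment into a `mode='r'` memmap raises `ValueError`, and that writes into a copy never reach the file. The sketch below is self-contained and uses a temporary file (an assumption for illustration); in a pytest suite the `try`/`except` would more naturally be written as `with pytest.raises(ValueError):`.

```python
import os
import tempfile

import numpy as np

shape, dtype = (100, 100), np.float32
filename = os.path.join(tempfile.mkdtemp(), "my_array.dat")
# Zero-filled backing file.
np.memmap(filename, dtype=dtype, mode="w+", shape=shape).flush()

ro = np.memmap(filename, dtype=dtype, mode="r", shape=shape)

# Assigning into a read-only memmap raises ValueError.
try:
    ro[0, 0] = 1.0
    write_blocked = False
except ValueError:
    write_blocked = True

# Assigning into a copy succeeds, but only the in-memory buffer changes.
cp = ro.copy()
cp[:] = 1.0

# Re-open the file to confirm nothing was written to disk.
reread = np.memmap(filename, dtype=dtype, mode="r", shape=shape)

print(write_blocked)              # True
print(bool(np.all(reread == 0)))  # True: file is still all zeros
```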
So if we take this little bit of code:

```python
import numpy as np

# Create a memmap array backed by a file
filename = 'my_array.dat'
dtype = np.float32
shape = (100, 100)
# Create the memmap
mmap_array = np.memmap(filename, dtype=dtype, mode='w+', shape=shape)
# Read the memmap
mmap_array2 = np.memmap(filename, dtype=dtype, mode='r', shape=shape)
# Copy the read-only memmap
m2 = mmap_array2.copy()
m2[:] = 1  # This is possible now (should it be?)
m2.flush()  # This does nothing, from what I can tell
print(np.all(mmap_array2 != m2))  # Arrays are not equal (True)
```

I'm not sure what the intended behavior should be here. In any case, this is something that should be decided in numpy. I've been meaning to open an issue, but I'm a little unsure of what the behavior should be, which is why I'm slightly hesitant.
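To check whether a copy is still file-backed, one can compare a slice (a view) with a copy. A sketch under the same assumptions (temporary file, zero-filled by `mode='w+'`): both results are still `np.memmap` instances, but only the view keeps a `filename` and shares memory with the mapping. The copy owning a separate buffer would be consistent with `m2.flush()` having no visible effect.

```python
import os
import tempfile

import numpy as np

shape, dtype = (100, 100), np.float32
filename = os.path.join(tempfile.mkdtemp(), "my_array.dat")
np.memmap(filename, dtype=dtype, mode="w+", shape=shape).flush()

mmap_array2 = np.memmap(filename, dtype=dtype, mode="r", shape=shape)
view = mmap_array2[:10]   # slicing keeps the file mapping
m2 = mmap_array2.copy()   # copying allocates a new in-memory buffer

for name, arr in [("view", view), ("copy", m2)]:
    print(
        name,
        type(arr).__name__,                  # both are still np.memmap
        arr.filename is not None,            # still associated with the file?
        np.shares_memory(arr, mmap_array2),  # same buffer as the mapping?
    )
# view memmap True True
# copy memmap False False
```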
`help(np.copy)` documents the behaviour; scrolling a bit, you will find this note:
This is perfectly logical behaviour with sensible defaults. Please note that
not only
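The distinction between the `ndarray.copy()` method and the `np.copy` function matters here: `np.copy` defaults to `subok=False` and returns a plain `np.ndarray`, while the method preserves the `np.memmap` subclass; in every case the result is writeable. A short sketch, again assuming a temporary file:

```python
import os
import tempfile

import numpy as np

shape, dtype = (100, 100), np.float32
filename = os.path.join(tempfile.mkdtemp(), "my_array.dat")
np.memmap(filename, dtype=dtype, mode="w+", shape=shape).flush()
ro = np.memmap(filename, dtype=dtype, mode="r", shape=shape)

a = ro.copy()                # method: keeps the np.memmap subclass
b = np.copy(ro)              # function, subok=False (default): plain ndarray
c = np.copy(ro, subok=True)  # function, subok=True: keeps np.memmap

for name, arr in [("ro.copy()", a), ("np.copy", b), ("np.copy subok", c)]:
    print(name, type(arr).__name__, arr.flags.writeable)
# ro.copy() memmap True
# np.copy ndarray True
# np.copy subok memmap True
```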
@sem-geologist I might not have done exactly the right test. It is still an np.memmap object, but (maybe?) it shouldn't be. For example:

```python
import numpy as np

# Create a memmap array backed by a file
filename = 'my_array.dat'
dtype = np.float32
shape = (100, 100)
# Create the memmap
mmap_array = np.memmap(filename, dtype=dtype, mode='w+', shape=shape)
# Read the memmap
mmap_array2 = np.memmap(filename, dtype=dtype, mode='r', shape=shape)
arrs = [mmap_array2.copy() for _ in range(1000000)]
```

This doesn't crash my computer, but it does report a memory usage of 47 GB (far more than the 16 GB it actually has), so something must be happening behind the scenes. What really matters here is whether this results in a large change in memory usage when pairing np.memmap with dask.
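A back-of-the-envelope check of the numbers above, using the shape and dtype from the snippet: each copy allocates its own 100 x 100 float32 buffer, so one million copies need roughly 40 GB, in the same range as the reported 47 GB. How the OS reports more than the physical 16 GB (swap, memory compression, etc.) is an assumption not verified here.

```python
import numpy as np

shape, dtype = (100, 100), np.float32
n_copies = 1_000_000

# Bytes allocated by one independent copy of the array.
per_copy = int(np.prod(shape)) * np.dtype(dtype).itemsize
total = per_copy * n_copies

print(per_copy)                  # 40000
print(total / 1e9)               # 40.0  (GB)
print(round(total / 2**30, 1))   # 37.3  (GiB)
```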
In the discussion on the dask PR, it is suggested that
Describe the bug
Test passes with dask 2024.11.2 and fails with dask 2024.12.0.
To Reproduce
Additional context
Will look upstream to determine what has changed.