Unexpected behavior for bn.move_std with float32 array #164
Hi there. Can you paste a small segment of the big array? Perhaps 200 elements containing a 121-element window that gives a std of zero. |
Sure, -2.5284750462 |
I'm trying to get a small example that I can easily play with. So if you have something along the lines of
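```python
# Hypothetical stand-in for the kind of snippet being requested; the value
# below is the one number visible in the reply above, not real data.
import numpy as np
import bottleneck as bn

a = np.array([-2.5284750462] * 200, dtype=np.float32)
bn.move_std(a, window=121)
```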
Then I can copy and paste and play. |
Oh, wait, so the array snippet you gave doesn't reproduce the problem? Yeah, there is a path dependence. So the starting point can matter. |
Unfortunately I haven’t been able to reproduce it on a small dataset |
I can reproduce the zero std with float32. First create an array; there are no problems with float64, but there are problems with float32, and the difference in std between the two shows it. All four steps are sketched below.
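A sketch of those steps with stand-in data (the reporter's array is not shown in the thread; spiky noise and the 121-element window from earlier are assumptions):

```python
import numpy as np
import bottleneck as bn

# First create an array: smooth noise with occasional large spikes
rng = np.random.RandomState(0)
a = rng.rand(1000) - 2.5
a[::97] += 1.0e4

# No problems with float64:
std64 = bn.move_std(a, window=121)

# But problems with float32 (spurious zeros may appear):
std32 = bn.move_std(a.astype(np.float32), window=121)
print((std32[120:] == 0).sum())

# Difference in std:
print(np.nanmax(np.abs(std64 - std32)))
```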
It might help if I could create a smaller example. But right now I've got a headache from caulking the bathroom sink. That stuff is vile. |
Ok, by line 3 it looks like it has something to do with the spikes in the data? Interesting. |
I suspect that all this is related to #99 and the corresponding PR. I found a smaller example:
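A guess at the shape of that example, based on the description later in this thread (a 3-element window over data containing seven repeated 100s; the surrounding values are stand-ins):

```python
import numpy as np
import bottleneck as bn

a = np.array([1.0, 2.0, 3.0] + [100.0] * 7)
print(bn.move_std(a, window=3))                     # float64: tiny residuals
print(bn.move_std(a.astype(np.float32), window=3))  # float32: exact zeros
```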
The zeros probably come from this line in the move_std C code:
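Paraphrased from memory rather than quoted, the guard looks something like this:

```c
/* keep round-off error from driving the running sum of squared
   deviations from the mean (and hence the variance) below zero */
if (assqdm < 0) {
    assqdm = 0;
}
```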
which I think is to prevent negative std due to round-off error. |
The change was made to handle this situation:
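A hypothetical illustration of the failure mode the clamp addresses (without it, round-off can leave a tiny negative variance, and the square root of that is NaN):

```python
import numpy as np
import bottleneck as bn

# Windows of identical values have zero variance; in floating point the
# rolling update can land slightly below zero instead of at exactly 0.0,
# and without the clamp sqrt() would turn that into NaN.
a = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0]) * 1e8
print(bn.move_std(a, window=3))  # all-equal windows should give 0.0, not NaN
```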
That used to not work before. |
This issue is possibly related with this one: pydata/xarray#1346 |
@kwgoodman I am not sure about the smaller example presented in your next to last post. If I understand the code correctly you run a 3 element window over the data which has 7 repeated 100. Thus there should be 5 cases where the window consists entirely of 100s and thus has a sample standard deviation of 0. In this case the float64 example seems to suffer from some residual value from the earlier computations following over into the later windows and thus does not give 0.0 exactly. I am by no means an expert on floating points but 5.50604123e-07**2 ≅ 3.0e-13 which I think is on the border of the precision of 64-bit floating points, thus probably as good as can be expected from the variance computation. But a fix for the real issue in the earlier example could be to do the calculations in a higher precision, i.e. using float64 for delta, amean and assqdm and just output as float32? This would probably lead to a slowdown for float32 computations. |
@adament Keeping the intermediate values in float64 sounds like a great idea. I'll keep it on my list of things to test. |
@adament In order to test the idea of using float64 intermediate values, I need a unit test that demonstrates the problem. Does anyone have one? It may take me a while to get around to testing this so anyone is welcome to give it a try. |
I think this is a small illustrative example (though not minimal):

```python
a = np.empty(10)
a[:5] = 1.0e9
a[5::2] = 1.0
a[6::2] = 0.0
bn.move_std(a, 3)
bn.move_std(a.astype(np.float32), 3)
```

In this case neither the 64-bit nor the 32-bit floating point computation is correct in the end, as there is too large a residual from the earlier large values. |
If it works for neither 32 nor 64 bit then I don't think I can use it for a unit test (?) |
@kwgoodman I am sorry for the late reply. You can fix the test case so that it works for the 64-bit case by making it a bit less extreme:

```python
a = np.empty(10)
a[:5] = 1.0e6
a[5::2] = 1.0
a[6::2] = 0.0
bn.move_std(a, 3)
bn.move_std(a.astype(np.float32), 3)
bn.move_std(a, 3) - np.asarray([np.nan, np.nan] + [a[i:i+3].std() for i in range(8)])
```

Then the error of the 64-bit computation compared to the non-rolling computation is on the order of 1e-5, which is about 4 digits of agreement. I suspect it will be hard to improve significantly on this while retaining a moving-window method of computation that accumulates precision error. |
Is the idea of "using float64 intermediate values" still on the TODO list? I have tested this idea and it actually improves the precision problem when using float32, and it requires only a few lines of code modification. For example: bottleneck/bottleneck/src/move_template.c, line 159 in b8f4c52.
Just change it to use float64 intermediates; a sketch of the idea follows.
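Since the template line itself is not quoted above, here is a standalone sketch of the idea rather than the actual diff: a Welford-style rolling std over float32 input in which the precision of the intermediates delta, amean, and assqdm is controlled by one typedef. All names and the update scheme are illustrative, not bottleneck's source:

```c
#include <math.h>
#include <stdio.h>

typedef double ACC_T; /* change to float to reproduce the precision loss */

/* rolling population std (ddof=0, NaN until the window fills),
 * mirroring bn.move_std's defaults */
static void moving_std(const float *a, int n, int w, float *out) {
    ACC_T amean = 0, assqdm = 0, delta;
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (count == w) { /* remove the element leaving the window */
            ACC_T aold = a[i - w];
            delta = aold - amean;
            amean -= delta / (count - 1);
            assqdm -= delta * (aold - amean);
            count--;
        }
        count++; /* add the incoming element */
        delta = a[i] - amean;
        amean += delta / count;
        assqdm += delta * (a[i] - amean);
        if (assqdm < 0)
            assqdm = 0; /* the round-off guard discussed above */
        out[i] = (count < w) ? NAN : (float)sqrt(assqdm / w);
    }
}

int main(void) {
    /* same shape as the 10-element example earlier in the thread */
    float a[10] = {1e6f, 1e6f, 1e6f, 1e6f, 1e6f, 1, 0, 1, 0, 1};
    float out[10];
    moving_std(a, 10, 3, out);
    for (int i = 0; i < 10; i++)
        printf("%g ", out[i]);
    printf("\n");
    return 0;
}
```

With ACC_T as double, the small-valued windows at the end should come out close to their true stds; with ACC_T as float, the residual from the 1e6 values should poison or zero them, which is the effect described in this thread.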
Hello @jkadbear, more information on the current state of the project can be found in the latest messages here: #388. TL;DR: a PR implementing this feature would be accepted :) |
PR is made. #407. |
Hello,
A colleague and I have noticed that using the move_std method on a large float32 array can return some strange results, mainly zeros when the std of the window is not 0. Moreover, if move_std is recalculated for just the window of interest, it returns the correct result. When the array is cast to dtype np.float64 it seems to work fine. Is float32 not supported? If so, the operation should probably raise an exception, or at least a warning.
I have attached a Jupyter notebook and the data demonstrating the issue. Basically:
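```python
# Hypothetical reconstruction of the notebook's check; `big_array` stands in
# for the attached data, and window=121 comes from earlier in the thread.
import numpy as np
import bottleneck as bn

std32 = bn.move_std(big_array.astype(np.float32), window=121)
assert not np.any(std32[120:] == 0)
```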
fails because zeros were found in the std array, but
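```python
# same check on the float64 cast of the (hypothetical) big_array
std64 = bn.move_std(big_array.astype(np.float64), window=121)
assert not np.any(std64[120:] == 0)
```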
does not fail as there are no zeros in the std array.
bad_move.zip