Unexpected behavior for bn.move_std with float32 array #164

Open
d-chambers opened this issue Feb 28, 2017 · 19 comments

@d-chambers

Hello,

A colleague and I have noticed that calling move_std on a large float32 array can return some strange results, mainly zeros where the std of the window is not 0. Moreover, if move_std is calculated again for just the window of interest, it returns the correct result. When the array is cast to np.float64 it seems to work fine. Is float32 not supported? If so, the operation should probably raise an exception, or at least a warning.

I have attached a Jupyter notebook and the data demonstrating the issue. Basically:

import bottleneck as bn
import numpy as np
import pandas as pd
# read data
input_arr = np.load('PXZ.EHZ.npy')
std = bn.move_std(input_arr, 121)
zeros = np.where(std == 0)
assert not len(zeros[0]) 

fails because zeros were found in the std array, but

input_arr = np.load('PXZ.EHZ.npy').astype(np.float64)
std = bn.move_std(input_arr, 121)
zeros = np.where(std == 0)
assert not len(zeros[0]) 

does not fail as there are no zeros in the std array.

bad_move.zip

@kwgoodman
Collaborator

Hi there.

Can you paste a small segment of the big array? Perhaps 200 elements that contain a 121-element window that gives a std of zero.

@d-chambers
Author

Sure.
What is interesting, though, is that, as you can see in the notebook, a smaller window calculates the result more or less correctly. So when I perform the calculation on just the window posted below, it doesn't return any zeros.

-2.5284750462
-6.2919397354
-1.3226305246
-4.7851758003
-3.4860665798
-10.1597480774
-0.3701832891
13.9742021561
6.5433959961
9.634223938
3.0750653744
-12.6853837967
-16.4847431183
-4.0147423744
7.9780387878
2.9790670872
6.7037324905
6.0012392998
-2.1859781742
-0.5457134247
-2.6305334568
-5.8820185661
-6.7868590355
-5.2417421341
7.3230443001
14.1062440872
13.3010320663
-7.6624073982
-19.7685203552
1.7452163696
0.0990015566
-2.9588396549
6.7365036011
4.5946087837
1.9612246752
-2.0452444553
-0.4105802476
0.5827489495
-5.1045341492
0.777156651
3.9382896423
-7.2001643181
-5.1068882942
6.7578649521
4.9806294441
3.4133441448
5.1996588707
-1.8428272009
-7.0068020821
-9.3145847321
-5.6427431107
3.9759426117
5.8565983772
7.0034828186
5.9576721191
-1.3350609541
-4.2324938774
-1.5859234333
-2.2288668156
-4.2464022636
-5.2180476189
-1.6682722569
5.5020709038
10.4394388199
7.052025795
-4.0992021561
-6.9343070984
-2.9708602428
0.5221287012
-0.689814508
-4.2820692062
0.5706662536
7.7491393089
0.2529479265
-1.6038390398
4.6071925163
0.3350660801
-4.9038901329
-2.1772441864
0.3203429878
-5.3912835121
0.9395498037
4.5799665451
2.1893494129
4.1454181671
-0.318448782
-2.8871643543
-4.2155075073
-0.9479162693
1.1749888659
2.3553519249
2.1778562069
-1.5604808331
1.935454011
-0.0063718222
-7.5537371635
-7.497610569
6.4680433273
14.7130641937
4.2097949982
-10.5383806229
-11.7676992416
-0.9247381091
5.4548268318
12.6308956146
5.8996295929
-11.7950487137
-11.125579834
1.5490256548
6.2535657883
-0.1072820574
0.792358458
7.2746787071
1.5382611752
-8.3699884415
-4.8096442223
0.2626971006
-1.2519851923
3.4611811638
5.8206295967
2.9740099907
0.1990164816
-6.9784965515
-7.3488740921
2.0996313095
4.6389508247
5.0002398491
-2.8655395508
-3.1748898029
3.862085104
-2.5001163483
-1.1817802191
6.3094935417
-0.5251377225
-10.1771764755
3.0875940323
3.9778904915
-2.3620967865
3.7947404385
2.8270330429
0.4795057774
-8.6607837677
-7.7135744095
6.3234648705
4.7404131889
1.6732549667
5.7153167725
-1.5910204649
-9.7333593369
-4.3945922852
4.1555552483
4.5729961395
2.9083378315
2.3797662258
-1.7646039724
-6.0924057961
-4.2171339989
1.7789765596
5.4465098381
0.6530832052
-3.2589774132
-0.5568723679
2.4662780762
8.0061149597
-0.5557681322
-15.4680337906
-3.8965210915
7.6350045204
4.0393557549
4.1909799576
3.2838480473
-1.5727992058
-4.0110783577
-3.5144693851
-8.7388019562
-3.2404100895
10.052942276
9.6923313141
5.9719820023
-4.6670451164
-7.6512498856
-1.4481172562
-3.516307354
-0.3272907436
2.1782839298
4.4382796288
4.0346212387
-5.1898503304
-9.1939210892
2.59324646
11.3465366364
4.2283205986
-3.2852146626
-7.3183469772
-1.5758877993
5.887509346
-0.4564013779
-6.5575261116
-4.3141489029
0.2136210501
3.84120965
5.5500483513
10.3696727753
4.7897620201
-9.656493187
-12.9593143463
-0.4358349442
2.6325674057
-8.4255027771
-0.7223656178
12.4292955399
16.2506198883
8.3799123764
-9.7949790955
-16.6930789948
-12.6975183487
-4.5630512238
3.2427294254
10.6418161392
14.281580925
10.4933614731
6.1766161919
-12.498128891
-26.3931293488
-5.6738448143
10.4533243179
2.6849167347
3.6418528557
13.0387125015
4.4636654854
-8.4340057373
-5.8683085442
-5.9987282753
-9.5630569458
5.0076861382
15.0639715195
6.5557694435
1.4748768806
-5.5807547569
-15.1983718872
-9.2029418945
8.4750299454
11.1006193161
2.8714213371
6.2090907097
1.6388434172
-12.5529975891
-8.2857303619
-7.8802065849
-3.5799911022
14.1643867493
14.4701957703
10.8143577576
1.8028203249
-12.4945497513
-18.0881004333
-11.1219739914
-2.3140103817
5.5928707123
20.4235877991
13.0771608353
-2.1964263916
-1.1900795698
-1.5465049744
-14.0739383698
-16.6487941742
0.1947135627
5.295147419
5.7168540955
12.7346935272
12.4139919281
-0.8161699772
-9.9928359985
-9.2182407379
-6.0655498505
-0.3915526271
2.9949274063
2.920861721
-0.1381985247
6.3467025757
4.5117349625
-2.6839256287
-2.6819007397
-5.9622364044
-4.4439377785
2.3083894253
4.4756808281
-0.7431480885
5.8435502052
4.8975563049
-4.7688260078
-1.6921800375
-5.5173192024
-6.3787221909
5.0028986931
1.7507245541
2.7408602238
8.6380777359
-0.7344329953
-10.08979702
-10.8039121628
7.0985417366
11.0808601379
-5.0774583817
1.0460007191
8.5950498581
-0.1733292639
-8.4080076218
-12.6891260147
-2.3999023438
7.407122612
5.8537073135
1.7318782806
5.0176405907
6.2909750938
-3.8044655323
-6.6019325256
-9.0147886276
-2.2527375221
1.9441357851
-2.5591933727
6.0285921097
11.2414560318
4.0100588799
-4.4588136673
-3.5896859169
-4.2076358795
-6.7087316513
-2.4861242771
6.6369481087
12.6313762665
3.2913250923
-9.1072063446
-11.2915477753
-4.3825678819
10.3941774368
10.9443969727
0.434533149
-3.4727339745
-12.7594032288
-16.9814491272
-3.2974574566
7.6645927429
4.2998857498
7.9698944092
17.8991546631
4.2738776207
2.6338267326
-0.0810393319

@kwgoodman
Collaborator

I'm trying to get a small example that I can easily play with. So if you have something along the lines of

In [1]: a = np.random.rand(20)
In [2]: bn.move_std(a, 5)
Out[2]: 
array([        nan,         nan,         nan,         nan,  0.29160087,
        0.22065454,  0.21211704,  0.22380035,  0.15067442,  0.15432015,
        0.16127856,  0.15335015,  0.23545563,  0.25747258,  0.24552778,
        0.24936578,  0.24384096,  0.19625498,  0.22205896,  0.25267315])
In [3]: a
Out[3]: 
array([ 0.08338058,  0.24385437,  0.57172368,  0.9237718 ,  0.56234984,
        0.45308886,  0.27521289,  0.73293583,  0.56480881,  0.60684832,
        0.39607517,  0.84789734,  0.13896913,  0.22843558,  0.46661061,
        0.53213284,  0.83079184,  0.41385475,  0.13899303,  0.17730305])

Then I can copy and paste and play.

@kwgoodman
Collaborator

Oh, wait, so the array snippet you gave doesn't reproduce the problem? Yeah, there is a path dependence. So the starting point can matter.

@d-chambers
Author

Unfortunately I haven't been able to reproduce it on a small dataset.

@kwgoodman
Collaborator

I can reproduce the zero std with float32. First create an array:

In [1]: rs = np.random.RandomState(10)
In [2]: a = rs.rand(100000000)
In [3]: a[a < 0.05] = 100

No problems with float64:

In [4]: np.where(bn.move_std(a, 121) == 0)
Out[4]: (array([], dtype=int64),)

But problems with float32:

In [5]: np.where(bn.move_std(a.astype(np.float32), 121) == 0)
Out[5]: 
(array([16085106, 16085107, 16085119, 17870732, 17870737, 17870739,
        18208491, 25424792, 25445994, 25445995, 25445997, 25445998,
        25548672, 25548674, 25548679, 25548680, 25636840, 25636869,
        25654926, 27905892, 27905893, 27905898, 27905899, 28106880,
        28106881, 28106882, 28106885, 28106889, 28106890, 28106891,
        28106895, 28106896, 28106897, 28106898, 28283068, 32105056,
        32105057, 32105087, 32448657, 32448665, 33170366, 33170367,
        33170370, 33170371, 33170373, 33170376, 33170377]),)

Difference in std:

In [6]: bn.move_std(a, 121)[16085106]
Out[6]: 0.24272340779335716
In [7]: bn.move_std(a.astype(np.float32), 121)[16085106]
Out[7]: 0.0

It might help if I could create a smaller example. But right now I've got a headache from caulking the bathroom sink. That stuff is vile.

@d-chambers
Author

OK, judging by In [3] it looks like it has something to do with the spikes in the data? Interesting.

@kwgoodman
Collaborator

I suspect that all this is related to #99 and the corresponding PR.

I found a smaller example:

In [4]: for i in range(1000000):
   ...:     rs = np.random.RandomState(i)
   ...:     a = rs.rand(10)
   ...:     a[a < rs.rand()] = 100
   ...:     w32 = np.where(bn.move_std(a.astype(np.float32), 3)==0)[0]
   ...:     w64 = np.where(bn.move_std(a, 3)==0)[0]
   ...:     if w32.size > w64.size:
   ...:         print(i)
   ...:         print(a)
   ...:         print(w32)
   ...:         print(w64)
   ...:         break
   ...:     
1
[ 100.            0.72032449  100.          100.          100.          100.
  100.          100.          100.            0.53881673]
[4 5 6 7 8]
[]
In [5]: a
Out[5]: 
array([ 100.        ,    0.72032449,  100.        ,  100.        ,
        100.        ,  100.        ,  100.        ,  100.        ,
        100.        ,    0.53881673])
In [6]: bn.move_std(a.astype(np.float64),window=3, min_count=1)
Out[6]: 
array([  0.00000000e+00,   4.96398378e+01,   4.68008879e+01,
         4.68008879e+01,   5.50604123e-07,   5.50604123e-07,
         5.50604123e-07,   5.50604123e-07,   5.50604123e-07,
         4.68864514e+01])
In [7]: bn.move_std(a.astype(np.float32),window=3, min_count=1)
Out[7]: 
array([  0.        ,  49.63983917,  46.80088806,  46.80088806,
         0.        ,   0.        ,   0.        ,   0.        ,
         0.        ,  46.88644791], dtype=float32)

The zeros probably come from this line in the move_std C code:

                if (assqdm < 0) {
                    assqdm = 0;
                }

which I think is to prevent negative std due to round-off error.
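
For illustration, here is a small self-contained sketch of that failure mode. It is a naive rolling std that keeps a running sum and sum of squares in Python, not bottleneck's actual C update, and naive_move_std is just an illustrative name: once a large spike has left the window, the float32 accumulators have already absorbed and lost the small values, the computed variance for later windows comes out negative, and a clamp like the one above turns those windows into exact zeros.

import numpy as np

def naive_move_std(a, window, dtype):
    """Rolling std via running sum and sum of squares, accumulated in dtype."""
    a = a.astype(dtype)
    s = dtype(0.0)   # running sum of the window
    ss = dtype(0.0)  # running sum of squares of the window
    out = np.full(a.shape, np.nan, dtype=dtype)
    for i, x in enumerate(a):
        s += x
        ss += x * x
        if i >= window:                  # drop the element leaving the window
            old = a[i - window]
            s -= old
            ss -= old * old
        if i >= window - 1:
            mean = s / dtype(window)
            var = ss / dtype(window) - mean * mean     # cancellation-prone form
            out[i] = np.sqrt(var) if var > 0 else 0.0  # clamp negatives to zero, like the C code above
    return out

a = np.array([1e5, 0.5, 1.0, 2.0, 3.0, 4.0])
print(naive_move_std(a, 3, np.float64))  # last three windows: ~0.62, ~0.82, ~0.82
print(naive_move_std(a, 3, np.float32))  # the same windows collapse to 0.0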

@kwgoodman
Collaborator

The change was made to handle this situation:

In [10]: a = np.array([1, 2, 3]) + 1e9
In [11]: bn.move_std(a, 2)
Out[11]: array([ nan,  0.5,  0.5])

That case did not work before the change.
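
For context, here is a rough illustration of why such mean-offset data is hard. It uses the textbook E[x^2] - E[x]^2 formula rather than bottleneck's actual rolling update: the subtraction cancels away all of the significant digits, and depending on the data and rounding the result can even dip slightly below zero, where an unguarded sqrt would return nan.

import numpy as np

w = np.array([2.0, 3.0]) + 1e9          # one window of the array above; its true std is 0.5
var = (w ** 2).mean() - w.mean() ** 2   # catastrophic cancellation
print(var)                              # prints 0.0 instead of the true 0.25
print(np.sqrt(max(var, 0.0)))           # clamping protects the sqrt from tiny negatives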

@matteodefelice

This issue is possibly related to this one: pydata/xarray#1346

@adament

adament commented Jul 5, 2017

@kwgoodman I am not sure about the smaller example presented in your next-to-last post. If I understand the code correctly, you run a 3-element window over data that contains 7 repeated 100s, so there should be 5 windows consisting entirely of 100s, each with a sample standard deviation of 0. In this case the float64 result seems to suffer from a residual value from the earlier computations carrying over into the later windows, and thus does not give exactly 0.0. I am by no means an expert on floating point, but 5.50604123e-07**2 ≅ 3.0e-13, which I think is at the edge of 64-bit floating-point precision, so this is probably as good as can be expected from the variance computation.

But a fix for the real issue in the earlier example could be to do the calculations in higher precision, i.e. use float64 for delta, amean and assqdm and only cast the output to float32. This would probably slow down float32 computations somewhat.
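
As a minimal Python-level sketch of that suggestion (the real fix would go into the C templates; move_std_f32_via_f64 is just an illustrative wrapper name), one can upcast, call move_std, and cast the result back to float32. On the 10-element example above it reproduces the float64 residuals instead of the clamped zeros, and on the original float32 data it would likewise avoid the spurious zeros, at the cost of an extra copy and float64-speed computation:

import numpy as np
import bottleneck as bn

def move_std_f32_via_f64(a32, window, min_count=None):
    # accumulate in float64 internally, return a float32 result
    out64 = bn.move_std(a32.astype(np.float64), window, min_count=min_count)
    return out64.astype(np.float32)

a = np.array([100.0, 0.72032449, 100.0, 100.0, 100.0,
              100.0, 100.0, 100.0, 100.0, 0.53881673], dtype=np.float32)
print(bn.move_std(a, window=3, min_count=1))    # zeros in the all-100 windows
print(move_std_f32_via_f64(a, 3, min_count=1))  # ~5.5e-07 there, matching float64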

@kwgoodman
Collaborator

@adament Keeping the intermediate values in float64 sounds like a great idea. I'll keep it on my list of things to test.

@kwgoodman
Collaborator

@adament In order to test the idea of using float64 intermediate values, I need a unit test that demonstrates the problem. Does anyone have one? It may take me a while to get around to testing this so anyone is welcome to give it a try.

@adament

adament commented Jul 5, 2017

I think this is a small illustrative example (though not minimal):

a = np.empty(10)
a[:5] = 1.0e9
a[5::2] = 1.0
a[6::2] = 0.0
bn.move_std(a, 3)
bn.move_std(a.astype(np.float32), 3)

In this case neither the 64-bit nor the 32-bit floating-point computation is correct in the end, as there is too large a residual from the earlier large values.

@kwgoodman
Collaborator

If it works for neither 32-bit nor 64-bit, then I don't think I can use it for a unit test (?)

@adament

adament commented Jul 17, 2017

@kwgoodman I am sorry for the late reply; you can fix the test case so that it works for the 64-bit case by making it a bit less extreme:

a = np.empty(10)
a[:5] = 1.0e6
a[5::2] = 1.0
a[6::2] = 0.0
bn.move_std(a, 3)
bn.move_std(a.astype(np.float32), 3)
bn.move_std(a, 3) - np.asarray([np.nan, np.nan] + [a[i:i+3].std() for i in range(8)])

Then the error of the 64-bit computation compared to the non-rolling computation is on the order of 1e-5, which is about 4 digits of agreement. I suspect it will be hard to improve significantly on this while retaining a moving-window method of computation, which accumulates precision error.

@jkadbear

@adament In order to test the idea of using float64 intermediate values, I need a unit test that demonstrates the problem. Does anyone have one? It may take me a while to get around to testing this so anyone is welcome to give it a try.

Is the idea of "using float64 intermediate values" still on the TODO list? I have tested it, and it does improve the precision problem when using float32. It only requires changing a few lines of code. For example:

npy_DTYPE0 asum, ai, aold, count_inv;

Just change it to npy_float64 asum, ai, aold, count_inv;.

@rdbisme
Collaborator

rdbisme commented Mar 19, 2022

Hello @jkadbear, more information on the current state of the project can be found in the latest messages here: #388. TL;DR: the state of the master branch was partially incomplete, so we reverted it to what it was at the latest release and applied some bugfixes, especially for the build, in order to make the package installable again with new versions of Python. The previous state is in the develop branch.

A PR for this feature would be accepted :)

@jkadbear

PR opened: #407.
