Looks like we're hitting numerical precision issues again, sometimes, in the source separation tests on OSX. See https://github.com/craffel/mir_eval/actions/runs/8426164125/job/23106646394?pr=374
The tests passed previously (e.g. in #370), and since we're now seeding the RNG before every test function execution, the deviation must be coming from the underlying BLAS implementation and/or some interaction with the hardware it's running on.
I've been seeing similar weirdness in other packages (pescador, librosa) lately, and there are some strange things happening with openblas and xsimd that expose the non-associativity of floating point arithmetic in cases like this.
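For anyone unfamiliar with the non-associativity point, here is a minimal illustration in plain Python. It is not taken from the failing tests; it only shows that regrouping a floating-point sum changes the result in the last bits, which is the kind of reordering a different BLAS or SIMD code path can introduce.

```python
# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)       # False
print(abs(left - right))   # tiny, but nonzero
```

Differences this small at each step can accumulate through long reductions, so results may drift between platforms even with identical inputs and seeds.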
I propose "fixing" this by detecting the execution platform and raising the `atol` parameter on OSX deployments in `test_separation.py`. Since the separation metrics are in decibels, I don't think we should be too concerned about raising the tolerance from 0.01dB to 0.05dB, and keeping the stricter tolerance on better-behaved platforms should keep us safe.
Meanwhile, I don't think separation test failures on OSX should be a blocker to merging unrelated PRs (e.g. #374).
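A rough sketch of what the platform-dependent tolerance could look like. The names `STRICT_ATOL`, `RELAXED_ATOL`, `separation_atol`, and `scores_match` are hypothetical and not taken from `test_separation.py`; the comparison uses `math.isclose` only to keep the example dependency-free, and the real tests may compare arrays differently.

```python
import math
import platform

# Hypothetical constants: tolerances in dB, from the proposal above.
STRICT_ATOL = 0.01
RELAXED_ATOL = 0.05


def separation_atol(system=None):
    """Return the absolute tolerance (dB) for the given or current platform."""
    # platform.system() reports "Darwin" on macOS / OSX.
    system = platform.system() if system is None else system
    return RELAXED_ATOL if system == "Darwin" else STRICT_ATOL


def scores_match(expected, actual, system=None):
    """Compare two sequences of scores using a purely absolute tolerance."""
    atol = separation_atol(system)
    return len(expected) == len(actual) and all(
        math.isclose(a, e, rel_tol=0.0, abs_tol=atol)
        for e, a in zip(expected, actual)
    )
```

With this, a 0.03dB deviation would pass on OSX (`scores_match([10.0], [10.03], system="Darwin")`) but still fail elsewhere, so the stricter check keeps guarding the other platforms.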