Use hypothesis testing to check the implementation #66
Comments
Well, once you try to do exactly what you suggest here, you will find out the following truths:
Of course, feel free to open PRs that do "proper hypothesis testing" in the test suite if you'd like. But the process is not as trivial as you would imagine, and for me this is not at all a priority.
TimeseriesSurrogates is about making surrogate time series. What should be tested is whether the surrogates satisfy their defining properties, not what you should do with them after you have them. For example, the Fourier surrogates should retain the spectrum, etc. To give you an example: the Distributions.jl package tests whether the distributions satisfy the defining properties, not whether the e.g.
The issue of subjectivity
Yes, the problem of choosing a suitable discriminatory statistic is not trivial and will involve subjectivity to some extent. The same applies to deciding on a threshold for rejection of the null hypothesis. These choices will be context-dependent (are the data direct measurements or proxy measurements, what is the noise/signal ratio) and system-dependent (systems are sensitive to various discriminatory statistics to varying degrees).
The latter point is why I haven't written anything but the most basic tests so far. One obvious problem is that some of the methods require you to also choose values for method parameters. For example:
These parameters must be tuned to the particular time series you're working with, and it is not at all obvious to me how to tune them for a given time series so that the surrogates behave the way I want them to. For the random shuffle and Fourier-based methods, that is not so much of a problem, because they are parameter-free. In fact, we already test for the basic assumptions of these surrogates.
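A defining-property check for the two parameter-free cases mentioned above could look roughly like the following. This is a minimal sketch in plain Python, not the package's Julia implementation: the helper names (`dft`, `idft`, `random_shuffle_surrogate`, `random_phase_surrogate`), the AR(1) test signal, and the tolerances are all assumptions made for illustration.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short test series)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def random_shuffle_surrogate(x, rng):
    """Return a random permutation of x (hypothetical helper)."""
    s = list(x)
    rng.shuffle(s)
    return s


def random_phase_surrogate(x, rng):
    """Randomize Fourier phases while keeping amplitudes (hypothetical helper)."""
    n = len(x)
    X = dft(x)
    Y = list(X)  # the k = 0 term (and the Nyquist term for even n) stays untouched
    for k in range(1, (n + 1) // 2):
        Y[k] = abs(X[k]) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
        Y[n - k] = Y[k].conjugate()  # Hermitian symmetry keeps the output real
    return [v.real for v in idft(Y)]


# Assumed test signal: a short AR(1) series.
rng = random.Random(0)
x = [0.0]
for _ in range(63):
    x.append(0.8 * x[-1] + rng.gauss(0, 1))

shuffled = random_shuffle_surrogate(x, rng)
fourier = random_phase_surrogate(x, rng)

# Defining property of the shuffle: same values, possibly in a different order.
assert sorted(shuffled) == sorted(x)

# Defining property of the Fourier surrogate: same amplitude spectrum.
amp_x = [abs(c) for c in dft(x)]
amp_s = [abs(c) for c in dft(fourier)]
assert all(abs(a - b) < 1e-6 for a, b in zip(amp_x, amp_s))
```

Both assertions hold up to floating-point error whatever the random seed, so checks of this kind involve no subjective threshold.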
Making more complicated tests
If there is a particular example that can verify that the methods work the way they are supposed to - as shown by the authors - then it would be nice to add that to the test suites. However, this is not unit testing per se, because what we're then doing is replicating the papers, given the subjective choices of the authors, not verifying that the implementations here actually do what they are supposed to (which is covered by the permutation tests already included where relevant). That would almost be to hypothesis test the hypothesis tests of the original authors 😁
What you propose, @felixcremer, is absolutely possible for the Fourier methods. To test that the linear properties of the original signal are produced for the
If you have good examples, where we have well-reasoned choices for the discriminatory statistic threshold (for Fourier methods: difference in autocorrelation at a particular lag), then feel free to add PRs. However, the tests should not be so restrictive that the package fails CI because we made too strict subjective choices for our replicate-original-paper tests.
A funny note: the statement "The ACF of the original time series coincides with the ACF of the iAAFT surrogate and the one of the TS" is used in the original twin surrogate paper to verify that the twin surrogates (TS) also preserve linear properties like AAFT. In other words, the original authors of that paper judge this visually from a plot, without any hypothesis test. If it is good enough for publication, it is good enough for our package, I guess 🤖
In summary, what we want to test is that the implementations work as described in the original papers, not that the methods themselves are valid approaches to solving a set of problems, by applying our own hypothesis tests.
Like @Datseris, I've been having trouble replicating some of the original papers. That may be because I'm failing to interpret steps in their algorithms correctly, or because the surrogate methods themselves are flawed. Until I can do a relatively systematic study on each of them, I will not include new methods. The "passing test" would be to replicate the original papers. For the Fourier-based methods, visually checking that the autocorrelation functions align between the original time series and the surrogates satisfies this criterion for me. That is, again, a subjective choice. If that can be partly remedied by some good objective-ish numerical procedure, I won't object 😄
To continue the testing discussion from #12.
To provide better tests, the test cases could use the hypothesis testing that the surrogates are designed for.
The idea would be to have, for every surrogate method, one time series that adheres to the null hypothesis and another that rejects it. This way, we would have tested that the surrogates are doing what they should.
To test the null hypothesis that a surrogate method supports, we could compare the autocorrelation of the original data with that of an ensemble of surrogates, at least for some of the surrogate methods.
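As a rough illustration of this proposal, the sketch below runs a rank-based surrogate test for the random shuffle method, whose null hypothesis is that the data have no temporal structure. It is written in plain Python rather than against the package's API, and every choice in it is an assumption made for illustration: the lag-1 autocorrelation as discriminating statistic, 99 surrogates, the two test signals (white noise and an AR(1) process with coefficient 0.9), and the helper names.

```python
import random


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    return cov / var


def shuffle_test_pvalue(x, n_surr, rng):
    """Two-sided rank-based p-value against random shuffle surrogates."""
    observed = abs(lag1_autocorr(x))
    exceed = 0
    for _ in range(n_surr):
        s = list(x)
        rng.shuffle(s)
        if abs(lag1_autocorr(s)) >= observed:
            exceed += 1
    # Count the original series as one member of the ensemble.
    return (1 + exceed) / (1 + n_surr)


rng = random.Random(1234)
n = 200

# Series that adheres to the null hypothesis: independent Gaussian noise.
white_noise = [rng.gauss(0, 1) for _ in range(n)]

# Series that should reject the null hypothesis: strongly correlated AR(1).
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0, 1))

p_noise = shuffle_test_pvalue(white_noise, 99, rng)
p_ar1 = shuffle_test_pvalue(ar1, 99, rng)

print("white noise p-value:", p_noise)
print("AR(1) p-value:", p_ar1)
```

The AR(1) series should be rejected at essentially any reasonable significance level. The white-noise series, by construction of the test, will be falsely rejected at level alpha for roughly a fraction alpha of random seeds, so a test asserting non-rejection depends on the seed and threshold chosen.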
What do you see as subjective with the hypothesis testing?