-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wip: add statistical tests between histograms #503
base: main
Are you sure you want to change the base?
Conversation
I think so far all 4 implemented tests have a decent core functionality. I'm saying this by running toys and looking at the pvalue distributions because they should be uniform(0, 1) under the null hypothesis. |
@nsmith- I don't think having an option to consider flow bins in ks_1samp tests makes sense right? |
I'm thinking having |
Hmmm, for chi2 tests it does. You are including the observed number of counts in the flow bins in your tests statistic and comparing vs the expected number of counts (in one sample tests) or with the observed number of counts of another histogram (in two sample tests). |
By the way, pre-commit is complaining for "Too many public methods (23/20)" in basehist. |
@fabriceMUKARAGE Why did you change it from draft? |
@iasonkrom. oooh, that was a mistake I made when checking the unsuccessful checks. My bad, can we bring it back to the draft? |
Yup it's back to draft. I was just scared about an accidental merge. Don't worry about failing pytests though. I'm changing things in the statistics and therefore the tests fail if I don't update the values there. But the "real" tests are ran offline where I run toy models to see if my statistics are correct. Proper pytests will be written later since I'm tired of chaging them all the time to pass |
I see it's back to draft mode now. That's true, the proper pytests can be done later, thanks for the insights. |
Added pytest.importorskip("mplhep")
Added pytest.importorskip("mplhep")
) | ||
|
||
if mode == "exact": | ||
success, d, pvalue = _attempt_exact_2kssamp(n1, n2, g, d, alternative) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this really worries me. Using a hidden object is a really bad idea - see copier-org/copier#1225 for the most recent time I've seen someone broken by this! Is it possible to avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean the function written by scipy, I was trying to avoid writing too many lines since after defining the test statistic from the histograms, performing the test is the same. I could write a similar function and have it here in hist but it would be mainly done by copying code from scipy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just added the code in the stats.py
file instead of importing from scipy
. Do you think this is resolved?
I'm starting an early PR to add statistical tests to histograms after this feature was requested in #500 and I actually believe it is a good and useful feature.
This is an early WIP PR but I'm starting it now in order to make it publicly visible and ask for feedback on the statistics.