Change std outlier condition to be multiple of median std #960
base: ctapipe_0.17
The previous condition of a three sigma interval over the pixels was very restrictive and removed too many valid pixels. Instead of the interval [median_of_std - 3 * std_of_std, median_of_std + 3 * std_of_std] we now use the interval [1/3 * median_of_std, 3 * median_of_std] by default.
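The following is a minimal NumPy sketch that contrasts the two conditions. It assumes that one camera's per-pixel charge standard deviations are available as a 1-D array. The helper names, the toy Normal(5, 0.5) pixel population and the 200 simulated cameras are hypothetical illustrations, not lstchain code:

```python
import numpy as np

N_PIXELS = 1855  # pixel count used in the estimates in this thread

def outliers_sigma_interval(pixel_std, n_sigma=3.0):
    """Previous condition (hypothetical helper): flag pixels outside
    [median_of_std - n_sigma * std_of_std, median_of_std + n_sigma * std_of_std]."""
    median_of_std = np.median(pixel_std)
    std_of_std = np.std(pixel_std)
    return np.abs(pixel_std - median_of_std) > n_sigma * std_of_std

def outliers_ratio_interval(pixel_std, factor=3.0):
    """New default (hypothetical helper): flag pixels outside
    [median_of_std / factor, factor * median_of_std]."""
    median_of_std = np.median(pixel_std)
    return (pixel_std < median_of_std / factor) | (pixel_std > factor * median_of_std)

# Toy cameras in which every pixel is healthy (assumed std ~ Normal(5, 0.5))
rng = np.random.default_rng(42)
cameras = rng.normal(5.0, 0.5, size=(200, N_PIXELS))

# Average number of healthy pixels flagged per camera by each condition
mean_old = np.mean([outliers_sigma_interval(c).sum() for c in cameras])
mean_new = np.mean([outliers_ratio_interval(c).sum() for c in cameras])
print(f"healthy pixels flagged per camera: old {mean_old:.2f}, new {mean_new:.2f}")
```

On these healthy toy cameras the sigma interval flags about five pixels per camera, while the ratio interval flags none. With the ratio interval, a pixel is only flagged if its std is below a third of the median or above three times the median.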
Codecov Report
Patch coverage:
@@ Coverage Diff @@
## master #960 +/- ##
===========================================
+ Coverage 74.49% 85.45% +10.95%
===========================================
Files 124 78 -46
Lines 12211 6457 -5754
===========================================
- Hits 9097 5518 -3579
+ Misses 3114 939 -2175
@@ -34,13 +34,13 @@ class PedestalIntegrator(PedestalCalculator):
     """
     charge_median_cut_outliers = List(
-        [-3, 3],
+        [-4, 4],
Why is this changed from 3 to 4?
Even if we assume that all pixels follow a common distribution, a 3 sigma interval is expected to reject ~5 of the 1855 pixels erroneously. That's way too harsh.
With 4 sigma, the expected number of erroneously rejected pixels is about 0.1:
from scipy.stats import norm

def expected_rejected_pixels(a, b):
    # fraction of a standard normal outside [a, b], times the 1855 camera pixels
    return (1 - (norm.cdf(b) - norm.cdf(a))) * 1855

expected_rejected_pixels(-3, 3)  # Out: 5.008121697347684
expected_rejected_pixels(-4, 4)  # Out: 0.11750030720087512
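The same estimate can be inverted to ask which symmetric cut produces a chosen expected number of false rejections. This is a small sketch that keeps the assumptions above (a standard normal and 1855 pixels). The helper name is hypothetical:

```python
from scipy.stats import norm

N_PIXELS = 1855  # same pixel count as in expected_rejected_pixels above

def sigma_for_expected_rejections(target, n_pixels=N_PIXELS):
    """Return n such that a [-n, n] cut on a standard normal is expected to
    reject `target` of `n_pixels` good pixels, i.e. invert
    (1 - (norm.cdf(n) - norm.cdf(-n))) * n_pixels = target."""
    # The two symmetric tails share the rejected fraction, so each tail gets
    # target / (2 * n_pixels); norm.isf is the inverse of the survival function 1 - cdf.
    return norm.isf(target / (2 * n_pixels))

for target in (5.0, 1.0, 0.1):
    print(f"{target:>4} expected false rejections -> {sigma_for_expected_rejections(target):.2f} sigma")
```

Feeding back 5.008 or 0.1175 expected rejections recovers the 3 and 4 sigma cuts from the session above. Asking for a single expected false rejection gives a cut between those two.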
I have now added some checks on how many values are ignored in the sigma clipping. Output for run 8835 from 2022-06-21, which has known issues (as identified by @FrancaCassol):
@FrancaCassol manually identified at least 5 bad events, which matches the 5 cases where both high gain and low gain have more than 50 outlier pixels in the same event. This is the current output for a "good" run, 8055 from 2022-05-01:
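The per-event check described above (both gains with more than 50 outlier pixels in the same event) can be sketched as follows. This is not the code added in this PR. It assumes that the outlier masks are stored as a boolean array of shape (events, gains, pixels), with gain index 0 for high gain and 1 for low gain, and the function name is hypothetical:

```python
import numpy as np

OUTLIER_THRESHOLD = 50  # "more than 50 outlier pixels", from the discussion above

def flag_bad_events(outlier_mask, threshold=OUTLIER_THRESHOLD):
    """outlier_mask: bool array of shape (n_events, 2, n_pixels), axis 1 assumed to be
    (high gain, low gain). Returns a bool array of shape (n_events,) that is True
    where BOTH gains have strictly more than `threshold` outlier pixels."""
    n_outliers = outlier_mask.sum(axis=2)          # outlier count per event and gain
    return np.all(n_outliers > threshold, axis=1)  # require both gains above threshold

# Toy example: 4 events, 2 gains, 1855 pixels
mask = np.zeros((4, 2, 1855), dtype=bool)
mask[1, :, :60] = True  # event 1: 60 outliers in both gains -> flagged
mask[2, 0, :60] = True  # event 2: only high gain above threshold -> not flagged
mask[3, :, :50] = True  # event 3: exactly 50 in both gains -> not flagged
print(np.flatnonzero(flag_bad_events(mask)))  # -> [1]
```

Requiring both gains to exceed the threshold makes the check insensitive to an excess of outliers in a single gain channel.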
@FrancaCassol, could you check whether you agree with the modification of the limits so that we can get this merged? Or is it no longer needed after all the modifications in the updated branch (https://github.com/cta-observatory/cta-lstchain/tree/ctapipe_0.17)?
Mmh, I agree that a 3 sigma cut is too strong (in our config this cut is indeed practically disabled, since it is set to 10 sigma), but I would change the default number of sigmas (e.g. to 4 or even 5), not the cut criterion.
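Here is a quick check of the candidate defaults under the same assumptions as the estimate earlier in the thread (a standard normal and 1855 pixels). The function name is hypothetical. It uses `norm.sf` instead of `1 - cdf` because at 10 sigma the original expression loses all precision:

```python
from scipy.stats import norm

N_PIXELS = 1855

def expected_false_rejections(n_sigma, n_pixels=N_PIXELS):
    """Expected number of good pixels outside [-n_sigma, n_sigma] for a standard normal.
    norm.sf computes 1 - cdf without cancellation; the expression
    1 - (norm.cdf(10) - norm.cdf(-10)) evaluates to 0.0 in double precision."""
    return 2 * norm.sf(n_sigma) * n_pixels

for n_sigma in (3, 4, 5, 10):
    print(f"{n_sigma:>2} sigma: {expected_false_rejections(n_sigma):.3g} expected false rejections")
```

At 10 sigma the expected number of falsely rejected pixels is of order 1e-20. This supports the point that such a cut only removes grossly abnormal pixels.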