Change std outlier condition to be multiple of median std #960

maxnoe · 2022-04-04T14:12:24Z

The previous condition of a three sigma interval over the pixels
was very restrictive and removed too many valid pixels.

Instead of the interval
[median_of_std - 3 * std_of_std, median_of_std + 3 * std_of_std]
we now use the interval
[1/3 * median_of_std, 3 * median_of_std] by default.

The previous condition of a three sigma interval over the pixels was very restrictive and removed too many valid pixels. Instead of the interval [median_of_std - 3 * std_of_std, median_of_std + 3 * std_of_std] we now use the interval [1/3 * median_of_std, 3 * median_of_std] by default.

codecov · 2022-04-04T14:35:53Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +10.95 🎉

Comparison is base (ced74ac) 74.49% compared to head (c2e2cce) 85.45%.

❗ Current head c2e2cce differs from pull request most recent head d04522b. Consider uploading reports for the commit d04522b to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #960       +/-   ##
===========================================
+ Coverage   74.49%   85.45%   +10.95%     
===========================================
  Files         124       78       -46     
  Lines       12211     6457     -5754     
===========================================
- Hits         9097     5518     -3579     
+ Misses       3114      939     -2175

Impacted Files	Coverage Δ
lstchain/calib/camera/flatfield.py	`96.22% <100.00%> (-0.39%)`	⬇️
lstchain/calib/camera/pedestals.py	`95.60% <100.00%> (-0.52%)`	⬇️

... and 91 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

rlopezcoto · 2022-04-05T07:39:07Z

lstchain/calib/camera/pedestals.py

@@ -34,13 +34,13 @@ class PedestalIntegrator(PedestalCalculator):

     """
    charge_median_cut_outliers = List(
-        [-3, 3],
+        [-4, 4],


why is this changed 3->4?

Even if we assume that all pixels follow a common distribution, cutting an interval of 3 sigma will always remove ~5 pixels erroneously. That's way to harsh.

With 4 sigma, the expected number of erroneously rejected pixels is 0.1:

In [10]: def expected_rejected_pixels(a, b): ...: return (1 - (norm.cdf(b) - norm.cdf(a))) * 1855 ...: In [11]: expected_rejected_pixels(-3, 3) Out[11]: 5.008121697347684 In [12]: expected_rejected_pixels(-4, 4) Out[12]: 0.11750030720087512

maxnoe · 2022-07-06T11:20:05Z

I now added some checks on how many values are ignored in the sigma clipping. Output for a run with issues 8835 from 2022-06-21 (as identified by @FrancaCassol):

Found large number of outliers in pedestal sample:
Outliers: 17 events with more than 10 pixels, 14454 total outliers
Bad Index: [ 504 2283 2965 3254 3558 4532 4977 5108 6422 6429 6879 6941 6942 7738
 8169 8345 9143]
Bad gain: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
Bad Pixel Counts: [ 21 219 282  23  13  14 217  74  29  11 840  61 323  76  11 110 686]
Found large number of outliers in pedestal sample:
Outliers: 10 events with more than 50 pixels, 14454 total outliers
Bad Index: [2283 2965 4977 5108 6879 6941 6942 7738 8345 9143]
Bad gain: [1 1 1 1 1 1 1 1 1 1]
Bad Pixel Counts: [219 282 217  74 840  61 323  76 110 686]
Found large number of outliers in flatfield sample:
Outliers: 16 events with more than 10 pixels, 16262 total outliers
Bad Index: [1416 1416 3275 3645 3645 3756 3756 3798 3798 6326 6406 8917 8917 9381
 9381 9839]
Bad gain: [0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0]
Bad Pixel Counts: [ 575  182  137  456  186 1470 1133 1855 1855   11   16 1855 1855   95
   24   15]
Found large number of outliers in flatfield sample:
Outliers: 12 events with more than 50 pixels, 16262 total outliers
Bad Index: [1416 1416 3275 3645 3645 3756 3756 3798 3798 8917 8917 9381]
Bad gain: [0 1 1 0 1 0 1 0 1 0 1 0]
Bad Pixel Counts: [ 575  182  137  456  186 1470 1133 1855 1855 1855 1855   95]

@FrancaCassol manually identified at least 5 bad events, which matches the 5 combinations where both high gain and low gain have more then 50 outlier pixels in the same event.

This is currently the output for a "good" run, 8055 from 2022-05-01:

Found large number of outliers in pedestal sample:
Outliers: 7 events with more than 10 pixels, 12697 total outliers
Bad Index: [ 194  200 1502 3202 3885 6889 8562]
Bad gain: [1 1 1 1 1 1 1]
Bad Pixel Counts: [ 14  24  35  83 226  13  83]
Found large number of outliers in pedestal sample:
Outliers: 3 events with more than 50 pixels, 12697 total outliers
Bad Index: [3202 3885 8562]
Bad gain: [1 1 1]

rlopezcoto · 2023-05-24T06:48:36Z

@FrancaCassol could you check if you agree with the modification of the limits so we can get this merged, or is this not needed anymore after all the modifications in the (updated branch)[https://github.com/cta-observatory/cta-lstchain/tree/ctapipe_0.17]?
Maybe we can change the target branch as well instead of the master one

FrancaCassol · 2023-05-24T07:06:53Z

mmh, I agree that a 3 sigma cut is too strong (this cut is indeed practically disabled in our config where it is put to 10 sigma), but I would change the number of default sigmas (e.g to 4 or even to 5 sigma), not the cut criteria

maxnoe requested a review from FrancaCassol April 4, 2022 14:14

rlopezcoto reviewed Apr 5, 2022

View reviewed changes

Merge branch 'master' into outlier_conditions

d04522b

rlopezcoto changed the base branch from master to ctapipe_0.17 May 24, 2023 07:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change std outlier condition to be multiple of median std #960

Change std outlier condition to be multiple of median std #960

maxnoe commented Apr 4, 2022

codecov bot commented Apr 4, 2022 •

edited

Loading

rlopezcoto Apr 5, 2022

maxnoe Apr 5, 2022

maxnoe commented Jul 6, 2022

rlopezcoto commented May 24, 2023

FrancaCassol commented May 24, 2023

Change std outlier condition to be multiple of median std #960

Are you sure you want to change the base?

Change std outlier condition to be multiple of median std #960

Conversation

maxnoe commented Apr 4, 2022

codecov bot commented Apr 4, 2022 • edited Loading

Codecov Report

rlopezcoto Apr 5, 2022

Choose a reason for hiding this comment

maxnoe Apr 5, 2022

Choose a reason for hiding this comment

maxnoe commented Jul 6, 2022

rlopezcoto commented May 24, 2023

FrancaCassol commented May 24, 2023

codecov bot commented Apr 4, 2022 •

edited

Loading