Data Processing
This page gives instructions for running the analysis chain, along with information on customizing the chain for a specific detector setup.
The raw data is unpacked with the unpackRaw.py script. Run ./unpackRaw.py --help to see a list of available options.
The unpacked data is converted into hits using the hitReco.py script. Run ./hitReco.py --help to see a list of available options.
The default hit selection criteria are extremely basic. More selective criteria should be implemented for your specific setup.
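As a possible starting point for a stricter selection, the sketch below applies an amplitude threshold to low gain waveforms. It does not reproduce the selection in hitReco.py: the function name select_hits, its parameters, and the threshold value are hypothetical, and the baseline estimate assumes that the last few time samples contain no signal.

```python
import numpy as np

def select_hits(adc_lg, n_baseline=3, threshold=20.0):
    """Hypothetical hit selection: keep cells whose pulse rises above threshold.

    adc_lg: 2D array with one row per cell and one column per time sample.
    The baseline is the median of the last n_baseline samples, which assumes
    those samples contain no signal (an assumption, adapt to your readout).
    """
    adc_lg = np.asarray(adc_lg, dtype=float)
    baseline = np.median(adc_lg[:, -n_baseline:], axis=1)  # one baseline per cell
    amplitude = adc_lg.max(axis=1) - baseline              # baseline-subtracted peak
    return amplitude > threshold                           # boolean mask, one entry per cell

# Three toy cells: noise only, an early pulse, noise only
waveforms = np.array([
    [101., 100.,  99., 100., 100., 101.],
    [105., 140., 125., 101., 100., 100.],
    [ 98., 101., 100.,  99., 100.,  99.],
])
print(select_hits(waveforms).tolist())  # [False, True, False]
```

The mask can then be used to index every per-cell array at once, so all hit properties stay aligned.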
The reconstructed data can then be used for data analysis.
The event data is stored in the event_data array inside the reco file:
import numpy as np
recoFile = np.load(recoFileName, allow_pickle=True)
eventArray = recoFile['event_data']
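To try the code on this page without a real run, you can write a toy reco file and read it back. This is a sketch under one assumption: that event_data is a NumPy object array whose elements are dict-like events with the keys described on this page. The file name and the event contents are made up.

```python
import os
import tempfile

import numpy as np

def make_toy_events():
    """Two hypothetical events mimicking the layout described on this page."""
    events = [
        {"event": 1, "nhits": 3,
         "hit_x": np.array([0., -1., 1.]), "hit_y": np.array([2., 0., 1.])},
        {"event": 2, "nhits": 2,
         "hit_x": np.array([2., 2.]), "hit_y": np.array([-1., 3.])},
    ]
    event_array = np.empty(len(events), dtype=object)  # one slot per event
    for i, ev in enumerate(events):
        event_array[i] = ev
    return event_array

with tempfile.TemporaryDirectory() as tmp:
    recoFileName = os.path.join(tmp, "toy_run.reco.npz")  # hypothetical file name
    np.savez(recoFileName, event_data=make_toy_events())   # object arrays are pickled

    # allow_pickle=True is required to load object arrays back
    with np.load(recoFileName, allow_pickle=True) as recoFile:
        eventArray = recoFile['event_data']  # read fully into memory here

print(len(eventArray), eventArray[0]['nhits'])  # 2 3
```

The resulting eventArray can be used in the same way as one loaded from a real reco file.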
You can loop over this array and access each event's data with the usual Python loop syntax:
for event in eventArray:
    # process event here
    pass
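A slightly fuller loop, as a sketch: it skips events without any hits and accumulates the total hit count. The list of dicts below is a hypothetical stand-in for eventArray; real events carry more keys.

```python
import numpy as np

# Hypothetical stand-in for eventArray: a list of dict-like events
eventArray = [
    {"event": 1, "nhits": 3, "hit_x": np.array([0., -1., 1.])},
    {"event": 2, "nhits": 0, "hit_x": np.array([])},
    {"event": 3, "nhits": 2, "hit_x": np.array([2., 2.])},
]

total_hits = 0
nonempty_events = []
for event in eventArray:
    if event['nhits'] == 0:
        continue  # skip events where no cell passed the hit selection
    total_hits += event['nhits']
    nonempty_events.append(event['event'])

print(total_hits, nonempty_events)  # 5 [1, 3]
```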
Each event consists of a series of 'hits'. A hit is a single cell that passes the selection criteria defined in hitReco.py. Hits are intended to represent the positions in the detector where a particle struck the sensor and deposited energy. Each hit has a number of properties describing its position, energy deposition, and other parameters. Inside an event, each property is stored in its own array, with one entry per hit.
As an example, consider an event with 10 hits. Such an event contains several elements. Two of them, event and nhits, are single numbers giving the event number and the number of hits in the event, respectively. These elements can be accessed like dictionary entries:
>>> event = eventArray[0] # take the first event, here we assume it contains 10 hits
>>> event['event']
1
>>> event['nhits']
10
The properties of the 10 hits are stored in arrays prefixed with hit_. All available arrays can be seen in hitReco.py. Let's print the x-positions of each of the hits:
>>> event['hit_x']
array([0., -1., 1., 2., 2., -1., -2., 1., 0., 0.])
and the low gain ADC waveforms (note that there are 11 time samples per waveform, so this is a 10x11 array):
>>> event['hit_adc_lg']
array([[105., 108., 112., 110., 102., 100., 100., 100., 100., 100.],
[98., 120., 125., 109., 104., 98., 98., 97., 98., 99. ],
...
[96., 112., 122., 119., 102., 98., 98., 101., 99., 100.]])
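Because the waveforms are stored as a 2D array with one row per hit, per-hit quantities can be computed along axis=1 without a loop. The sketch below uses three of the rows printed above. Taking the median of the last five samples as a signal-free baseline is an assumption based on the pulse shape in these rows, not something hitReco.py is known to do.

```python
import numpy as np

# Three of the low gain waveforms printed above (rows = hits, columns = time samples)
adc_lg = np.array([
    [105., 108., 112., 110., 102., 100., 100., 100., 100., 100.],
    [ 98., 120., 125., 109., 104.,  98.,  98.,  97.,  98.,  99.],
    [ 96., 112., 122., 119., 102.,  98.,  98., 101.,  99., 100.],
])

# Assumption: the last 5 samples contain no signal, so their median is a baseline
baseline = np.median(adc_lg[:, -5:], axis=1)         # one value per hit
peak_sample = np.argmax(adc_lg, axis=1)               # time sample of the maximum
integral = (adc_lg - baseline[:, np.newaxis]).sum(axis=1)  # baseline-subtracted sum

print(peak_sample.tolist(), integral.tolist())  # [2, 2, 2] [37.0, 66.0, 57.0]
```

The baseline[:, np.newaxis] step turns the per-hit baselines into a column so that NumPy broadcasting subtracts each hit's baseline from every sample in its own row.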
Analysis on the hits can be performed in two ways: with whole-array operations (thanks to NumPy), or by looping over the hits individually. As an example, we calculate the mean hit radius relative to the point (x=-2, y=3) both ways:
# loop method
nhits = event['nhits']
mean_hit_radius_loopmethod = 0.0  # running sum, divided by nhits after the loop
for hit in range(nhits):
    hit_radius = np.sqrt((event['hit_x'][hit] - (-2))**2 + (event['hit_y'][hit] - 3)**2)
    mean_hit_radius_loopmethod += hit_radius
mean_hit_radius_loopmethod /= nhits
# array manipulation method
mean_hit_radius_arraymethod = np.mean(np.sqrt(np.power(event['hit_x'] - (-2), 2) + np.power(event['hit_y'] - 3, 2)))
Note that this is a somewhat artificial calculation, since the hits lie in different layers. In effect, it gives the average distance of the hits from the line parallel to the z axis that passes through (x=-2, y=3).
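To check that the two methods agree, both can be wrapped in functions and compared on a small hypothetical event whose answer is easy to work out by hand. The function names are made up for this sketch, and the array version uses np.hypot(a, b), which computes sqrt(a**2 + b**2) element-wise, in place of the explicit np.sqrt/np.power expression.

```python
import numpy as np

def mean_radius_loop(event, x0=-2.0, y0=3.0):
    """Mean distance of the hits from the line x=x0, y=y0 (loop version)."""
    total = 0.0
    for hit in range(event['nhits']):
        total += np.sqrt((event['hit_x'][hit] - x0)**2 + (event['hit_y'][hit] - y0)**2)
    return total / event['nhits']

def mean_radius_array(event, x0=-2.0, y0=3.0):
    """Same quantity using whole-array operations."""
    return np.mean(np.hypot(event['hit_x'] - x0, event['hit_y'] - y0))

# Hypothetical two-hit event: distances 3 and 5 from (x=-2, y=3), so the mean is 4
event = {"nhits": 2, "hit_x": np.array([-2., 1.]), "hit_y": np.array([0., 7.])}

print(mean_radius_loop(event), mean_radius_array(event))  # 4.0 4.0
```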
In general, the array manipulation method is preferred because it is much faster. The loop method is slower, but it may be the only option for calculations that cannot easily be expressed as whole-array operations.
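Array operations also make it easy to select a subset of hits. A minimal sketch using boolean masks: the event below reuses the hit_x values printed earlier, while its hit_y values are invented for illustration.

```python
import numpy as np

# Hypothetical event: hit_x as printed earlier, hit_y made up
event = {
    "nhits": 10,
    "hit_x": np.array([0., -1., 1., 2., 2., -1., -2., 1., 0., 0.]),
    "hit_y": np.array([1., 0., 0., 2., -1., 1., 3., -2., 0., 2.]),
}

positive_x = event['hit_x'] > 0                 # boolean mask, one entry per hit
n_selected = int(np.count_nonzero(positive_x))
selected_y = event['hit_y'][positive_x]         # y-positions of the selected hits

# Masks combine element-wise with & (and) and | (or); parentheses are required
both_positive = (event['hit_x'] > 0) & (event['hit_y'] > 0)
n_both = int(np.count_nonzero(both_positive))

print(n_selected, selected_y.tolist(), n_both)  # 4 [0.0, 2.0, -1.0, -2.0] 1
```

Because every hit_ array has one entry per hit, the same mask can be applied to any of them to pick out the properties of the selected hits.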
An example data analyzer, exampleAnalyzer.py, is provided. It produces a 2-dimensional histogram of the low gain ADC waveforms (after common mode subtraction) using all events in data/reco/run1.reco.npz. For a 1000-event run on a dummy sensor, the output looks like: