Assess the performance of HGCAL's clustering algorithm (CLUE) with testbeam data and simulation, working under the CMSSW 11_1_0_pre2 release.
- Studies of electromagnetic showers are complete, and a first draft of the Detector Note is ready for submission
- The same studies are now being replicated with hadronic showers
1) selection stage: the original NTuples are pruned, in order to keep the relevant information only
2) analysis stage:
CLUE is run over the pruned NTuples
most of the quantities of interest are calculated and stored in
3) residual analysis and plotting stage:
fits, histogram manipulation and dataframe operations are performed
quantities of interest are plotted using BokehPlot, a custom bokeh wrapper (under development, but already capable of doing the most common plotting operations)
Steps 1) and 2) were chained with a Directed Acyclic Graph (DAG) that runs within HTCondor.
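As a rough illustration of how the two stages can be chained, the sketch below writes the text of an HTCondor DAG in which each run's analysis job only starts after its selection job has finished. It uses the standard DAGMan keywords (JOB, VARS, PARENT ... CHILD); the job names, the submit file names and the run numbers are placeholders chosen for this example and are not taken from the repository.

```python
def build_dag(run_numbers, selection_sub="selection.sub", analysis_sub="analysis.sub"):
    """Return the text of a DAG with one selection -> analysis chain per run.

    The submit file names and the job naming scheme are hypothetical.
    """
    lines = []
    for run in run_numbers:
        sel = f"selection_{run}"
        ana = f"analysis_{run}"
        # One node per stage, each pointing to its submit description file.
        lines.append(f"JOB {sel} {selection_sub}")
        lines.append(f'VARS {sel} run="{run}"')
        lines.append(f"JOB {ana} {analysis_sub}")
        lines.append(f'VARS {ana} run="{run}"')
        # The analysis stage of a run depends on its own selection stage only.
        lines.append(f"PARENT {sel} CHILD {ana}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder run numbers, for illustration only.
    print(build_dag([1, 2]))
```

Because each run gets its own independent parent/child pair, a failure in one run does not block the others, which is also what makes the rescue mechanism of DAGMan useful.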
Electromagnetic showers
(HGCAL only) -
sim_proton (with proton contamination):
Hadronic showers (there are no simulations available without proton contamination)
Step #3 was divided into independent micro-analyses:
hit-level: energy distributions, calibrations, responses, resolutions
layer-level: CLUE densities and distances (1D and 2D), fractions of clusterized hits and energies
cluster-level: number of hits, energies, number of clusters, 'x' and 'y' position of the clusters in the detector
- spatial resolution: it is possible to summarize its information across multiple tags (datasets with different conditions)
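For readers unfamiliar with the "CLUE densities and distances" mentioned in the layer-level item, the following is a minimal sketch of the two per-hit quantities on a single layer: the local energy density within a critical distance, and the distance to the nearest hit of higher density. It is a simplified O(n^2) version with a flat kernel, written only for illustration; the function name, the hit layout `(x, y, energy)` and the parameter `d_c` are assumptions, and the standalone implementation in this repository may differ (for example in how neighbours are weighted or searched).

```python
import math


def clue_density_and_distance(hits, d_c):
    """Compute simplified CLUE-like quantities for the hits of one layer.

    hits: list of (x, y, energy) tuples (hypothetical layout).
    d_c:  critical distance used to define the neighbourhood.
    Returns (rho, delta): local densities and nearest-higher distances.
    """
    n = len(hits)
    rho = [0.0] * n
    for i, (xi, yi, _) in enumerate(hits):
        for xj, yj, ej in hits:
            # Flat kernel: every hit within d_c (including i itself) adds its energy.
            if math.hypot(xi - xj, yi - yj) <= d_c:
                rho[i] += ej

    delta = [math.inf] * n
    for i, (xi, yi, _) in enumerate(hits):
        for j, (xj, yj, _) in enumerate(hits):
            # Only hits with strictly higher density are candidates.
            if rho[j] > rho[i]:
                delta[i] = min(delta[i], math.hypot(xi - xj, yi - yj))
    return rho, delta
```

In this picture, hits with both a high density and a large distance are seed candidates, while hits with a low density and a large distance are outlier candidates; the hit with the highest density keeps an infinite distance.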
: everything related to submitting jobs to the grid-
: creates all the required DAG submission files. It takes as input the CondorJobs/ntuple_ids.txt file, which lists all the available run numbers and was generated with a combination of ls (applied to the folder with the input Ntuples) and awk. This file is used in combination with a std::map stored in CondorJobs/interface/run_en_map.h, which pairs run numbers with incident beam energy in GeV. -
: used by the jobs to run steps #1 and #2 -
: very simple utility that removes the output files of the jobs once they are no longer needed -
: writes a file named ntuple_ids.txt, which contains the identifiers of the data ntuples to be considered for the electromagnetic or hadronic analysis
: everything related to running CLUE and extracting its relevant quantities -
and some other files in interface/
: the CLUE standalone algorithm -
: calculation of all the quantities of interest from the results obtained by CLUE -
: class that manages step #1 -
: executable that runs step #1 -
: class that manages step #2 -
: executable that runs step #2 -
: utility that allows looping over containers by index -
: run the hit-level analysis type -
: run the layer-level analysis type -
: run the cluster-level analysis type -
: summarize quantities related to the cluster spatial resolution for different tags
: a CMSSW subpackage to create simulation files. This analysis framework can then be applied both to testbeam data and to CMS simulated data, making comparisons possible. Simulated data is converted into flat Ntuples, so that it can be treated in the exact same way as testbeam data.
The macros were written with a particular user in mind, but simple adaptations should make them work for other users as well, since the code is reasonably abstract. In particular, running everything over a new dataset should be easy.
If the user wants to process the sim_proton dataset with electromagnetic showers, they should do the following:
- Produce DAG files
write_dag --datatype sim_proton --showertype em --tag <anything>
where --tag identifies a particular data production step, and should ideally indicate the conditions under which the data was produced; the data will be stored in a folder named after the tag. Since the selection stage is seen as something general, --tag only affects the analysis step.
WARNING: If the same tag is specified more than once, the files will be written in the same folder. If the
are also the same, the files will be overwritten and the old ones lost.
For hadronic showers, --showertype had is the option to use.
If only the analysis step is required, one can do
write_dag --datatype sim_proton --showertype em --tag <anything> --last_step_only
- Run the jobs (the submission files will be stored under
condor_submit_dag CondorJobs/clue_sim_proton_em_sometag.dag
NOTE: When the file has already been run, Condor automatically uses its rescue files, i.e., it tries to run only the jobs that did not succeed in previous attempts. To remove all previous files, including job outputs, use
bash CondorJobs/
(or manually remove the files).
- Join the output files according to their beam energy
bash DataProcessing/ --datatype sim_proton --showertype em --analysistype layerdep --tag <anything>
bash DataProcessing/ --datatype sim_proton --showertype em --analysistype clusterdep --tag <anything>
The outputs are currently being stored under /eos/user/<first username letter>/<username>/TestBeamReconstruction/job_output/. Please create the required folders if needed. Under /job_output/ the files are stored in the hit_dependent/, layer_dependent/ and cluster_dependent/ folders.
There is no need to join the data of the hit-level analysis type, since they are .csv files, which are joined by the pandas package. The two other types are instead in ROOT format and are read by uproot.
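The joining step described above groups the per-run job outputs by incident beam energy. A minimal sketch of that grouping logic is shown here, assuming a run-to-energy dictionary that plays the role of the std::map in run_en_map.h; the function name, the run numbers, the energies and the file names are all made-up placeholders, not values from the repository.

```python
from collections import defaultdict


def group_files_by_energy(files_by_run, run_to_energy):
    """Group output files by beam energy.

    files_by_run:  dict {run number: output file path} (hypothetical layout).
    run_to_energy: dict {run number: beam energy in GeV}.
    Returns a dict {energy: [file paths]} with paths ordered by run number.
    """
    groups = defaultdict(list)
    for run, path in sorted(files_by_run.items()):
        groups[run_to_energy[run]].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder runs and energies, for illustration only.
    run_to_energy = {101: 20, 102: 20, 103: 50}
    files_by_run = {101: "out_101.root", 102: "out_102.root", 103: "out_103.root"}
    for energy, paths in group_files_by_energy(files_by_run, run_to_energy).items():
        print(energy, paths)
```

Each resulting list would then be merged into a single file per energy, for instance with ROOT's hadd utility.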
Possible improvement: one could change the way uproot reads the files so that it iterates through them, which is potentially faster. This joining step would then become unnecessary.
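A possible shape for that improvement is sketched below, using `uproot.iterate` to loop over several per-run files in chunks instead of reading one joined file. This is only a sketch under assumptions: the tree name, the branch name and the helper names are hypothetical, the branch is assumed to be flat and numeric, and the current macros may need more changes than shown here.

```python
def tree_spec(paths, tree_name):
    """Map every file path to the name of the tree to read from it."""
    return {path: tree_name for path in paths}


def sum_branch(paths, tree_name, branch):
    """Sum a flat numeric branch over many files without joining them first."""
    # Imported here so the helper above can be used without uproot installed.
    import uproot

    total = 0.0
    # Each chunk is a dict {branch name: NumPy array} when library="np".
    for chunk in uproot.iterate(tree_spec(paths, tree_name), [branch], library="np"):
        total += chunk[branch].sum()
    return total
```

A call such as `sum_branch(list_of_files, "tree", "energy")` would then process all files of a given beam energy in one pass; both `"tree"` and `"energy"` are placeholder names.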
- Run the python analysis and plotting macros
python DataProcessing/python/ --datatype sim_proton --showertype em --tag <anything> #hit level
python DataProcessing/python/ --datatype sim_proton --showertype em --tag <anything> --all #layer level
python DataProcessing/python/ --datatype sim_proton --showertype em --tag <anything> --all #cluster level
If one needs to rerun the plotting stage for
simply due to plot formatting or an additional cut, the option --use_saved_data can be added, which speeds up the macro by reusing the previously stored dataset.
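The idea behind --use_saved_data can be pictured with the small caching pattern below: the expensive computation runs once and its result is stored, and later runs reload the stored result when the flag is set. This is a minimal sketch, assuming a JSON cache file and a hypothetical `load_or_compute` helper; the actual macros may store their data in a different format and location.

```python
import json
import os


def load_or_compute(cache_path, compute, use_saved_data=False):
    """Return cached data if requested and available, else compute and store it.

    cache_path:     where the intermediate dataset is saved (hypothetical).
    compute:        zero-argument function performing the expensive step.
    use_saved_data: mirrors the --use_saved_data command-line option.
    """
    if use_saved_data and os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    data = compute()
    # Always refresh the cache after a full computation.
    with open(cache_path, "w") as f:
        json.dump(data, f)
    return data
```

With this layout, changing only the plot formatting means calling the plotting code on the reloaded data, without repeating fits or dataframe operations.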
To summarize the X or Y spatial resolution information of multiple tags, calculated by the
macro, an additional plotting macro is available:
python DataProcessing/python/ --datatype data --showertype em --var dx
where the tags to be used have to be specified manually in the macro.
Please run the scripts with the --help option for more information, including how to run the last step only for a subset of final variables (this can be done at layer-level and cluster-level).
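For orientation, the options shared by the analysis macros in the commands above could be declared with argparse roughly as follows. This is a hedged sketch, not the parser used in the repository: the help texts and the `build_parser` name are assumptions, and only the flags and values that appear in this README are included.

```python
import argparse


def build_parser():
    """Build a parser with the options shown in the example commands."""
    parser = argparse.ArgumentParser(
        description="Sketch of the options common to the analysis macros."
    )
    parser.add_argument("--datatype", required=True,
                        help="dataset type, e.g. data or sim_proton")
    parser.add_argument("--showertype", required=True, choices=["em", "had"],
                        help="electromagnetic or hadronic showers")
    parser.add_argument("--tag", required=True,
                        help="identifier of the data production step")
    parser.add_argument("--all", action="store_true",
                        help="process all final variables")
    parser.add_argument("--use_saved_data", action="store_true",
                        help="reuse the previously stored dataset")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Running such a script with --help prints the generated usage message, which is the behaviour the paragraph above relies on.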
Plots should be publicly accessible.
Please use CERN Phonebook's details.