Speedup of mkFit hit converters, and introduction of mkFit HLT customization for 2025 #47106

mmasciov · 2025-01-15T11:50:25Z

PR description:

This PR aims at speeding up the mkFit (input) converters and the mkFit producer, targeting HLT in 2025:
- RecoTracker/MkFit/plugins/MkFit*HitConverter.cc and convertHits.h: save mkFit layer index per hit to be used for a faster MkFitEventOfHitsProducer; use more variables available for a given Det to avoid their recomputation per hit (including hit local->global conversion); access clusters from the reference collection by index directly instead of dereferencing an OmniClusterRef
- RecoTracker/MkFit/plugins/MkFitEventOfHitsProducer.cc: use the pre-computed per-hit layer index
- RecoTracker/MkFit/plugins/MkFitProducer.cc (and customization functions): disable (by config) the additional, redundant re-check of the input cluster charge (already applied during hit creation); this should work for an mkFit setup where the cluster charge cut (CCC) is the same in all iterations
- RecoTracker/MkFitCore/interface/Hit.h: vdt, replace (slow) std::hypot with explicit sqrt(x*x+y*y); use SMatrixSym33 for direct covariance storage to avoid conversion from SVector6
- RecoTracker/MkFitCore/src/MkFinder.cc, RecoTracker/MkFitCore/src/MkFitter.cc, RecoTracker/MkFitCore/src/PropagationMPlex*.cc, RecoTracker/MkFitCore/src/Track.cc: vdt, replace (slow) std::hypot with explicit sqrt(x*x+y*y)

Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules:

- MkFitEventOfHitsProducer: -40%
- MkFitSiStripHitConverter: -35%
- MkFitSiPixelHitConverter: -40%
- MkFitProducer: -14% (7% from the faster math, 7% from not updating the cluster mask with CCC)

For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.

PR validation:

This PR was validated using both offline and HLT configurations.
For offline: http://uaf-10.t2.ucsd.edu/~mmasciov/MIC/HLTTracking/offlineMTV_TTbarPU_2024_PR155/
The only differences are at the level of fluctuations, consistent with the proposed changes.

FYI: @cms-sw/tracking-pog-l2, @slava77, @mtosi, @missirol

cmsbuild · 2025-01-15T11:52:44Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47106/43294

cmsbuild · 2025-01-15T11:53:05Z

A new Pull Request was created by @mmasciov for master.

It involves the following packages:

RecoTracker/IterativeTracking (reconstruction)
RecoTracker/MkFit (reconstruction)
RecoTracker/MkFitCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

mmasciov · 2025-01-15T11:53:49Z

please test

fwyzard · 2025-01-15T13:50:05Z

> replace (slow) std::hypot with explicit sqrt(x*x+y*y)

Do you know why std::hypot is slower?
Is it because it uses double precision while the explicit replacement uses single precision?

fwyzard · 2025-01-15T14:01:52Z

> Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules

I'm curious: what is the timing reduction as measured with the framework report or the FastTimerService?

mmusich · 2025-01-15T14:27:51Z

> Do you know why std::hypot is slower?

https://stackoverflow.com/questions/32435796/when-to-use-stdhypotx-y-over-stdsqrtxx-yy

It looks like the built-in overflow/underflow checks slow it down?
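
The trade-off behind that answer can be illustrated with a minimal sketch. It is written in Python rather than in the C++ of mkFit, uses double precision throughout (so it says nothing about the single- versus double-precision question), and does not measure speed. The helper `naive_norm` is a hypothetical name introduced here; it only shows what the explicit form gives up at extreme magnitudes that a hypot implementation guards against.

```python
import math


def naive_norm(x: float, y: float) -> float:
    """Explicit sqrt(x*x + y*y): cheap, but the squares can leave the float range."""
    return math.sqrt(x * x + y * y)


# Ordinary magnitudes: both forms agree.
ordinary = (naive_norm(3.0, 4.0), math.hypot(3.0, 4.0))

# Very large inputs: x*x overflows to infinity before the square root is taken.
big = 1e200
overflowed = naive_norm(big, big)
safe_big = math.hypot(big, big)

# Very small inputs: x*x underflows to zero.
tiny = 1e-200
underflowed = naive_norm(tiny, tiny)
safe_tiny = math.hypot(tiny, tiny)

print(ordinary, overflowed, safe_big, underflowed, safe_tiny)
```

For hit and track coordinates, which stay far from these extremes, the explicit form loses nothing in range, which is presumably why the replacement is acceptable here.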

mmasciov · 2025-01-15T14:32:27Z

> > Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules
>
> I'm curious: what is the timing reduction as measured with the framework report or the FastTimerService?

When running the same HLT configuration on the same machine, I checked the FastTimerService output for MkFitEventOfHitsProducer, MkFitSiPixelHitConverter and MkFitSiStripHitConverter, and the corresponding timing was indeed reduced by roughly 35% for all of them, consistent with the values reported above.

cmsbuild · 2025-01-15T14:57:41Z

+1

Size: This PR adds an extra 112KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43767/summary.html
COMMIT: 82b554f
CMSSW: CMSSW_15_0_X_2025-01-14-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47106/43767/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially added 25 lines to the logs
Reco comparison results: 26037 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3932183
DQMHistoTests: Total failures: 16750
DQMHistoTests: Total nulls: 17
DQMHistoTests: Total successes: 3915396
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -0.10200000000000001 KiB( 49 files compared)
DQMHistoSizes: changed ( 140.045,... ): -0.008 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 141.042 ): 0.023 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 145.301 ): 0.004 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 145.408 ): 0.008 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 145.5 ): -0.023 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 145.604 ): -0.098 KiB JetMET/SUSYDQM
Checked 218 log files, 189 edm output root files, 50 DQM output files
TriggerResults: no differences found

mmusich · 2025-01-15T15:11:38Z

> For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.

What's the final customization function to be applied on top of the HLT menu for testing purposes?
Is it RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025?

mmasciov · 2025-01-15T15:38:33Z

> > For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.
>
> What's the final customization function to be applied on top of the HLT menu for testing purposes? Is it RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025?

Correct. As mentioned at today's TSG meeting, I'm going to push a new commit, renaming RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025 so that it's picked up by the existing .7 workflows.

cmsbuild · 2025-01-15T15:47:38Z

Pull request #47106 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

jfernan2 · 2025-01-15T15:52:46Z

enable profiling

mmasciov · 2025-01-15T15:57:57Z

Given that profiling was enabled, I'll start the tests again (on top of those started by @mmusich for cms-data/RecoTracker-MkFit#15, which are equivalent except for profiling)

mmasciov · 2025-01-15T15:59:02Z

@cmsbuild, please test

mmusich · 2025-01-15T16:00:19Z

> (on top of those started by @mmusich for cms-data/RecoTracker-MkFit#15)

If the pure configuration addition in that cms-data PR is considered final, it might be convenient to get it out of the way as early as possible (no physics change from that alone).

cmsbuild · 2025-01-15T22:40:04Z

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/summary.html
COMMIT: 6ca150a
CMSSW: CMSSW_15_0_X_2025-01-15-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47106/43777/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/git-merge-result

Comparison Summary

Summary:

You potentially added 26 lines to the logs
Reco comparison results: 26121 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3932183
DQMHistoTests: Total failures: 16717
DQMHistoTests: Total nulls: 17
DQMHistoTests: Total successes: 3915429
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -0.10200000000000001 KiB( 49 files compared)
DQMHistoSizes: changed ( 140.045,... ): -0.008 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 141.042 ): 0.023 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 145.301 ): 0.004 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 145.408 ): 0.008 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 145.5 ): -0.023 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 145.604 ): -0.098 KiB JetMET/SUSYDQM
Checked 218 log files, 189 edm output root files, 50 DQM output files
TriggerResults: no differences found

jfernan2 · 2025-01-16T09:29:49Z

Just for my understanding: although the metric is not the same (FastTimer vs. callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be as pronounced in the offline profiling test
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html
even though detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly give an overall reduction of 40% for MkFitProducer. Do we understand why?

mmasciov · 2025-01-16T11:57:52Z

> Just for my understanding: although the metric is not the same (FastTimer vs. callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be as pronounced in the offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html even though detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly give an overall reduction of 40% for MkFitProducer. Do we understand why?

Unless I'm reading this wrong, what is reported is the "time fraction diff percent", so the reported reduction is relative to the total time. The MkFitProducer for the detachedTriplet and pixelLess steps takes a significant amount of time compared with MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer: the fractions are 2.4 and 4.7 for MkFitProducer in detachedTriplet and pixelLess, respectively, as opposed to 0.007, 0.09 and 0.08 for the converters.
However, the fraction for MkFitProducer in pixelLess (as an example) goes from 4.7 to 4.4, a relative difference of 7%, while for MkFitEventOfHitsProducer (as an example) it goes from 0.08 to 0.05, a relative difference of ~35%, which seems (to me) in line with the numbers reported above.
Hope this helps.
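
The distinction drawn in this reply, between a change measured in points of total job time and a change relative to the module's own time, can be checked with a few lines. This is only a sketch using the rounded fractions quoted in the reply; `relative_change` is a hypothetical helper, and the profiling report's own formula is not reproduced here.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change of `after` with respect to `before`."""
    return (after - before) / before


# Rounded time fractions quoted in the reply (percent of total job time).
fractions = {
    "MkFitProducer (pixelLess)": (4.7, 4.4),
    "MkFitEventOfHitsProducer": (0.08, 0.05),
}

for module, (before, after) in fractions.items():
    absolute = after - before                  # points of total job time
    relative = relative_change(before, after)  # change of the module itself
    print(f"{module}: {absolute:+.2f} points, {relative:+.1%} relative")
```

With these rounded inputs the relative changes come out at roughly 6% and 38%, consistent with the 7% and ~35% quoted, while the absolute change of the small module is ten times smaller than that of the large one.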

mmusich · 2025-01-16T12:19:58Z

RecoTracker/MkFit/python/customizeHLTIter0ToMkFit.py

@@ -39,11 +72,13 @@ def customizeHLTIter0ToMkFit(process):



process.hltMkFitGeometryESProducer = mkFitGeometryESProducer_cfi.mkFitGeometryESProducer.clone()

When I try to run the customization function in the HLT addOnTests, I get the following failure:

----- Begin Fatal Exception 16-Jan-2025 12:34:44 CET-----------------------
An exception of category 'EventSetupConflict' occurred while
[0] Calling beginJob
Exception Message:
two EventSetup Producers want to deliver type="MkFitGeometry" label="" from record TrackerRecoGeometryRecord.
The two providers are
1) type="MkFitGeometryESProducer" label="hltMkFitGeometryESProducer"
2) type="MkFitGeometryESProducer" label="mkFitGeometryESProducer"
Please either remove one of these Producers or find a way of configuring one of them so it does not deliver this data or use an es_prefer statement in the configuration to choose one.
----- End Fatal Exception -------------------------------------------------

This happens in the step that runs HLT and RECO together. Would it be possible to substitute this line with:

process.load("RecoTracker.MkFit.mkFitGeometryESProducer_cfi")

such that one ESProducer overrides the other? This change would be needed also for the final implementation in ConfDB by the way.

Otherwise, please append the data to a different label, not the default '' (and change the input label for all the EDProducers that consume this product).

Of course: d578a59
Thanks!

Thanks to you.
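
The reasoning in this review thread can be sketched with a toy model. The class below is a hypothetical stand-in written for illustration, not the CMSSW configuration API: it only assumes, as the exception message suggests, that two modules conflict when they deliver the same product type under the same data label. The module labels are taken from the exception above; the data label "hlt" is invented.

```python
class ToyProcess:
    """Hypothetical stand-in for a job configuration; not the CMSSW API."""

    def __init__(self):
        self.es_producers = {}  # module label -> (product type, data label)

    def add(self, module_label, product_type, data_label=""):
        # Re-using a module label replaces the earlier entry.
        self.es_producers[module_label] = (product_type, data_label)

    def conflicts(self):
        """Return groups of module labels that deliver the same product."""
        providers = {}
        for module_label, product in self.es_producers.items():
            providers.setdefault(product, []).append(module_label)
        return [labels for labels in providers.values() if len(labels) > 1]


# A clone under a new module label next to the default one: two providers.
cloned = ToyProcess()
cloned.add("mkFitGeometryESProducer", "MkFitGeometry")
cloned.add("hltMkFitGeometryESProducer", "MkFitGeometry")

# The same module label added twice: the second replaces the first.
loaded = ToyProcess()
loaded.add("mkFitGeometryESProducer", "MkFitGeometry")
loaded.add("mkFitGeometryESProducer", "MkFitGeometry")

# Both modules kept, but one delivers under a non-default data label.
relabelled = ToyProcess()
relabelled.add("mkFitGeometryESProducer", "MkFitGeometry")
relabelled.add("hltMkFitGeometryESProducer", "MkFitGeometry", data_label="hlt")

print(cloned.conflicts(), loaded.conflicts(), relabelled.conflicts())
```

In this model only the cloned setup reports a conflict, which mirrors the two remedies proposed in the review: share one module label, or deliver the product under a distinct data label and update the consumers accordingly.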

cmsbuild · 2025-01-16T12:40:39Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47106/43325

cmsbuild · 2025-01-16T12:41:04Z

Pull request #47106 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

mmasciov · 2025-01-16T13:18:12Z

please test

slava77 · 2025-01-16T14:26:56Z

> Just for my understanding: although the metric is not the same (FastTimer vs. callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be as pronounced in the offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html even though detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly give an overall reduction of 40% for MkFitProducer. Do we understand why?

Isn't the printout buggy in the cpu time fraction diff percent column? It seems to be off by a factor of 100.

E.g., in the topmost line (CkfTrackCandidateMaker tobTecStepTrackCandidates) the PR value is 3.864670% (100*moduleTime/jobTime) and the baseline is 3.658508%; 0.206161 is just 3.864670 - 3.658508, which is only 0.2% (essentially no change), but the cpu time fraction diff percent column shows 20.616146% and is highlighted in red.
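
The factor of 100 can be verified directly from the three numbers quoted in this comment. The sketch below uses only those values; the relative change with respect to the baseline is added as one other possible reading of the column and is not something the report is known to print.

```python
pr_fraction = 3.864670        # percent of job time, with the PR
baseline_fraction = 3.658508  # percent of job time, baseline
reported_diff = 20.616146     # value shown in the "diff percent" column

# Difference of the two fractions, in points of total job time.
diff_points = pr_fraction - baseline_fraction

# Change relative to the baseline fraction, in percent.
relative_percent = 100.0 * diff_points / baseline_fraction

# Ratio between the reported value and the plain difference.
scale = reported_diff / diff_points

print(f"difference: {diff_points:.6f} points")
print(f"relative change: {relative_percent:.2f} %")
print(f"reported / difference: {scale:.2f}")
```

The reported value matches the plain difference multiplied by about 100, and matches neither the difference itself nor the relative change, which supports the suspicion of a scaling bug in the printout.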

makortel · 2025-01-16T15:35:06Z

> > Just for my understanding: although the metric is not the same (FastTimer vs. callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be as pronounced in the offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html even though detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly give an overall reduction of 40% for MkFitProducer. Do we understand why?
>
> Isn't the printout buggy in the cpu time fraction diff percent column? It seems to be off by a factor of 100.
>
> E.g., in the topmost line (CkfTrackCandidateMaker tobTecStepTrackCandidates) the PR value is 3.864670% (100*moduleTime/jobTime) and the baseline is 3.658508%; 0.206161 is just 3.864670 - 3.658508, which is only 0.2% (essentially no change), but the cpu time fraction diff percent column shows 20.616146% and is highlighted in red.

Could you copy that comment to #43166?

Speedup of mkFit hit converters, and update of HLT customization

82b554f

mmasciov mentioned this pull request Jan 15, 2025

Add HLT configuration for 2025 cms-data/RecoTracker-MkFit#15

Open

cmsbuild added this to the CMSSW_15_0_X milestone Jan 15, 2025

cmsbuild added reconstruction-pending pending-signatures tests-pending orp-pending code-checks-pending tracking labels Jan 15, 2025

cmsbuild added code-checks-approved and removed code-checks-pending labels Jan 15, 2025

cmsbuild added tests-started and removed tests-pending labels Jan 15, 2025

cmsbuild added tests-approved and removed tests-started labels Jan 15, 2025

Remove original, deprecated HLT mkFit customization, replacing it with new customization function targeting 2025

6ca150a

cmsbuild added tests-pending and removed tests-approved code-checks-approved labels Jan 15, 2025

cmsbuild added the code-checks-approved label Jan 15, 2025

cmsbuild added tests-started and removed tests-pending labels Jan 15, 2025

cmsbuild added tests-approved and removed tests-started labels Jan 15, 2025

mmusich reviewed Jan 16, 2025

Use process.load for mkFiGeometryESProducer in HLT customizer

d578a59

cmsbuild added tests-pending code-checks-pending and removed tests-approved code-checks-approved labels Jan 16, 2025

cmsbuild added code-checks-approved and removed code-checks-pending labels Jan 16, 2025

cmsbuild added tests-started and removed tests-pending labels Jan 16, 2025

slava77 mentioned this pull request Jan 16, 2025

No execuation time comparison available for PRs #43166

Open

jfernan2 mentioned this pull request Jan 16, 2025

fill charge_ and barycenter_ for all SiStripClusters #47094

Open
