Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup of mkFit hit converters, and introduction of mkFit HLT customization for 2025 #47106

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

mmasciov
Copy link
Contributor

@mmasciov mmasciov commented Jan 15, 2025

PR description:

This PR aims at speeding up the mkFit (input) converters and the mkFit producer, targeting HLT in 2025:
- RecoTracker/MkFit/plugins/MkFit*HitConverter.cc and convertHits.h: save mkFit layer index per hit to be used for a faster MkFitEventOfHitsProducer; use more variables available for a given Det to avoid their recomputation per hit (including hit local->global conversion); access clusters from the reference collection by index directly instead of dereferencing an OmniClusterRef
- RecoTracker/MkFit/plugins/MkFitEventOfHitsProducer.cc uses pre-computed per-hit layer index
- RecoTracker/MkFit/plugins/MkFitProducer.cc (and customization functions) : disable (by config) additional/redundant re-check of the input cluster charge (already applied during the hit creation); this should work OK for an mkFit setup where CCC is the same in all iterations
- RecoTracker/MkFitCore/interface/Hit.h : vdt, replace (slow) std::hypot with explicit sqrt(x*x+y*y); use SMatrixSym33 for direct covariance storage to avoid conversion from SVector6
- RecoTracker/MkFitCore/src/MkFinder.cc, RecoTracker/MkFitCore/src/MkFitter.cc, RecoTracker/MkFitCore/src/PropagationMPlex*.cc, RecoTracker/MkFitCore/src/Track.cc vdt, replace (slow) std::hypot with explicit sqrt(x*x+y*y)

Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules:

  • MkFitEventOfHitsProducer : -40%
  • MkFitSiStripHitConverter: -35%
  • MkFitSiPixelHitConverter: -40%
  • MkFitProducer: -14% (7% is the faster math, 7% from not updating the cluster mask with CCC)

For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.

PR validation:

This PR was validated using both offline and HLT configurations.
For offline: http://uaf-10.t2.ucsd.edu/~mmasciov/MIC/HLTTracking/offlineMTV_TTbarPU_2024_PR155/
--> Only differences are at the level of fluctuations, consistently with the proposed changes.

FYI: @cms-sw/tracking-pog-l2, @slava77, @mtosi, @missirol

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 15, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @mmasciov for master.

It involves the following packages:

  • RecoTracker/IterativeTracking (reconstruction)
  • RecoTracker/MkFit (reconstruction)
  • RecoTracker/MkFitCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmasciov
Copy link
Contributor Author

please test

@fwyzard
Copy link
Contributor

fwyzard commented Jan 15, 2025

replace (slow) std::hypot with explicit sqrt(x*x+y*y)

do you know why std::hypot is slower ?
is it because it uses double precision while the explicit replacement uses single precision ?

@fwyzard
Copy link
Contributor

fwyzard commented Jan 15, 2025

Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules

I'm curious, what is the timing reduction measured using the framework report or the FastTimerService ?

@mmusich
Copy link
Contributor

mmusich commented Jan 15, 2025

do you know why std::hypot is slower ?

https://stackoverflow.com/questions/32435796/when-to-use-stdhypotx-y-over-stdsqrtxx-yy

Looks like builtin overflow / underflow checks slow it down?

@mmasciov
Copy link
Contributor Author

Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules

I'm curious, what is the timing reduction measured using the framework report or the FastTimerService ?

When running the same HLT configuration on the same machine, I had checked the FastTimerService output for MkFitEventOfHitsProducer, MkFitSiPixelHitConverter and MkFitSiStripHitConverter, and indeed the corresponding timing was reduced by roughly 35% for all of them, consistent with the values reported above.

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 112KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43767/summary.html
COMMIT: 82b554f
CMSSW: CMSSW_15_0_X_2025-01-14-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47106/43767/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 25 lines to the logs
  • Reco comparison results: 26037 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3932183
  • DQMHistoTests: Total failures: 16750
  • DQMHistoTests: Total nulls: 17
  • DQMHistoTests: Total successes: 3915396
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.10200000000000001 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 140.045,... ): -0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.042 ): 0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.301 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.408 ): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.5 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.604 ): -0.098 KiB JetMET/SUSYDQM
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@mmusich
Copy link
Contributor

mmusich commented Jan 15, 2025

For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.

What's the final customization function to be applied on top of the HLT menu for testing purposes?
Is it RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025 ?

@mmasciov
Copy link
Contributor Author

For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.

What's the final customization function to be applied on top of the HLT menu for testing purposes? Is it RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025 ?

Correct. As mentioned at today's TSG meeting, I'm going to push a new commit, renaming RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025 so that it's picked up by the existing .7 workflows.

@cmsbuild
Copy link
Contributor

Pull request #47106 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

@jfernan2
Copy link
Contributor

enable profiling

@mmasciov
Copy link
Contributor Author

Given that profiling was enabled, I'll start the tests again (on top of those started by @mmusich for cms-data/RecoTracker-MkFit#15, which are equivalent except for profiling)

@mmasciov
Copy link
Contributor Author

@cmsbuild, please test

@mmusich
Copy link
Contributor

mmusich commented Jan 15, 2025

(on top of those started by @mmusich for cms-data/RecoTracker-MkFit#15)

if the pure configuration addition at that cms-data PR is considered final, it might be convenient to get that out of the way as early as possible (no physics change from that alone).

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/summary.html
COMMIT: 6ca150a
CMSSW: CMSSW_15_0_X_2025-01-15-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47106/43777/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/git-merge-result

Comparison Summary

Summary:

  • You potentially added 26 lines to the logs
  • Reco comparison results: 26121 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3932183
  • DQMHistoTests: Total failures: 16717
  • DQMHistoTests: Total nulls: 17
  • DQMHistoTests: Total successes: 3915429
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.10200000000000001 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 140.045,... ): -0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.042 ): 0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.301 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.408 ): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.5 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.604 ): -0.098 KiB JetMET/SUSYDQM
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@jfernan2
Copy link
Contributor

Just for my understanding: although the metric is not the same (FastTimer vs callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be so pronounced in the Offline profiling test
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html
despite detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly are giving an overall reduction of 40% for MkFitProducer, do we understand why?

@mmasciov
Copy link
Contributor Author

Just for my understanding: although the metric is not the same (FastTimer vs callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be so pronounced in the Offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html despite detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly are giving an overall reduction of 40% for MkFitProducer, do we understand why?

Unless I'm reading this wrong, what is reported is the "time fraction diff percent". So, the reported reduction is relative to the total time. The MkFitProducer for detachedTriplet and pixelLess steps takes a significant amount of time with respect to MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer, the fractions being 2.4 and 4.7 for MkFitProducer in detachedTriplet and pixelLess, respectively, as opposed to 0.007, 0.09 and 0.08 for the converters.
However, the fraction for MkFitProducer in pixelLess (as an example) goes from 4.7 to 4.4 , with a relative difference of 7%, while for MkFitEventOfHitsProducer (as an example) goes from 0.08 to 0.05, with a relative difference of ~35%, which seems (to me) in line with the numbers reported above.
Hope this helps.

@@ -39,11 +72,13 @@ def customizeHLTIter0ToMkFit(process):

Copy link
Contributor

@mmusich mmusich Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    process.hltMkFitGeometryESProducer = mkFitGeometryESProducer_cfi.mkFitGeometryESProducer.clone()

when I try to run the customization function in the HLT addOnTests, I get the following failure:

----- Begin Fatal Exception 16-Jan-2025 12:34:44 CET-----------------------
An exception of category 'EventSetupConflict' occurred while
   [0] Calling beginJob
Exception Message:
two EventSetup Producers want to deliver type="MkFitGeometry" label=""
 from record TrackerRecoGeometryRecord. The two providers are 
1) type="MkFitGeometryESProducer" label="hltMkFitGeometryESProducer"
2) type="MkFitGeometryESProducer" label="mkFitGeometryESProducer"
Please either
   remove one of these Producers
   or find a way of configuring one of them so it does not deliver this data
   or use an es_prefer statement in the configuration to choose one.
----- End Fatal Exception -------------------------------------------------

in the step that runs HLT+RECO in the same step. Would it be possible to substitute this line with:

    process.load("RecoTracker.MkFit.mkFitGeometryESProducer_cfi")

such that one ESProducer overrides the other? This change would be needed also for the final implementation in ConfDB by the way.

Otherwise, please append the data to a different label, not the default '' (and change the input label for all the EDProducers that consume this product).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course: d578a59
Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to you.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #47106 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

@mmasciov
Copy link
Contributor Author

please test

@slava77
Copy link
Contributor

slava77 commented Jan 16, 2025

Just for my understanding: although the metric is not the same (FastTimer vs callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be so pronounced in the Offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html despite detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly are giving an overall reduction of 40% for MkFitProducer, do we understand why?

Isn't the printout buggy in the cpu time fraction diff percent column? it seems to be off by 100.

E.g. the topmost line CkfTrackCandidateMaker tobTecStepTrackCandidates
PR is 3.864670 % (100*moduleTime/jobTime), baseline is 3.658508 % and 0.206161 is just 3.864670 - 3.658508, which is just 0.2% (essentially no change), but the cpu time fraction diff percent column shows 20.616146% and is highlighted in red

@makortel
Copy link
Contributor

Just for my understanding: although the metric is not the same (FastTimer vs callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be so pronounced in the Offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html despite detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly are giving an overall reduction of 40% for MkFitProducer, do we understand why?

Isn't the printout buggy in the cpu time fraction diff percent column? it seems to be off by 100.

E.g. the topmost line CkfTrackCandidateMaker tobTecStepTrackCandidates PR is 3.864670 % (100*moduleTime/jobTime), baseline is 3.658508 % and 0.206161 is just 3.864670 - 3.658508, which is just 0.2% (essentially no change), but the cpu time fraction diff percent column shows 20.616146% and is highlighted in red

Could you copy that comment to #43166 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants