ref: benchmark toy detector config #714

Open
wants to merge 2 commits into
base: main

@niermann999 niermann999 commented Sep 24, 2024

Allow the toy detector grids to find all neighboring surfaces; this should increase the number of tracks that can be found and fitted. Furthermore, I restricted the simulation to an eta region where the tracks should hit a decent number of sensitive surfaces.
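To illustrate what "finding all neighboring surfaces" means for the lookup cost, here is a minimal sketch of a neighbourhood search on a regular 2D grid. `toy_grid`, `insert`, `search` and the `window` parameter are hypothetical names for illustration only; this is not the detray grid API, and the actual grids may behave differently (e.g. wrap-around in phi).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a surface grid: every bin holds surface indices.
struct toy_grid {
    std::size_t n_x;
    std::size_t n_y;
    std::vector<std::vector<unsigned>> bins;  // row-major, n_x * n_y bins

    toy_grid(std::size_t nx, std::size_t ny)
        : n_x(nx), n_y(ny), bins(nx * ny) {}

    // Register a surface index in the bin (ix, iy).
    void insert(std::size_t ix, std::size_t iy, unsigned surface) {
        assert(ix < n_x && iy < n_y);
        bins[iy * n_x + ix].push_back(surface);
    }

    // Collect the surfaces of all bins within `window` bins of (ix, iy).
    // The window is clamped at the grid edges (no wrap-around in this sketch).
    std::vector<unsigned> search(std::size_t ix, std::size_t iy,
                                 std::size_t window) const {
        assert(ix < n_x && iy < n_y);
        const std::size_t x_lo = (ix >= window) ? ix - window : 0;
        const std::size_t y_lo = (iy >= window) ? iy - window : 0;
        const std::size_t x_hi = std::min(ix + window, n_x - 1);
        const std::size_t y_hi = std::min(iy + window, n_y - 1);

        std::vector<unsigned> found;
        for (std::size_t y = y_lo; y <= y_hi; ++y) {
            for (std::size_t x = x_lo; x <= x_hi; ++x) {
                const auto& bin = bins[y * n_x + x];
                found.insert(found.end(), bin.begin(), bin.end());
            }
        }
        // A surface may be registered in several bins: drop duplicates.
        std::sort(found.begin(), found.end());
        found.erase(std::unique(found.begin(), found.end()), found.end());
        return found;
    }
};
```

With `window = 0` only the bin under the track is read; with `window = 1` up to nine bins are read, so every navigation step has more candidate surfaces to intersect. That is one plausible contribution to a slowdown when the neighbour search is enabled, not a confirmed diagnosis.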

I also tried removing the constrained step from the finder and fitter, since the constraints are not used in performance-relevant code and eat at least a bit of memory (it is an array of constraints that all need to be checked at every step).
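As a rough picture of the trade-off described above, the sketch below contrasts a stepper policy that carries an array of step-size limits with an empty "unconstrained" policy. All names (`toy_step_constraints`, `toy_unconstrained`, `next_step_size`) are hypothetical and only illustrate the idea; they are not the types used in the finder and fitter.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical constrained policy: one step-size limit per constraint source.
template <std::size_t N>
struct toy_step_constraints {
    std::array<double, N> limits;

    toy_step_constraints() {
        limits.fill(std::numeric_limits<double>::max());
    }

    void set(std::size_t i, double value) {
        assert(i < N);
        limits[i] = value;
    }

    // Every entry has to be inspected to find the tightest limit.
    double min() const {
        double m = std::numeric_limits<double>::max();
        for (double l : limits) {
            m = std::min(m, l);
        }
        return m;
    }
};

// Hypothetical unconstrained policy: no storage, nothing to check.
struct toy_unconstrained {
    double min() const { return std::numeric_limits<double>::max(); }
};

// Called once per propagation step: clamp the proposed step size.
template <typename constraint_t>
double next_step_size(double proposed, const constraint_t& constraints) {
    return std::min(proposed, constraints.min());
}
```

In this sketch the constrained policy adds `N` doubles to the per-track state and a loop over them at every step, while the unconstrained policy is an empty type whose check the compiler can usually remove.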

I then tried to see if the performance hit comes from the grids, so I tried using the static grids of the toy detector instead of a grid with dynamic bin capacities. Apart from that, the detector is now copied to device instead of using managed memory.

niermann999 commented Oct 8, 2024

As soon as I let the grid actually search for neighbouring surfaces, the performance gets really bad...

Running ./bin/traccc_benchmark_cpu
Run on (48 X 2551.74 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 512 KiB (x24)
  L3 Unified 32768 KiB (x4)
Load Average: 0.92, 0.66, 1.04
WARNING: No entries in volume finder

Detector check: OK
-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
ToyDetectorBenchmark/CPU 1.8804e+10 ns   1.3231e+10 ns            1 event_throughput_Hz=7.55819/s
Running ./bin/traccc_benchmark_cuda
Run on (48 X 2270.35 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 512 KiB (x24)
  L3 Unified 32768 KiB (x4)
Load Average: 1.80, 2.18, 1.64
WARNING: No entries in volume finder

Detector check: OK
------------------------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------
ToyDetectorBenchmark/CUDA 1.3547e+10 ns   1.3471e+10 ns            1 event_throughput_Hz=7.42362/s
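
A quick sanity check on the numbers above: multiplying the reported `event_throughput_Hz` by the CPU time column gives roughly 100 for both runs, which suggests the counter is a rate normalised by CPU time over about 100 events. The event count is inferred from the arithmetic and is not stated in this PR; `implied_events` is a hypothetical helper.

```cpp
#include <cassert>

// Number of events implied by a rate counter and the time it was divided by.
// rate_hz: reported events per second; time_ns: benchmark time in nanoseconds.
double implied_events(double rate_hz, double time_ns) {
    assert(rate_hz >= 0.0 && time_ns >= 0.0);
    return rate_hz * time_ns * 1e-9;
}
```

Using the wall-clock column instead (`1.8804e+10 ns` for the CPU run) gives about 142, so the two columns are not interchangeable here. Note also that the CPU and CUDA throughputs are nearly identical (7.56 Hz vs 7.42 Hz).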

@niermann999 niermann999 marked this pull request as ready for review October 8, 2024 14:52
@niermann999 niermann999 force-pushed the ref-benchmark-cfg branch 2 times, most recently from 6ed2e31 to 29b1033 Compare October 14, 2024 15:52