
No execution time comparison available for PRs #43166

Open
mandrenguyen opened this issue Nov 2, 2023 · 54 comments

Comments

@mandrenguyen
Contributor

For a few months now we have not been able to see the CPU impact of a given pull request, which used to be possible with the enable profiling option in the Jenkins tests.
This is a bit problematic for integrating new features, as we won't easily be able to keep track of changes in performance until a pre-release is built.
The issue seems to come from igprof, which apparently can no longer really be supported.
One suggestion from @gartung is to try to move to VTune.

@mandrenguyen
Contributor Author

assign core, reconstruction

@cmsbuild
Contributor

cmsbuild commented Nov 2, 2023

New categories assigned: core,reconstruction

@Dr15Jones,@jfernan2,@makortel,@mandrenguyen,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Nov 2, 2023

A new Issue was created by @mandrenguyen Matthew Nguyen.

@rappoccio, @antoniovilela, @sextonkennedy, @makortel, @smuzaffar, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@gartung
Member

gartung commented Nov 2, 2023

The problem is that the cmsRun process itself gets a segfault while being profiled by Igprof. The same segfault might happen when being profiled with Vtune.

@makortel
Contributor

makortel commented Nov 2, 2023

In case the IgProf+cmsRun combination crashes, is any information on the job timings saved that can be used for comparison?

@gartung
Member

gartung commented Nov 2, 2023

Usually the FastTimerService job completes, and the per-module averages are contained in the raw JSON file even if the resources pie chart is not readable.
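
For reference, a minimal sketch of how those per-module averages can be pulled out of the raw JSON. The field names ("modules", "type", "label", "events", "time_real") follow the layout used by the circles tool and are an assumption, not the authoritative format:

```python
#!/usr/bin/env python3
# Minimal sketch (assumed circles-style layout): print the average real time
# per event for each module from a FastTimerService resources JSON file.
import json
import sys

with open(sys.argv[1]) as f:
    resources = json.load(f)

for module in resources.get("modules", []):
    events = module.get("events", 0)
    if events:
        avg = module.get("time_real", 0.0) / events  # time_real assumed to be in ms
        print(f'{module["type"]:40s} {module["label"]:40s} {avg:10.3f} ms/event')
```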

@gartung
Member

gartung commented Nov 2, 2023

The IgProfService dumps the profile after the first, middle, and next-to-last events. The first one might not have enough data to be meaningful.

@gartung
Member

gartung commented Nov 2, 2023

@mandrenguyen Can you point me to a PR so I can look at the logs?

@mandrenguyen
Contributor Author

@gartung The last one we tried with profiling enabled was:
#43107

@makortel
Contributor

The crashes under profilers are quite likely caused by the memory corruption inside TensorFlow (when run through IgProf or VTune) that has been investigated in #42444.

@VinInn
Contributor

VinInn commented Dec 15, 2023

The FastTimerService should suffice. Still, it seems not to be active in RelVals.
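
A minimal sketch of how the service could be switched on via a customise function (e.g. passed to cmsDriver.py with --customise), assuming the FastTimerService parameters writeJSONSummary and jsonFileName used in the circles workflow; the exact parameter names should be checked against the release in use:

```python
# Hypothetical customise function enabling the FastTimerService JSON summary;
# parameter names (writeJSONSummary, jsonFileName) are assumed.
import FWCore.ParameterSet.Config as cms

def customiseForFastTimerJSON(process):
    if not hasattr(process, "FastTimerService"):
        process.FastTimerService = cms.Service("FastTimerService")
    process.FastTimerService.writeJSONSummary = cms.untracked.bool(True)
    process.FastTimerService.jsonFileName = cms.untracked.string("step3_cpu.resources.json")
    return process
```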

@mmusich
Contributor

mmusich commented Aug 26, 2024

The issue seems to come from igprof, which apparently can no longer really be supported.
[...] is to try to move to VTune.

For my education, is this replacement documented somewhere?
I still see igprof listed in the RecoIntegration CMS Twiki.

@jfernan2
Contributor

jfernan2 commented Sep 3, 2024

@mmusich it is expected that VTune gives the same problem as igprof, so the replacement has not been done.
Indeed this is a real showstopper for RECO developments since we cannot monitor the time profiling in PRs

@mmusich
Contributor

mmusich commented Sep 3, 2024

so the replacement has not been done.
Indeed this is a real showstopper for RECO developments since we cannot monitor the time profiling in PRs

I see, that's bad news. I gather the same holds true for user checks when developing (regardless of the time profiling in PRs)

@makortel
Contributor

makortel commented Sep 5, 2024

Is the most burning problem that there is no timing information (entire job, per module) or that the real IgProf/VTune profile (with function-level information) is missing (because of crash)?

@mmusich
Contributor

mmusich commented Sep 6, 2024

Is the most burning problem that there is no timing information (entire job, per module) or that the real IgProf/VTune profile (with function-level information) is missing (because of crash)?

for me (personally) at least, having the function level information would be really helpful.

@jfernan2
Contributor

jfernan2 commented Sep 6, 2024

IMHO the crash of igprof/VTune is a problem in itself, even though there is timing info from the FastTimer module. But the real issue is not having a comparison of baseline time performance vs. baseline+PR, which forces us to detect total increases in the profiles a posteriori, when a pre-release is built, and then figure out which PR(s) were responsible....

Perhaps a comparison script based on the FastTimer output could be useful, even if not optimal. Do you think this is possible, @gartung?
Thanks

@gartung
Member

gartung commented Sep 10, 2024

Yes, it would be possible. In fact, there is already a script that merges two FastTimer output files:
https://github.com/fwyzard/circles/blob/master/scripts/merge.py
Changing it to diff two files should be possible.
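
As an illustration of what such a diff could look like, here is a minimal sketch (not the actual cms-bot script); it assumes a circles-style resources layout, with modules keyed by (type, label) and a per-event "time_real" metric:

```python
#!/usr/bin/env python3
# Minimal sketch of a module-by-module diff of two FastTimerService resources
# JSON files (baseline vs PR); field names are an assumption.
import json
import sys


def load(path):
    """Index the modules of one resources file by (type, label)."""
    with open(path) as f:
        data = json.load(f)
    return {(m["type"], m["label"]): m for m in data.get("modules", [])}


def per_event(module, metric="time_real"):
    """Average metric per event; 0 for modules missing from one file."""
    events = module.get("events", 0)
    return module.get(metric, 0.0) / events if events else 0.0


def main(baseline_path, pr_path):
    baseline = load(baseline_path)
    pr = load(pr_path)
    for key in sorted(set(baseline) | set(pr)):
        b = per_event(baseline.get(key, {}))
        p = per_event(pr.get(key, {}))
        print(f"{key[0]:35s} {key[1]:35s} baseline {b:9.3f}  PR {p:9.3f}  diff {p - b:+9.3f}")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```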

@gartung
Member

gartung commented Sep 10, 2024

If you add enable profiling as a comment on the pull request, the FastTimerService is run as part of the profiling.

@jfernan2
Contributor

Thanks, but the real need is to have the comparison in the PR, to see the changes introduced, the same way we had with igprof.

At this point, enable profiling only runs the FastTimer in the PR framework, but gives no comparison of the time, which is what allows us to decide.

@gartung
Member

gartung commented Sep 26, 2024

I am working on a pull request to cms-bot that will add the diff.py script and run it when enable profiling is commented. The PR timing and the diff of the PR timing vs the IB release will be linked in the summary.
cms-sw/cms-bot#2343

@gartung
Member

gartung commented Oct 1, 2024

This script has been added to pull request profiling and produces an html table of all of the modules in the resources file and their differences
https://raw.githubusercontent.com/cms-sw/cms-bot/refs/heads/master/comparisons/resources-diff.py

@makortel
Contributor

makortel commented Oct 2, 2024

due to initialization times reading inputs we are having too many false positives

Do you mean something along the lines of the baseline being run first, and that leading to the input file being cached in memory?

If we are to go to that level of precision, I'd suggest to dd the input file to /dev/null first (to cache it in memory), and to run one warmup job (could be the same number of events as the actual profiling job, or fewer) prior to the actual measurement.
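
A minimal sketch of that recipe, with hypothetical file and configuration names (step2.root, step3_cpu.py); reading the file once plays the role of the dd to /dev/null, and the first cmsRun is the discarded warmup:

```python
#!/usr/bin/env python3
# Minimal sketch: warm the page cache by reading the input once, run one
# warmup cmsRun job whose timing is discarded, then run the measured job.
import subprocess

INPUT_FILE = "step2.root"    # hypothetical input from the previous step
CONFIG     = "step3_cpu.py"  # hypothetical cmsRun configuration

# 1) read the input once so subsequent reads come from the page cache
with open(INPUT_FILE, "rb") as f:
    while f.read(64 * 1024 * 1024):
        pass

# 2) warmup job: same configuration, its outputs are simply overwritten below
subprocess.run(["cmsRun", CONFIG], check=True)

# 3) measured job: the resources JSON written by this run enters the comparison
subprocess.run(["cmsRun", CONFIG], check=True)
```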

@makortel
Contributor

makortel commented Oct 2, 2024

In the table, I'd suggest to also add units to both time and memory, and in the cells present first the baseline and then the PR value (but keep the difference as "PR - baseline", as we do in the maxmemory table).

@jfernan2
Contributor

jfernan2 commented Oct 2, 2024

due to initialization times reading inputs we are having too many false positives

Do you mean something along the lines of the baseline being run first, and that leading to the input file being cached in memory?

If we are to go to that level of precision, I'd suggest to dd the input file to /dev/null first (to cache it in memory), and to run one warmup job (could be the same number of events as the actual profiling job, or fewer) prior to the actual measurement.

I would not seek precision, but something which allows us to tell whether there is a real change or not. It seems to me that with 10 events we are left with statistical fluctuations, which give more than a 3-4% difference in about 90% of the modules being compared (orange everywhere).

@jfernan2
Contributor

jfernan2 commented Oct 2, 2024

Moreover, fluctuations seem to make some modules increase and others decrease, so perhaps a global total time per event as a summary makes more sense, if we cannot get rid of these ups and downs.

@gartung
Member

gartung commented Oct 2, 2024

Would a sum over module labels per module type give a better indication? The reco-event-loop shows the time in each module type's produce method.
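
A minimal sketch of that aggregation, again assuming the circles-style resources layout: the per-label times are summed for each C++ module type and expressed per event.

```python
#!/usr/bin/env python3
# Minimal sketch (assumed circles-style layout): sum the real time over all
# module labels of the same module type and express it per event, using the
# largest per-module event count as a rough proxy for the number of events.
import json
import sys
from collections import defaultdict

with open(sys.argv[1]) as f:
    data = json.load(f)

totals = defaultdict(float)
n_events = 0
for module in data.get("modules", []):
    totals[module["type"]] += module.get("time_real", 0.0)
    n_events = max(n_events, module.get("events", 0))

for module_type, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    per_event = total / n_events if n_events else 0.0
    print(f"{module_type:40s} {per_event:10.3f} ms/event (summed over labels)")
```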

@jfernan2
Contributor

jfernan2 commented Oct 2, 2024

I think so, at least as a first result to see at a glance whether the timing has increased or not; the module-by-module info is still necessary to spot culprits. However, I still believe 10-event jobs have a very large uncertainty.

@gartung
Member

gartung commented Oct 8, 2024

The variance in module timing might be caused by more than one Jenkins job running on vocms11 at the same time. I can restrict the baseline and PR profiling jobs so that only one at a time can run on vocms11.

@gartung
Member

gartung commented Oct 8, 2024

I determined that the IB profiling was being run on cmsprofile-11 and the PR profiling was being run on vocms011. The former is a multicore VM, the latter is a bare metal machine. This could also account for the differences.

@makortel
Contributor

makortel commented Oct 8, 2024

Is the "IB profiling" the same as "PR test baseline profiling"? I'm asking because for DQM/reco comparison purposes the "IB" and "PR test baseline" are different things.

@gartung
Member

gartung commented Oct 8, 2024

There is no profiling done for baseline. The comparison is with the corresponding IB.

@jfernan2
Contributor

The variance in module timing might be caused by more than one Jenkins job running on vocms11 at the same time. I can restrict the baseline and PR profiling jobs so that only one at a time can run on vocms11.

vocms011 is also used to monitor RECO profiling on new releases, so it is used centrally from time to time.
What I don't understand is why several simultaneous jobs might cause almost random timing variations across modules.

@makortel
Contributor

What I don't understand is why several simultaneous jobs might cause almost random timing variations across modules.

For example, modern CPUs adjust their operating frequency based on the load of the machine. Other processes may also interfere with e.g. disk, memory, and/or network usage.

@jfernan2
Contributor

Yes, I see that, but that would shift all the modules in one direction, not in a random way, am I right?

@Dr15Jones
Contributor

Another effect is that the OS can 'steal' a CPU from a process to use for something else temporarily. The heavier the load on the machine, the more likely this is to happen. If it happens while a module is running, it makes the time between the start and stop of the module longer than it would have been without the interruption.

@gartung
Member

gartung commented Oct 15, 2024

For a comparison of the IB timing and memory FastTimerService numbers you can also look at
https://monit-grafana.cern.ch/d/000000530/cms-monitoring-project?viewPanel=60&orgId=11
which is the link on the IB dashboard labeled "IB profiling results"

@jfernan2
Contributor

jfernan2 commented Oct 18, 2024

Thanks @gartung
However, IBs have many changes in them; we need to inspect PR by PR to disentangle things in an adiabatic way... I would still push to implement these suggestions, though:
#43166 (comment)

@gartung
Member

gartung commented Nov 6, 2024

I tried using the step2.root produced by the IB profiling build with the IB+PR profiling in step3. I also tried to account for timing differences from running on two different VMs by dividing the time per module by the total time. The percentage difference is calculated from these fractions of the total time. I still see large percentage differences.
cms-sw/cms-bot#2366
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-018b9a/42603/profiling/29834.21/diff-step3_cpu.resources.json.html
I will next try profiling the IB on the same machine to remove timing differences from running on two different VMs.

@gartung
Member

gartung commented Nov 6, 2024

I am also working on summing metrics across module labels for each module type.

@gartung
Member

gartung commented Nov 7, 2024

Running the IB and IB+PR profiling on the same VM still results in differences:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-018b9a/42636/profiling/29834.21/diff-step3_cpu.resources.json.html

@gartung
Member

gartung commented Nov 7, 2024

With the number of events increased to 100, the variance between two runs of step 3 on the same VM is smaller:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-018b9a/42636/profiling/12634.21/diff-step3_cpu.resources.json.html

@jfernan2
Contributor

@gartung sorry but the last version of the comparison seems to have reversed signs on the total RECO summary at the beginning of the report, see:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-119d70/42986/profiling/29834.21/diff-step3_cpu.resources.json.html

The RECO real time difference is (PR - baseline) = -2007 ms, while the PR seems to be slower (915525 ms).
Besides, the legend at the top is missing the denominator (baseline).

Thanks

@gartung
Member

gartung commented Nov 22, 2024

The denominator was left out because the percentage diff of fractional time was thought to be less useful.

@jfernan2
Contributor

jfernan2 commented Nov 22, 2024

The denominator was left out because the percentage diff of fractional time was thought to be less useful.

I meant in the legend, right after the definition of the colors. We are quoting time fraction diff percents, so there must be a denominator.....

@gartung
Member

gartung commented Nov 22, 2024

That would be total time. I will add it to the next iteration of the script.

@gartung
Member

gartung commented Nov 22, 2024

So it's 100% * ((PR module time / PR total time) - (baseline module time / baseline total time)).
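
Spelled out as code, with hypothetical numbers just to fix the sign convention:

```python
# The quantity shown in the table, for one module: the difference between the
# module's share of the total job time in the PR and in the baseline, in percent.
def fractional_time_diff_percent(pr_time, pr_total, base_time, base_total):
    return 100.0 * (pr_time / pr_total - base_time / base_total)

# hypothetical example: 10 ms out of 250 ms (PR) vs 9 ms out of 240 ms (baseline)
print(fractional_time_diff_percent(10.0, 250.0, 9.0, 240.0))  # +0.25 (percentage points)
```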

@slava77
Contributor

slava77 commented Jan 16, 2025

following #47106 (comment) (Jan 16, 2025)

Just for my understanding: although the metric is not the same (FastTimer vs callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be so pronounced in the offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html, even though detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit are mainly giving an overall reduction of 40% for MkFitProducer. Do we understand why?

Isn't the printout buggy in the cpu time fraction diff percent column? It seems to be off by a factor of 100.

E.g. for the topmost line, CkfTrackCandidateMaker tobTecStepTrackCandidates:
the PR is at 3.864670% (100*moduleTime/jobTime) and the baseline at 3.658508%; 0.206161 is just 3.864670 - 3.658508, i.e. about a 0.2% change in time fraction (essentially no change), but the cpu time fraction diff percent column shows 20.616146% and is highlighted in red.

[screenshot of the comparison table]

@gartung
Member

gartung commented Jan 16, 2025

Yes, this script is a work in progress.

@gartung
Member

gartung commented Jan 16, 2025

cms-sw/cms-bot#2414
updates the script
