PAPI_TOT_INS randomly off on AMD EPYC 7352 #160
Comments
Hi Alexander, I am looking into this issue and will keep you updated. Thank you, |
I am unable to reproduce this issue on our AMD Zen2 testbed. I also do not get the error message:
Could you please provide the output from the 'papi_component_avail' utility in addition to the value in the file '/proc/sys/kernel/perf_event_paranoid'? Thank you, |
Here is the requested output:
Note that the failure doesn't always happen. After the just failed
Repeating the same loop twice, I got all successes (200001617 - 200001627) and then all failures (3200025856 - 3200026016) |
Hi Alexander, Could you please change the value in perf_event_paranoid to 0, re-run the ctest, and post the results? |
Hi Alexander, Are there any updates to this issue? Has it been resolved? |
The issue I observe happens on an HPC system, so I don't have sufficient privileges to change that flag. I asked the admins and am waiting for a reply. |
For
|
Test with
Even as a user with root privileges, it doesn't look much different |
Any updates here?
The failing ones are off by a factor of almost exactly 16 (15.99999992). A constant of ~16 makes me think there is an issue with the measuring logic and/or the scheduling sometimes does something different. |
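For reference, the factor can be checked against the counts quoted earlier in this thread. This is only a rough sketch: it assumes the low and high ends of the passing and failing ranges can be compared with each other, which the thread does not actually state, and the variable names are made up for illustration.

```python
# Instruction counts quoted earlier in this thread (PAPI_TOT_INS, ctests/zero).
passing = (200001617, 200001627)    # range reported for the all-success loop
failing = (3200025856, 3200026016)  # range reported for the all-fail loop

# Assumption: pair low end with low end and high end with high end.
# The thread does not say which individual runs belong together.
ratios = [f / p for f, p in zip(failing, passing)]
extras = [f - p for f, p in zip(failing, passing)]

for p, f, r, e in zip(passing, failing, ratios, extras):
    print(f"passing={p} failing={f} ratio={r:.8f} extra={e}")
```

Under that pairing, both ratios land just below 16 and the surplus is a little over 3000000000 instructions in each case.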
We test the installation with the usual 'make test', which runs 'ctests/zero'. But that fails, seemingly randomly but very often.
Passing output is:
Failing output:
This is with PAPI 7.1.0 on an "AMD EPYC 7352 24-Core Processor" system (4 CPUs).
It looks like it randomly picks up an additional "3000000000" instructions, which looks like an error since the remainder makes sense.
Any ideas?