Fatal system signal has occurred during exit #32045
Comments
assign core
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel Matti Kortelainen. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
Occurred in 250407.17 step 2 in slc7_amd64_gcc10 IB CMSSW_11_2_X_2020-11-05-2300
@hufnagel reported a "high rate of job failures" with the same error at T3_US_TACC in https://hypernews.cern.ch/HyperNews/CMS/get/edmFramework/3902.html with CMSSW_9_3_18 and CMSSW_9_4_7 (slc6_amd64_gcc630). The HN thread has not reached any conclusion (yet). Some of the failed jobs did give stack traces:
One conclusion from the tests at TACC was that the failure is reproducible (although whether or not you end up getting a stack trace seems to be random). Another interesting data point was that the same workflow had an almost 100% failure rate at TACC and ran fine elsewhere.
Occurred in
To add here as well, with
Just to note here as well, the crash in
We see again
In particular, #32804 caused several problems (see #33107 Assertion failures in edm::service::Timing) and they were solved by #33125.
@silviodonato #33126 is meant to fix the crashes at end of job
+1 We have not seen these crashes for a long time, so I think we can close the issue.
This issue is fully signed and ready to be closed.
Occasionally random jobs in IB RelVals fail with
at the very end of the log without any further hint of what is causing the problem. The purpose of this issue is to collect their occurrences, and to discuss how to debug it and what could be causing the problem.
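To help with collecting occurrences, something along these lines could be used to scan a directory of job logs for the message near the end of each file. This is only a sketch under assumptions: the message string is taken from the title of this issue (the exact wording in the logs may differ), and the `*.log` pattern, the 20-line tail window, and the function names are hypothetical choices, not part of any CMSSW or cms-bot tooling.

```python
from pathlib import Path

# Assumption: message text copied from the issue title; adjust it if the
# actual log line is worded differently.
FATAL_MSG = "Fatal system signal has occurred during exit"


def tail_has_fatal_signal(log_text, tail_lines=20):
    """Return True if FATAL_MSG appears within the last `tail_lines` lines."""
    tail = log_text.splitlines()[-tail_lines:]
    return any(FATAL_MSG in line for line in tail)


def find_failed_logs(log_dir, pattern="*.log"):
    """Return sorted paths of logs under `log_dir` whose tail has the message."""
    hits = []
    for path in sorted(Path(log_dir).rglob(pattern)):
        # errors="replace" keeps the scan going on logs with odd bytes
        text = path.read_text(errors="replace")
        if tail_has_fatal_signal(text):
            hits.append(str(path))
    return hits
```

Restricting the match to the tail of the log reflects the observation that the message shows up at the very end; widen `tail_lines` if a stack trace follows the message.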