I noticed that our booktests sometimes time out after 2.5 hours / 150 minutes, e.g. here for PR #4490, while usually they finish with lots of room to spare, e.g. here in 40 minutes.
In this case, the last message from the test runner in the logs before a gazillion messages of the form From worker 7: GC: pause 15.33ms. collected 52.249626MB. incr was this:
2025-01-22T23:08:56.8299319Z From worker 7: vinberg_2.jlcon
Not having looked at when this filename is printed, I don't know if this indicates we are running the tests in vinberg_2.jlcon, or perhaps in the file after it.
But anyway, my suspicion is that there are some tests here which "usually" work but for some RNG seed states end up performing much worse than usual.
Perhaps we should reduce fluctuation in this test by forcing a specific seed? Or do we already set a seed -- then this might be a sign of another source of randomness we do not yet control for (e.g. when Singular is forking to achieve parallelism, I am guessing the order in which child processes finish their task may affect the overall result).
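To make the "force a specific seed" idea concrete, here is a minimal sketch. The helper name `with_fixed_seed` is hypothetical and not part of the test runner; it only reseeds Julia's default RNG via `Random.seed!`, so any randomness inside Singular or other non-Julia code would not be covered by it.

```julia
using Random

# Hypothetical helper: seed Julia's default RNG, then run `f`.
# This does NOT affect RNGs living in external libraries.
function with_fixed_seed(f, seed::Integer)
    Random.seed!(seed)
    return f()
end

# Two runs with the same seed draw the same random numbers.
a = with_fixed_seed(() -> rand(1:1000, 5), 42)
b = with_fixed_seed(() -> rand(1:1000, 5), 42)
println(a == b)
```

If a run with a fixed seed still shows large timing fluctuations, that would point towards a source of randomness outside Julia's RNG.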
Also it may be useful if someone dug into it to figure out which part causes the issue. Since we parse the .jlcon files to process them step-by-step, perhaps we could enable a debug mode where it prints out which line (number) it is currently processing?
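A rough sketch of what such a debug mode could build on, assuming each input in a `.jlcon` file starts with the `julia> ` prompt at the beginning of a line. The function `jlcon_inputs` is made up for illustration and is not how the booktests currently process the files; continuation lines of multi-line inputs are ignored here.

```julia
# Hypothetical helper: collect (line number, input text) for every line
# that starts with the `julia> ` prompt.
function jlcon_inputs(content::AbstractString)
    prompt = "julia> "
    inputs = Tuple{Int,String}[]
    for (lineno, line) in enumerate(split(content, '\n'))
        if startswith(line, prompt)
            # The prompt is ASCII, so its code unit count is a valid offset.
            push!(inputs, (lineno, String(line[ncodeunits(prompt)+1:end])))
        end
    end
    return inputs
end

sample = """
julia> x = 1
1

julia> y = x + 1
2
"""

# A debug mode could print this before each input is evaluated.
for (lineno, input) in jlcon_inputs(sample)
    println("line ", lineno, ": ", input)
end
```

With line numbers attached to each input, a log message or a timer can report where in the file the run currently is.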
Directly in run_repl_string we can only print the whole jlcon file but not each input line, since we basically paste the whole jlcon file into the REPL (in bracketed paste mode).
I started working on a PR to help debugging this by adding a hangcheck timer: #4504
But the hangcheck triggered only once after about 10 minutes, at vinberg_2.jlcon line 153, julia> B = [basis_representation(Y2, D) for D in pullbackDivY1];.
It did not trigger at all after this, so I would assume that we are either in some tight loop or in non-Julia code which does not allow the timer to run again (somewhere after that line 153...).
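For readers who have not looked at the PR, the general idea of a hangcheck timer can be sketched with Julia's `Timer`. This is only an illustration under my own assumptions, not the code from #4504; `with_hangcheck` and its arguments are hypothetical names. As far as I understand, the timer callback runs as a task, so it can only fire when the running code yields, which would fit the observation above that nothing is printed while the process is stuck in a tight loop or in non-Julia code.

```julia
# Hypothetical helper: while `f` runs, print a message every `interval`
# seconds so the log shows that the step is still in progress.
function with_hangcheck(f, label::AbstractString; interval::Real=60)
    start = time()
    timer = Timer(interval; interval=interval) do _
        println(stderr, "hangcheck: still running ", label, " after ",
                round(Int, time() - start), " s")
    end
    try
        return f()
    finally
        close(timer)  # stop the timer once `f` returns or throws
    end
end

# `sleep` yields to the scheduler, so the timer gets a chance to fire here.
result = with_hangcheck("demo step"; interval=0.1) do
    sleep(0.35)
    42
end
println(result)
```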