Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix alarm tests #39142

Open
wants to merge 14 commits into
base: develop
Choose a base branch
from
Open

Fix alarm tests #39142

wants to merge 14 commits into from

Conversation

user202729
Copy link
Contributor

@user202729 user202729 commented Dec 16, 2024

As discussed in https://gist.github.com/user202729/52b0c7134ea34f78a4416cd19e28e578#checking-the-code-is-indeed-interruptable , the current doctest of SageMath that tests sage: alarm(0.5); f() only checks whether f can be interrupted within 10 minutes or whatever the doctest time limit is — which is not particularly useful.

With this change, if f doesn't get interrupted within 0.2 second of alarm() fired, a test failure will be reported.

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

Copy link

github-actions bot commented Dec 16, 2024

Documentation preview for this PR (built with commit 5f3eafa; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@user202729 user202729 force-pushed the fix-alarm-tests branch 2 times, most recently from cf96408 to 5b6e04d Compare December 16, 2024 13:44
@tobiasdiez
Copy link
Contributor

There are a few failing tests. Some of them, I've seen before (randomly, like the one in qqbar) but others seem to be new.

@user202729
Copy link
Contributor Author

user202729 commented Dec 18, 2024

On make-based build system, there's only the missing warning one.

Edit: this warning sometimes appear and sometimes not. The easiest way to test is probably to suppress the warning.

On conda… the only new one I can see is in doctest/util.py. It's about the function takes way too long compared to expected.

Edit: sometimes it takes shorter than expected instead https://github.com/sagemath/sage/actions/runs/12404238773/job/34629083816?pr=39142 . Need to investigate.

It is true that sometimes the alarm()/cancel_alarm() takes an unreasonably long time though, I can even reproduce that locally (surely it should only takes < 0.001s because it's all I/O, if it takes longer there's probably some sort of deadlock somewhere which is worth investigating.)

meson has been failing everywhere nowadays. (presumably #39139 which should be merged soon)


I got most of the tests to pass, except a few that uses Python time.sleep() on Mac. Looks like it's sort of a common issue https://travis-ci.community/t/sleep-functions-are-not-accurate-on-macos/6122

@user202729 user202729 marked this pull request as draft December 19, 2024 08:55
@user202729 user202729 marked this pull request as ready for review December 20, 2024 03:36
@user202729
Copy link
Contributor Author

user202729 commented Dec 20, 2024

This should be ready for review now.

The (fixed) test failures on Mac is weird, but my best guess is context switching (as explained in the comment).

The framework would be useful for e.g. #39075 (to determine how much buffer time is needed when test computer is too fast), so it would be helpful if someone can review this one quickly.

@tobiasdiez
Copy link
Contributor

tobiasdiez commented Dec 20, 2024

The changes look good to me. Do you have an idea where the (random?) timeout error in the ci is coming from?

@user202729
Copy link
Contributor Author

user202729 commented Dec 20, 2024

No idea actually. But then, before the patch alarm() has been spuriously failing sometimes (with nobody have any idea why it happens) so it's probably not caused by this PR.

This time around it is triggered by PyThread_acquire_lock_timed though (… longjmp out of Python function leads to deadlock on some non-reentrant lock…? Which sort of make sense because the function is not just a single C library function call)

2024-12-20T03:22:35.4891017Z sage -t --long --warn-long 30.0 --random-seed=286735480429121101562228604801325644303 src/sage/rings/polynomial/polynomial_element.pyx
2024-12-20T03:32:32.9307435Z ##[error]/sage/local/var/lib/sage/venv-python3.10/lib/python3.10/site-packages/cysignals/signals.cpython-310-x86_64-linux-gnu.so(+0x93d4)[0x7fbcce8c53d4]
/sage/local/var/lib/sage/venv-python3.10/lib/python3.10/site-packages/cysignals/signals.cpython-310-x86_64-linux-gnu.so(+0x9496)[0x7fbcce8c5496]
/sage/local/var/lib/sage/venv-python3.10/lib/python3.10/site-packages/cysignals/signals.cpython-310-x86_64-linux-gnu.so(+0xc881)[0x7fbcce8c8881]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fbccfe44520]
/lib/x86_64-linux-gnu/libc.so.6(+0x91117)[0x7fbccfe93117]
/lib/x86_64-linux-gnu/libc.so.6(+0x9cc78)[0x7fbccfe9ec78]
python3(PyThread_acquire_lock_timed+0xa0)[0x55ac6f158f00]
python3(+0x1a1151)[0x55ac6f1ad151]
python3(+0x18c549)[0x55ac6f198549]
python3(_PyEval_EvalFrameDefault+0x818)[0x55ac6f181ae8]

...

python3(+0x22cf02)[0x55ac6f238f02]
python3(_PyEval_EvalFrameDefault+0x6b6c)[0x55ac6f187e3c]
python3(_PyFunction_Vec
2024-12-20T03:32:32.9331746Z ----------------------------------------------------------------------
2024-12-20T03:32:32.9332946Z sage -t --long --warn-long 30.0 --random-seed=286735480429121101562228604801325644303 src/sage/rings/polynomial/polynomial_element.pyx  # Timed out
2024-12-20T03:32:32.9334003Z ----------------------------------------------------------------------

Also the random truncation.

@tobiasdiez
Copy link
Contributor

Okay, the same error indeed happens also elsewhere and I've opened to keep track of it #39183

Could you please add a sentence or two to the developer guide, otherwise it looks good to go from my side.

@user202729
Copy link
Contributor Author

user202729 commented Dec 22, 2024

@tobiasdiez The problem is the current best place to put it is https://cysignals.readthedocs.io/en/latest/sigadvanced.html so… need to ask e.g. @dimpase I suppose.

Just have 1-2 sentence mentioning the function name should be sufficient. I hope the docstring is descriptive enough.

I suggest

However, the test above may still pass if the function takes several seconds or minutes to be interrupted, as long as it does not timeout the doctest. For this purpose, SageMath provides the convenient wrapper sage.doctest.util.ensure_interruptible_after() context manager, which checks the code exits quickly after being interrupted, and also gives descriptive error message on test failure.

@tobiasdiez
Copy link
Contributor

Sorry, should have been more precise. I meant adding a short remark here https://doc.sagemath.org/html/en/developer/coding_basics.html#writing-testable-examples, explaining how to test that a method can be interrupted using the helper you introduced.

@vbraun
Copy link
Member

vbraun commented Jan 1, 2025

I'm getting various test failures, e.g.

Running doctests with ID 2025-01-01-20-22-21-0a589553.
Git branch: develop
Git ref: 10.6.beta2-98-g14bee1e6830
Running with SAGE_LOCAL='/home/release/Sage/local' and SAGE_VENV='/home/release/Sage/local/var/lib/sage/venv-python3.12.5'
Using --optional=fedora,pip,sage,sage_spkg
Features to be detected: 4ti2,SAGE_SRC,benzene,bliss,buckygen,conway_polynomials,coxeter3,csdp,cvxopt,cvxopt,database_cremona_ellcurve,database_cremona_mini_ellcurve,database_cubic_hecke,database_ellcurves,database_graphs,database_jones_numfield,database_knotinfo,dot2tex,dvipng,ecm,fpylll,fricas,gap_package_atlasrep,gap_package_design,gap_package_grape,gap_package_guava,gap_package_hap,gap_package_polenta,gap_package_polycyclic,gap_package_qpa,gap_package_quagroup,gfan,giac,glucose,graphviz,imagemagick,info,ipython,jmol,jupymake,jupyter_sphinx,kenzo,kissat,latte_int,lrcalc_python,lrslib,mathics,matroid_database,mcqd,meataxe,mpmath,msolve,nauty,networkx,numpy,palp,pandoc,pdf2svg,pdftocairo,pexpect,phitigra,pillow,plantri,polytopes_db,polytopes_db_4d,pplpy,primecountpy,ptyprocess,pycosat,pycryptosat,pynormaliz,pyparsing,python_igraph,requests,rpy2,rubiks,sage.combinat,sage.geometry.polyhedron,sage.graphs,sage.groups,sage.libs.braiding,sage.libs.ecl,sage.libs.flint,sage.libs.gap,sage.libs.giac,sage.libs.homfly,sage.libs.linbox,sage.libs.m4ri,sage.libs.ntl,sage.libs.pari,sage.libs.singular,sage.misc.cython,sage.modular,sage.modules,sage.numerical.mip,sage.plot,sage.rings.complex_double,sage.rings.finite_rings,sage.rings.function_field,sage.rings.number_field,sage.rings.padics,sage.rings.polynomial.pbori,sage.rings.real_double,sage.rings.real_mpfr,sage.sat,sage.schemes,sage.symbolic,sage_numerical_backends_coin,sagemath_doc_html,scipy,singular,sirocco,sloane_database,sphinx,symengine_py,sympy,tdlib,threejs,topcom
Doctesting 1 file.
sage -t --long --warn-long 30.0 --random-seed=123 src/sage/doctest/util.py
**********************************************************************
File "src/sage/doctest/util.py", line 844, in sage.doctest.util.inaccuracy_tolerance
Failed example:
    with ensure_interruptible_after(2) as data: interruptible_sleep(1)
Expected:
    Traceback (most recent call last):
    ...
    RuntimeError: Function terminates early after 1.00... < 2.0000 seconds
Got:
    <BLANKLINE>
    Traceback (most recent call last):
      File "/home/release/Sage/src/sage/doctest/forker.py", line 716, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/release/Sage/src/sage/doctest/forker.py", line 1137, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.doctest.util.inaccuracy_tolerance[9]>", line 1, in <module>
        with ensure_interruptible_after(Integer(2)) as data: interruptible_sleep(Integer(1))
      File "/home/release/Sage/local/var/lib/sage/venv-python3.12.5/lib/python3.12/contextlib.py", line 144, in __exit__
        next(self.gen)
      File "/home/release/Sage/src/sage/doctest/util.py", line 899, in ensure_interruptible_after
        raise RuntimeError(f"Function terminates early after {elapsed:.4f} < {seconds:.4f} seconds")
    RuntimeError: Function terminates early after 1.1018 < 2.0000 seconds
**********************************************************************
File "src/sage/doctest/util.py", line 848, in sage.doctest.util.inaccuracy_tolerance
Failed example:
    with ensure_interruptible_after(1) as data: uninterruptible_sleep(2)
Expected:
    Traceback (most recent call last):
    ...
    RuntimeError: Function is not interruptible within 1.0000 seconds, only after 2.00... seconds
Got:
    <BLANKLINE>
    Traceback (most recent call last):
      File "/home/release/Sage/src/sage/doctest/util.py", line 880, in ensure_interruptible_after
        yield data
      File "<doctest sage.doctest.util.inaccuracy_tolerance[10]>", line 1, in <module>
        with ensure_interruptible_after(Integer(1)) as data: uninterruptible_sleep(Integer(2))
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "src/cysignals/signals.pyx", line 341, in cysignals.signals.python_check_interrupt
    cysignals.signals.AlarmInterrupt
    <BLANKLINE>
    During handling of the above exception, another exception occurred:
    <BLANKLINE>
    Traceback (most recent call last):
      File "/home/release/Sage/src/sage/doctest/forker.py", line 716, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/release/Sage/src/sage/doctest/forker.py", line 1137, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.doctest.util.inaccuracy_tolerance[10]>", line 1, in <module>
        with ensure_interruptible_after(Integer(1)) as data: uninterruptible_sleep(Integer(2))
      File "/home/release/Sage/local/var/lib/sage/venv-python3.12.5/lib/python3.12/contextlib.py", line 158, in __exit__
        self.gen.throw(value)
      File "/home/release/Sage/src/sage/doctest/util.py", line 891, in ensure_interruptible_after
        raise RuntimeError(
    RuntimeError: Function is not interruptible within 1.0000 seconds, only after 2.1039 seconds
**********************************************************************
File "src/sage/doctest/util.py", line 852, in sage.doctest.util.inaccuracy_tolerance
Failed example:
    data  # abs tol 0.01
Expected:
    {'alarm_raised': True, 'elapsed': 2.0}
Got:
    {'alarm_raised': True, 'elapsed': 2.103935956954956}
Tolerance exceeded:
    2.0 vs 2.103935956954956, tolerance 2e-1 > 1e-2
**********************************************************************
File "src/sage/doctest/util.py", line 854, in sage.doctest.util.inaccuracy_tolerance
Failed example:
    with ensure_interruptible_after(1): uninterruptible_sleep(2); raise RuntimeError
Expected:
    Traceback (most recent call last):
    ...
    RuntimeError: Function is not interruptible within 1.0000 seconds, only after 2.00... seconds
Got:
    <BLANKLINE>
    Traceback (most recent call last):
      File "/home/release/Sage/src/sage/doctest/util.py", line 880, in ensure_interruptible_after
        yield data
      File "<doctest sage.doctest.util.inaccuracy_tolerance[12]>", line 1, in <module>
        with ensure_interruptible_after(Integer(1)): uninterruptible_sleep(Integer(2)); raise RuntimeError
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "src/cysignals/signals.pyx", line 341, in cysignals.signals.python_check_interrupt
    cysignals.signals.AlarmInterrupt
    <BLANKLINE>
    During handling of the above exception, another exception occurred:
    <BLANKLINE>
    Traceback (most recent call last):
      File "/home/release/Sage/src/sage/doctest/forker.py", line 716, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/release/Sage/src/sage/doctest/forker.py", line 1137, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.doctest.util.inaccuracy_tolerance[12]>", line 1, in <module>
        with ensure_interruptible_after(Integer(1)): uninterruptible_sleep(Integer(2)); raise RuntimeError
      File "/home/release/Sage/local/var/lib/sage/venv-python3.12.5/lib/python3.12/contextlib.py", line 158, in __exit__
        self.gen.throw(value)
      File "/home/release/Sage/src/sage/doctest/util.py", line 891, in ensure_interruptible_after
        raise RuntimeError(
    RuntimeError: Function is not interruptible within 1.0000 seconds, only after 2.1019 seconds
**********************************************************************
File "src/sage/doctest/util.py", line 858, in sage.doctest.util.inaccuracy_tolerance
Failed example:
    data  # abs tol 0.01
Expected:
    {'alarm_raised': True, 'elapsed': 2.0}
Got:
    {'alarm_raised': True, 'elapsed': 2.103935956954956}
Tolerance exceeded:
    2.0 vs 2.103935956954956, tolerance 2e-1 > 1e-2
**********************************************************************
1 item had failures:
   5 of  17 in sage.doctest.util.inaccuracy_tolerance
    [186 tests, 5 failures, 13.03s wall]
----------------------------------------------------------------------
sage -t --long --warn-long 30.0 --random-seed=123 src/sage/doctest/util.py  # 5 doctests failed
----------------------------------------------------------------------

@user202729
Copy link
Contributor Author

user202729 commented Jan 2, 2025

What a headache. Evidently I can't reproduce the thing in CI machine or locally, but I wonder if it has anything to do with sagemath/cysignals#215 . (Entirely possible?)

On the other hand if that is the case, maybe it would be common for real uses of the function to have extraneous delay — reason to increase default tolerance.

@user202729
Copy link
Contributor Author

user202729 commented Jan 25, 2025

This test is failing on Python 3.12

        Check that large multiplications can be interrupted::

            sage: # needs sage.schemes
            sage: P = E([2,1,1])
            sage: alarm(0.001); 2^(10^8) * P
            Traceback (most recent call last):
            ...
            AlarmInterrupt

        Verify that cysignals correctly detects that the above
        exception has been handled::

            sage: # needs sage.schemes
            sage: from cysignals.tests import print_sig_occurred
            sage: print_sig_occurred()
            No current exception

Apparently it's because of cysignals using PyObject* instead of weakref, so that if there's a cyclic reference anywhere in the exception object, it will never be collected. (see also sagemath/cysignals#215 )

I think this can be handled on Python side python/cpython#129276 , but even without that, we can just workaround it.


In vbraun 's traceback above, Python 3.12 was used — I reproduced it locally (likely because of Sage Integer object destructor), and confirmed that the new changes fixed it. I think this should be good now.

@user202729 user202729 marked this pull request as ready for review January 25, 2025 04:37
@user202729 user202729 marked this pull request as draft January 25, 2025 04:44
@user202729
Copy link
Contributor Author

----------------------------------------------------------------------
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/doc/en/thematic_tutorials/group_theory.rst  # Timed out
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/sage/combinat/designs/incidence_structures.py  # Timed out
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/sage/graphs/generic_graph.py  # Timed out
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/sage/groups/finitely_presented_named.py  # Timed out
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/sage/groups/libgap_wrapper.pyx  # Timed out
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/sage/groups/perm_gps/permgroup.py  # Timed out
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/sage/groups/perm_gps/permgroup_morphism.py  # Timed out
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/sage/topology/simplicial_complex.py  # Timed out
sage -t --warn-long 5.0 --random-seed=245664474828316314806669892272039920381 src/sage/topology/simplicial_complex_examples.py  # Timed out
----------------------------------------------------------------------

Something went wrong here… but some of the files was not touched at all by this pull request.

@user202729 user202729 marked this pull request as ready for review January 25, 2025 07:19
@user202729
Copy link
Contributor Author

user202729 commented Jan 25, 2025

Looks like the timed out tests don't always fail… without further evidence the most likely situation is probably that the probability of the test failing remains the same.

(side note: I accidentally cherry-picked #39275 in, reverted now.)

@user202729 user202729 requested a review from tobiasdiez January 25, 2025 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants