sp.tl.identify_interactions fails on Windows due to joblib issue in GitHub Actions tests #19

Open
mgbckr opened this issue Jan 25, 2024 · 3 comments
Labels: fix (Need to be fixed in the future)

Comments

mgbckr (Collaborator) commented Jan 25, 2024

This happens in the tests triggered via GitHub Actions (Windows only!) ... maybe it does not happen in real life? This should be tested on a separate Windows machine.

Here are the exception details:

================================== FAILURES ===================================
____________________ test_5_distance_permutation_analysis _____________________
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "D:\a\SAP\SAP\.tox\default\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 60, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The paging file is too small for this operation to complete.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\a\SAP\SAP\.tox\default\lib\site-packages\joblib\externals\loky\process_executor.py", line 426, in _process_worker
            adata = sc.read(processed_path / "adata_nn_demo_annotated_cn.h5ad")
            adata
    
            # %% [markdown]
            # ## 5.1 Identify potential interactions
    
            # %%
>           distance_pvals = sp.tl.identify_interactions(
                adata=adata,
                id="index",
                x_pos="x",
                y_pos="y",
                cell_type="celltype",
                region="unique_region",
                num_iterations=100,
                num_cores=10,
                min_observed=10,
                comparison="condition",
            )

tests\test_5_distance_permutation_analysis.py:30: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox\default\lib\site-packages\spacec\tools\_general.py:2206: in identify_interactions
    iterative_triangulation_distances = tl_iterate_tri_distances(
.tox\default\lib\site-packages\spacec\tools\_general.py:1892: in tl_iterate_tri_distances
    results = Parallel(n_jobs=num_cores)(
.tox\default\lib\site-packages\joblib\parallel.py:1952: in __call__
    return output if self.return_generator else list(output)
.tox\default\lib\site-packages\joblib\parallel.py:1595: in _get_outputs
    yield from self._retrieve()
.tox\default\lib\site-packages\joblib\parallel.py:1699: in _retrieve
    self._raise_error_fast()
.tox\default\lib\site-packages\joblib\parallel.py:1734: in _raise_error_fast
    error_job.get_result(self.timeout)
.tox\default\lib\site-packages\joblib\parallel.py:736: in get_result
    return self._return_or_raise()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <joblib.parallel.BatchCompletionCallBack object at 0x000001ED293F3AF0>

    def _return_or_raise(self):
        try:
            if self.status == TASK_ERROR:
>               raise self._result
E               joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

.tox\default\lib\site-packages\joblib\parallel.py:754: BrokenProcessPool
---------------------------- Captured stdout call -----------------------------
Computing for observed distances between cell types!
This function expects integer values for xy coordinates.
x and y will be changed to integer. Please check the generated output!
Save triangulation distances output to anndata.uns triDist
Permuting data labels to obtain the randomly distributed distances!
this step can take awhile
---------------------------- Captured stderr call -----------------------------
2024-01-25 22:30:53.916040: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:30:53.916149: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-25 22:30:54.153379: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:30:54.153416: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-25 22:31:28.487845: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:31:28.487946: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-25 22:31:28.671059: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:31:28.671096: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-25 22:31:29.818491: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:31:29.818528: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-25 22:31:29.893667: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:31:29.893774: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-25 22:31:31.209286: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:31:31.209322: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-25 22:31:31.376889: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:31:31.376925: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-25 22:31:31.383336: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-25 22:31:31.383428: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
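
Judging from the remote traceback, the underlying failure is that every loky worker process re-imports TensorFlow and the DLL load fails with "The paging file is too small for this operation to complete"; joblib then surfaces this as a BrokenProcessPool. An untested way to check that hypothesis would be to route joblib through its threading backend so no extra processes are spawned at all (this assumes the Parallel call inside spacec does not pin a backend explicitly):

    import scanpy as sc
    import spacec as sp
    from joblib import parallel_backend

    # processed_path as in the test setup
    adata = sc.read(processed_path / "adata_nn_demo_annotated_cn.h5ad")

    # Threads share the parent interpreter, so TensorFlow is imported only
    # once and no worker process has to load the DLL (which is what exhausts
    # the Windows paging file above).
    with parallel_backend("threading", n_jobs=10):
        distance_pvals = sp.tl.identify_interactions(
            adata=adata,
            id="index",
            x_pos="x",
            y_pos="y",
            cell_type="celltype",
            region="unique_region",
            num_iterations=100,
            num_cores=10,
            min_observed=10,
            comparison="condition",
        )

Threading is slower for CPU-bound work because of the GIL, so this would be a diagnostic, not a fix.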
mgbckr added the bug (Something isn't working) label Jan 25, 2024
mgbckr (Collaborator, Author) commented Jan 25, 2024

Other people who ran into this issue mention that reducing the number of workers helps, which points towards an actual memory issue in GitHub Actions: Spandan-Madan/Pytorch_fine_tuning_Tutorial#10
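
If it really is memory pressure, a first mitigation (short of disabling parallelism entirely) would be to lower the worker count on the CI runner, since fewer workers mean fewer simultaneous TensorFlow imports and a smaller peak commit charge against the paging file. A sketch with an arbitrarily chosen lower value:

    distance_pvals = sp.tl.identify_interactions(
        adata=adata,
        id="index",
        x_pos="x",
        y_pos="y",
        cell_type="celltype",
        region="unique_region",
        num_iterations=100,
        num_cores=2,  # down from 10; pick the largest value that survives
        min_observed=10,
        comparison="condition",
    )

Alternatively, if I remember correctly, there are community actions (e.g. al-cheb/configure-pagefile-action) for enlarging the runner's paging file, which is the usual fix for this TensorFlow DLL error on Windows runners.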

mgbckr changed the title sp.tl.identify_interactions fails on Windows due to joblib issue → sp.tl.identify_interactions fails on Windows due to joblib issue in GitHub Actions tests Jan 25, 2024
TKempchen (Collaborator) commented
@yuqiyuqitan we can test this on the Stanford Windows server

mgbckr added this to the Publication milestone Jan 30, 2024
mgbckr added fix (Need to be fixed in the future) and removed bug (Something isn't working) labels Feb 18, 2024
mgbckr (Collaborator, Author) commented Feb 18, 2024

Note: contrary to my first impression, this is not limited to Python 3.8 and 3.9; it happens for all Python versions.

This seems to be a joblib issue under Windows.
We worked around it for now by disabling parallelism in test_5_distance_permutation_analysis.py:test_5_distance_permutation_analysis (45b5b3b):

        distance_pvals = sp.tl.identify_interactions(
            adata=adata,
            id="index",
            x_pos="x",
            y_pos="y",
            cell_type="celltype",
            region="unique_region",
            num_iterations=100,
            num_cores=1,  # <----------------------- here
            # TODO: The Windows runner on GitHub Actions had trouble with
            #       multi-core; we should test with multiple cores, too.
            # num_cores=10,
            min_observed=10,
            comparison="condition",
        )

I am leaving this issue open for now since it still needs to be tested whether this is an issue with GitHub Actions specifically or with Windows in general.
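
Until that is settled, one way to keep exercising the multi-core path on the non-Windows runners would be to pick the core count per platform in the test. A sketch:

    import platform

    # Work around the Windows-only BrokenProcessPool (see above) while still
    # testing the parallel code path on the Linux/macOS runners.
    num_cores = 1 if platform.system() == "Windows" else 10

    distance_pvals = sp.tl.identify_interactions(
        adata=adata,
        id="index",
        x_pos="x",
        y_pos="y",
        cell_type="celltype",
        region="unique_region",
        num_iterations=100,
        num_cores=num_cores,
        min_observed=10,
        comparison="condition",
    )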

mgbckr removed this from the Publication milestone Feb 18, 2024