conda logs have many empty lines #228

Open
jameslamb opened this issue Jan 16, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@jameslamb
Member

Description

For example, see https://github.com/rapidsai/cuml/actions/runs/12799823619/job/35687977337?pr=6227

[screenshot: conda CI log where a few lines of information take up dozens of lines]

That small screenshot shows 7 lines worth of information taking up 40 lines of space. Sometimes the ratio is even worse:

[screenshot: another conda CI log with an even higher ratio of empty lines]

We'd attempted to fix this by setting quiet: true in the conda config in CI images (rapidsai/build-planning#126, #217).

But we had to revert that because it seemed to suppress Python exception names when conda operations failed, which we suspect broke the retry mechanism we use for conda operations (#220, gha-tools / tools / rapids-conda-retry).

As a result of that reversion, this logging issue is still a problem adding friction to development (example: rapidsai/cuml#6224 (comment)).
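
For reference, the reverted approach amounted to roughly the following (a sketch; the exact changes are in the PRs linked above):

```shell
# Sketch (assumption about the exact mechanism): set `quiet: true` in the
# CI image's conda config. This suppresses the progress bars, but it also
# appeared to suppress the Python exception names that rapids-conda-retry
# matches on.
conda config --system --set quiet true
```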

Reproducible Example

See any recent CI runs across RAPIDS that perform a {conda, mamba} env create or a {conda, mamba} install.

example: https://github.com/rapidsai/cuml/actions/runs/12799823619/job/35687977337?pr=6227

Notes

It'd be helpful to find the root cause of this, and ideally a reproducible example that isn't RAPIDS-specific (in case this is an issue in conda / mamba or some library they depend on).
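
One possible starting point for a non-RAPIDS repro (a sketch; env.yaml here stands in for any small environment file):

```shell
# Run an env create non-interactively, the way CI does (stdout is not a TTY),
# and count the blank lines the progress bars leave behind in the log.
conda env create -n repro-empty-lines -f env.yaml 2>&1 | tee /tmp/conda-create.log
grep -c '^[[:space:]]*$' /tmp/conda-create.log
```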

Any solution for RAPIDS should:

@jakirkham
Member

While we are rolling out consolidated installs (rapidsai/build-planning#22), which fix some install flakiness that we see in different places (plus cut down on test install time), we can add a -q flag to the install. This is what I have done so far

Should add, I would welcome other folks also making these PRs. I think anyone on build-infra could help move this forward.
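
For concreteness, that just means the consolidated install step gets the flag, roughly like this (the package list here is a placeholder, not the actual consolidated set):

```shell
# Sketch: -q / --quiet on a single consolidated install step
conda install -q -y package-a package-b package-c
```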

@jameslamb
Member Author

Consolidating installs is great and has its own standalone value. I'm 100% convinced of the value of doing that.

But I don't think that directly addresses the issue of "CI logs are full of hundreds of empty lines from progress bars".

And from #220 and the discussion that led to it, I think we're not confident that adding -q / --quiet is what we want here, because it seemed to be suppressing the Python exception names that this retry mechanism relies on:

https://github.com/rapidsai/gha-tools/blob/0558ffce255e4e7da5d5312e79f35dd81e444144/tools/rapids-conda-retry#L82

This would benefit from a minimal, reproducible example and some investigation to validate that, which is part of what this issue tracks.
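
To make the concern concrete, the check is roughly of this shape (a simplified stand-in, not a copy of rapids-conda-retry; see the link above for the actual logic):

```shell
# Simplified stand-in (assumption): the wrapper captures conda's output and
# greps it for known exception names to decide whether a failure is retryable.
# If --quiet hides those names, failures stop being retried.
# "some-package" and the single "CondaHTTPError" pattern are illustrative only.
outfile=$(mktemp)
conda install --yes some-package 2>&1 | tee "${outfile}"
status=${PIPESTATUS[0]}
if [[ ${status} -ne 0 ]] && grep -q "CondaHTTPError" "${outfile}"; then
    echo "retryable conda failure detected, would retry"
fi
```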

@jakirkham
Member

That is exactly what the -q flag solves. We use it in conda-forge frequently.

I wasn't around when this discussion occurred. I would appreciate some examples and context for why we think -q doesn't work: #220 (comment)

Should add, we have been using -q in conda-forge for many years now, so I am less inclined to believe -q is doing something wrong and more inclined to believe there is something in our own tooling that needs an update. Though I could be convinced with good examples.

@jameslamb
Member Author

Right, I understand. This is why I said:

This would benefit from a minimal, reproducible example and some investigation to validate that, which is part of what this issue tracks.

@jakirkham
Member

Yep, it's just hard to create a minimal viable reproducer without even an example. So for now I will wait until someone who saw these issues can share more information.

@bdice
Contributor

bdice commented Jan 16, 2025

@pentschev noticed quite a few of these errors in the ucxx CI. Here are links to a few errors where Python exceptions were not shown, and we instead got cryptic messages.

- Response ended prematurely:
- Unexpected error 9 on netlink descriptor *. (Note that * can be any integer, like 5, 10, or 11 in the links below.):
- error libmamba File not valid: SHA256 sum doesn't match expectation "PATH" (where PATH can be any conda package path, such as /opt/conda/pkgs/xorg-libxdmcp-1.1.5-h57736b2_0.conda and /opt/conda/pkgs/lz4-4.3.3-py310h80b8a69_2.conda in the links below):
- error libmamba Error opening for reading "/opt/conda/pkgs/locket-1.0.0-pyhd8ed1ab_0/info/index.json": No such file or directory:

Notice that the logs in these links are indeed very quiet! They hide a lot of information that we normally would see. We really just want to turn off the progress bars.

@jameslamb
Member Author

I recently saw similar issues to what @bdice mentioned above, but in this repo:

One other random thought (I haven't investigated this)... I wonder if some of these things are related to the fact that we're wrapping conda / mamba?

rapids-otel-wrap does some input/output redirection around conda / mamba:

https://github.com/rapidsai/gha-tools/blob/0558ffce255e4e7da5d5312e79f35dd81e444144/tools/rapids-otel-wrap#L55-L59

And so I wonder if that could be causing some of these issues? For example, Unexpected error 9 on netlink descriptor * can happen when multiple threads try to close the same file descriptor at the same time:

Maybe libmamba is trying to unconditionally close its connection to stdout, and failing in these ways when it's no longer there because it has already been closed as part of the rapids-otel-wrap command?
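
As a data point on that theory: "error 9" lines up with errno 9, i.e. EBADF / "Bad file descriptor", which is what you get when something operates on a file descriptor that has already been closed.

```shell
# errno 9 is EBADF ("Bad file descriptor"): the error raised when a process
# uses or closes a file descriptor that is no longer open.
python -c 'import errno, os; print(errno.errorcode[9], "-", os.strerror(9))'
# prints: EBADF - Bad file descriptor
```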
