Bump timeout in machine tests #25058

Merged: 3 commits merged into containers:main on Jan 21, 2025

Conversation

Luap99 (Member) commented Jan 20, 2025

Does this PR introduce a user-facing change?

None

openshift-ci bot (Contributor) commented Jan 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2025
mheon (Member) commented Jan 20, 2025

I think two of the jobs still timed out... Unless Context canceled is something else?

The test pulls a big disk image every time, which is slow, and I see no good
way around that. Let's try to use /dev/null as the image: we do not have to
run the VM at all and can just pass a NOP file to make the init command happy.

Pulling that image seems to take over 2m, so we save quite a lot. Also update
the matcher for the slice; BeTrue() produces horrible error messages.

Signed-off-by: Paul Holzinger <[email protected]>
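
(Illustration, not part of the PR.) The matcher point in the commit above is about Gomega failure output: wrapping a slice lookup in BeTrue() makes Gomega report only "Expected false to be true", while ContainElement prints the slice and the missing element. Below is a minimal hypothetical sketch with invented package, machine, and test names, not the actual podman machine e2e code:

```go
// matcher_sketch_test.go — hypothetical sketch, not the actual podman test.
package sketch

import (
	"slices"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestSketch(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "matcher sketch")
}

var _ = Describe("listing machines", func() {
	// Invented data standing in for whatever slice the real test inspects.
	machineNames := []string{"podman-machine-default"}

	It("fails unhelpfully with BeTrue", func() {
		// On failure Gomega can only say "Expected false to be true".
		Expect(slices.Contains(machineNames, "other-machine")).To(BeTrue())
	})

	It("fails helpfully with ContainElement", func() {
		// On failure Gomega prints the whole slice and the expected element.
		Expect(machineNames).To(ContainElement("other-machine"))
	})
})
```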
The regex match would return a horrible error message and is way more
complicated than it should be. Simply check that .exe is not part of the
output.

Signed-off-by: Paul Holzinger <[email protected]>
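
(Illustration, not part of the PR.) The same readability argument applies to the .exe check above: Gomega's ContainSubstring failure message names the substring and the actual value, while MatchRegexp only echoes the pattern. A self-contained hypothetical sketch, with an invented sample value and no claim to match the real test or regex:

```go
// exe_check_sketch_test.go — hypothetical sketch, not the actual podman test.
package sketch

import (
	"testing"

	. "github.com/onsi/gomega"
)

func TestNoExeSuffix(t *testing.T) {
	g := NewWithT(t)
	// Invented sample value standing in for the command output the real test checks.
	output := "some machine output without a Windows suffix"

	// Roughly the old approach: a regex whose failure output only echoes the pattern.
	g.Expect(output).NotTo(MatchRegexp(`\.exe`))

	// The simplification: a plain substring check with a readable failure message.
	g.Expect(output).NotTo(ContainSubstring(".exe"))
}
```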
We see a ton of timeouts in both the applehv and libkrun machine tests. It
seems 35m is no longer enough. I was not able to spot anything that would
explain why the runtime increased all of a sudden, so I hope this is enough.

Fixes containers#25057

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 (Member, Author) commented Jan 20, 2025

I think two of the jobs still timed out... Unless Context canceled is something else?

Yeah "Context canceled" is different, maybe even more concerning. This can be if the connecting to the test machine was lost, i.e. reboot/shutdown. Or a new force push would trigger the same thing I think as that cancel already running tasks.

No idea why this happen there, let's hope it is a one of.

mheon (Member) commented Jan 21, 2025 via email

Luap99 (Member, Author) commented Jan 21, 2025

Just as a reference: the one test passed in 40 min and we had a 35 min timeout, so it seems totally valid to say we need a bigger timeout.

@baude @l0rd @ashley-cui PTAL

l0rd (Member) commented Jan 21, 2025

LGTM. In Cirrus, is there a URL listing all the runs of a given job? It would be interesting to figure out when the machine-on-Mac jobs started to take more than 35 minutes.

Luap99 (Member, Author) commented Jan 21, 2025

LGTM. In Cirrus, is there a URL listing all the runs of a given job? It would be interesting to figure out when the machine-on-Mac jobs started to take more than 35 minutes.

Please merge this given it is the second LGTM, to avoid other PRs suffering any longer.

There is no such URL to my knowledge; you need to scrape all tasks from the Cirrus API.
And guess what, we had somebody do that work: https://www.edsantiago.com/cirrus-timing-history/podman.html
But of course that is no longer maintained.
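
(Illustration, not part of the PR.) For anyone who wants to rebuild that kind of timing history: the Cirrus GraphQL endpoint is https://api.cirrus-ci.com/graphql, but the query shape and field names in the sketch below are assumptions that should be verified against the live schema before relying on them.

```go
// cirrus_durations.go — rough, unverified sketch of pulling task durations
// from the Cirrus CI GraphQL API. Query shape and field names are guesses;
// verify them against https://api.cirrus-ci.com/graphql before relying on this.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumed query: recent builds of containers/podman with per-task durations.
	query := `{
	  ownerRepository(platform: "github", owner: "containers", name: "podman") {
	    builds(last: 20) {
	      edges { node { changeMessageTitle tasks { name status durationInSeconds } } }
	    }
	  }
	}`

	payload, err := json.Marshal(map[string]string{"query": query})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("https://api.cirrus-ci.com/graphql", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Dump the raw JSON; turning it into a per-job timing history is the part
	// the linked cirrus-timing-history page automated.
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```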

l0rd (Member) commented Jan 21, 2025

/lgtm

There is no such URL to my knowledge; you need to scrape all tasks from the Cirrus API.
And guess what, we had somebody do that work: https://www.edsantiago.com/cirrus-timing-history/podman.html
But of course that is no longer maintained.

Ok thanks

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit e5b6382 into containers:main Jan 21, 2025
88 of 89 checks passed
Labels: approved, lgtm, machine, release-note-none