Bump timeout in machine tests #25058

Merged: 3 commits merged into containers:main on Jan 21, 2025

Conversation

Luap99 (Member) commented Jan 20, 2025

Does this PR introduce a user-facing change?

None

openshift-ci bot (Contributor) commented Jan 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2025
mheon (Member) commented Jan 20, 2025

I think two of the jobs still timed out... Unless Context canceled is something else?

The test pulls a big disk image every time, which is slow, and I see no good
way around that. Let's try to use /dev/null as the image: we do not have to
run the VM at all and can just pass a NOP file to make the init command happy.

Pulling that image seems to take over 2m, so we save quite a lot. Also update
the matcher for the slice; BeTrue() produces horrible error messages.

Signed-off-by: Paul Holzinger <[email protected]>
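
(Illustration, not part of the PR.) The matcher point in the commit above is about Gomega failure output: wrapping a slice lookup in BeTrue() makes Gomega report only "Expected false to be true", while ContainElement prints the slice and the missing element. Below is a minimal hypothetical sketch with invented package, machine, and test names, not the actual podman machine e2e code:

```go
// matcher_sketch_test.go — hypothetical sketch, not the actual podman test.
package sketch

import (
	"slices"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestSketch(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "matcher sketch")
}

var _ = Describe("listing machines", func() {
	// Invented data standing in for whatever slice the real test inspects.
	machineNames := []string{"podman-machine-default"}

	It("fails unhelpfully with BeTrue", func() {
		// On failure Gomega can only say "Expected false to be true".
		Expect(slices.Contains(machineNames, "other-machine")).To(BeTrue())
	})

	It("fails helpfully with ContainElement", func() {
		// On failure Gomega prints the whole slice and the expected element.
		Expect(machineNames).To(ContainElement("other-machine"))
	})
})
```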
The regex match would return a horrible error message and is way more
complicated than it should be. Simply check that .exe is not part of the
output.

Signed-off-by: Paul Holzinger <[email protected]>
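
(Illustration, not part of the PR.) The same readability argument applies to the .exe check above: Gomega's ContainSubstring failure message names the substring and the actual value, while MatchRegexp only echoes the pattern. A self-contained hypothetical sketch, with an invented sample value and no claim to match the real test or regex:

```go
// exe_check_sketch_test.go — hypothetical sketch, not the actual podman test.
package sketch

import (
	"testing"

	. "github.com/onsi/gomega"
)

func TestNoExeSuffix(t *testing.T) {
	g := NewWithT(t)
	// Invented sample value standing in for the command output the real test checks.
	output := "some machine output without a Windows suffix"

	// Roughly the old approach: a regex whose failure output only echoes the pattern.
	g.Expect(output).NotTo(MatchRegexp(`\.exe`))

	// The simplification: a plain substring check with a readable failure message.
	g.Expect(output).NotTo(ContainSubstring(".exe"))
}
```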
We see a ton of timeouts in both the applehv and libkrun machine tests. It
seems 35m is no longer enough. I was not able to spot anything that would
explain why the runtime increased all of a sudden, so I hope this is enough.

Fixes containers#25057

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 (Member, Author) commented Jan 20, 2025

I think two of the jobs still timed out... Unless Context canceled is something else?

Yeah "Context canceled" is different, maybe even more concerning. This can be if the connecting to the test machine was lost, i.e. reboot/shutdown. Or a new force push would trigger the same thing I think as that cancel already running tasks.

No idea why this happen there, let's hope it is a one of.

mheon (Member) commented Jan 21, 2025 via email

Luap99 (Member, Author) commented Jan 21, 2025

Just as a reference: the one test passed in 40 min and we had a 35 min timeout, so it seems totally valid to say we need a bigger timeout.

@baude @l0rd @ashley-cui PTAL

l0rd (Member) commented Jan 21, 2025

LGTM. In Cirrus, is there a URL listing all the runs of a given job? It would be interesting to figure out when the machine-on-Mac jobs started to take more than 35 minutes.

Luap99 (Member, Author) commented Jan 21, 2025

LGTM. In Cirrus, is there a URL listing all the runs of a given job? It would be interesting to figure out when the machine-on-Mac jobs started to take more than 35 minutes.

Please merge this given it is the second LGTM, to avoid other PRs suffering any longer.

There is no such URL to my knowledge; you need to scrape all tasks from the Cirrus API.
And guess what, we had somebody do that work: https://www.edsantiago.com/cirrus-timing-history/podman.html
But of course that is no longer maintained.
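
(Illustration, not part of the PR.) For anyone who wants to rebuild that kind of timing history: the Cirrus GraphQL endpoint is https://api.cirrus-ci.com/graphql, but the query shape and field names in the sketch below are assumptions that should be verified against the live schema before relying on them.

```go
// cirrus_durations.go — rough, unverified sketch of pulling task durations
// from the Cirrus CI GraphQL API. Query shape and field names are guesses;
// verify them against https://api.cirrus-ci.com/graphql before relying on this.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumed query: recent builds of containers/podman with per-task durations.
	query := `{
	  ownerRepository(platform: "github", owner: "containers", name: "podman") {
	    builds(last: 20) {
	      edges { node { changeMessageTitle tasks { name status durationInSeconds } } }
	    }
	  }
	}`

	payload, err := json.Marshal(map[string]string{"query": query})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("https://api.cirrus-ci.com/graphql", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Dump the raw JSON; turning it into a per-job timing history is the part
	// the linked cirrus-timing-history page automated.
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```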

l0rd (Member) commented Jan 21, 2025

/lgtm

There is no such URL to my knowledge; you need to scrape all tasks from the Cirrus API.
And guess what, we had somebody do that work: https://www.edsantiago.com/cirrus-timing-history/podman.html
But of course that is no longer maintained.

Ok thanks

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit e5b6382 into containers:main Jan 21, 2025
88 of 89 checks passed
Labels: approved, lgtm, machine, release-note-none