-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix enqueueing of Minion jobs breaking PARALLEL_ONE_HOST_ONLY=1
#6048
Open
Martchus
wants to merge
2
commits into
os-autoinst:master
Choose a base branch
from
Martchus:restart-git-clone
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would create a minion job for every job, only cluster jobs would be grouped together, right?
That would be way too many jobs in some cases.
The detection for identical git_clone tasks is not perfect, especially if they are created quickly after each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this leads to more Minion jobs attempted to be enqueued but I was hoping the code for de-duplicating Minion jobs you have recently introduced will ensure that we don't have too many after all.
Note that there will still only be one enqueuing attempt per job cluster. Only if one specified multiple jobs IDs explicitly (e.g. using #5971 when it gets merged) we would have more enqueuing attempts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise we really needed a "preparing" state.
I suppose it would work like this:
Step 2 and 3 would happen in one transaction. Step 4 would also happen in one transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I suppose I can fix the scheduler first while we think what's best here. With the scheduler fixed we still have this race condition but at least job clusters aren't torn apart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of simplicity we could also reduce the number of Minion jobs on job restarts by avoiding Minion jobs for
git_auto_update
. Of course thengit_auto_update
would rely only on the periodic updates for restarted jobs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand step 4. Maybe you can explain tomorrow after the daily?
OTOH we could try this PR and see how it works out in practice.
Theoretically we could find out if any new minion job is required by keeping track of the git directories in the
%git_clones
hash while iterating over the jobs to restart. I guess in most cases we will only have one CASEDIR/NEEDLES_DIR or DISTRI.Just that we would need to pass an additional
%git_clones_not_yet_enqueued
hash through from the toplevel job_restart, which doesn't sound nice.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a requirement though. We did have a complaint from someone who restarted a job and was wondering why it wasn't using the updated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, I guess this should be:
Probably the reason for the test failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed introducing a new initial state as alternative when estimating https://progress.opensuse.org/issues/169510.
I guess there would be another alternative: Restart all jobs in one transaction (and also create all the Minion jobs in one transaction). Since we'd just append to the job, job settings and minion jobs tables I don't think this would be problematic (so I don't think it would cause deadlocks/conflicts). Then this route would also get a nice "all or nothing" behavior.
Doing all in one transaction would probably be simpler than introducing a new state and it is essentially what we already do when scheduling products (where also many jobs are created). Of course the scheduling has an async mode where it runs as Minion job so it is a bit different.
Creating a new state means changing many places in the code. It is probably the less fragile approach, though (if implemented correctly / all bugs are ironed out).
Just merging this PR is of course also still on the table. I actually don't think it would be too problematic.
So I'm a bit torn here.