Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove polling loop for job finishing event processing #15811

Open
wants to merge 4 commits into
base: feature_indirect-host-counting
Choose a base branch
from

Conversation

AlanCoding
Copy link
Member

SUMMARY

From the same conversation that led to #15805

This tries to remove the one largest remaining liability I can think of with this feature. This had a polling loop that waiting for events to finish processing. It had a timeout, but still...

I can easily think of a situation where the callback receiver is down, or otherwise completely overwhelmed. That means that almost all jobs go into this polling loop, and that could result in saturating max workers, which leads to generalized failure.

It's somewhat unclear how often the post_run_hook will work effectively to process events, but from the dev environment:

tools_awx_1       | 2025-02-04 13:41:00,291 INFO     [-] awx.analytics.job_lifecycle job-6 finished processing 12 events, running save indirect host entries {"type": "job", "task_id": 6, "state": "finished processing 12 events, running save_indirect_host_entries", "work_unit_id": "awx12G4kJ7L9", "task_name": "test_indirect_host_counting JT: run_task.yml"}

It does seem to go directly into processing events.

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

Copy link

codecov bot commented Feb 4, 2025

Codecov Report

Attention: Patch coverage is 62.50000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 75.25%. Comparing base (b2d887b) to head (9fa6579).
Report is 6 commits behind head on feature_indirect-host-counting.

✅ All tests successful. No failed tests found.

❌ Your project check has failed because the head coverage (75.25%) is below the target coverage (100.00%). You can increase the head coverage or adjust the target coverage.

@AlanCoding
Copy link
Member Author

The test failed for what seems like a very surprising reason. I'm guessing that it has to do with me adding additional gating before calling the task in events_processed_hook.

Consulting the logs, it did hit our logic inside of that for several jobs (ids 2, 3, and 12). For those jobs, it seems to have worked correctly.

So my theory is that those jobs hit the logic via the job control task, and that our particular test hit it via the callback receiver. If using the callback receiver, it's very possible that you're operating with old model data, so I pushed a commit to hopefully fix that.

Copy link

sonarqubecloud bot commented Feb 4, 2025

@AlanCoding
Copy link
Member Author

Now I'm thinking the problem might be related to #15780

really, we should be able to do anything about what we're doing without that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants