Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#148] Update liveness probes #164

Merged
merged 9 commits into from
Jan 14, 2025
Merged

[#148] Update liveness probes #164

merged 9 commits into from
Jan 14, 2025

Conversation

SilviaAmAm
Copy link
Collaborator

Fixes #148

@SilviaAmAm SilviaAmAm marked this pull request as draft January 10, 2025 14:42
@SilviaAmAm SilviaAmAm force-pushed the feature/148-liveness-probes branch 2 times, most recently from f484d0e to d1f1bfe Compare January 10, 2025 14:49
@SilviaAmAm SilviaAmAm marked this pull request as ready for review January 10, 2025 14:51
@alextreme
Copy link
Member

Is to be verified first on our test-cluster before approval/merging

@Tiho0135
Copy link

Yes, I agree with the testing on our cluster. I will try to do it.

@SilviaAmAm
Copy link
Collaborator Author

SilviaAmAm commented Jan 10, 2025

Deployed on our test cluster:

  • Objecten
  • OAB
  • Open Forms
  • OIP
  • Open Klant
  • Open notificaties
  • Openzaak

I tested adding the livenessProbe to the worker with periodSeconds: 1 and failureThreshold: 1 and then killing redis (rabbitmq for ON) and seeing if the worker pod is restarted.

@alextreme
Copy link
Member

I attempted to resolve the merge conflicts however the linter failed, PR can be merged after resolving this properly

OAB has long running tasks (>2h), so only the celery active probe is added
The object API has a celerty task limit of 5 min (soft), 15 min (hard). So no ping was added.
Open forms has task limits (5 (soft), 15 (hard) minutes) so no ping was added
OIP has a hard task limit of 15 min, so no ping probe was added
Open Zaak has one long running tasks of which we don't know how long it
will take, so no ping probe was added.

The task: https://github.com/open-zaak/open-zaak/blob/70449f11a31a0e8a06d9e47d49e095b31d1068f6/src/openzaak/components/documenten/tasks.py#L333
@SilviaAmAm SilviaAmAm force-pushed the feature/148-liveness-probes branch from 3bf9fb2 to e7e49ed Compare January 14, 2025 09:46
@SilviaAmAm
Copy link
Collaborator Author

I will investigate!

@SilviaAmAm
Copy link
Collaborator Author

The problem was an issue in the bitnami charts (bitnami/charts#31348). It is now resolved.

@SilviaAmAm SilviaAmAm merged commit b451956 into main Jan 14, 2025
1 check passed
- /app/bin/check_celery_worker_liveness.py
- /bin/sh
- -c
- celery --workdir src --app openforms.celery inspect --destination celery@${HOSTNAME} active
Copy link
Member

@sergei-maertens sergei-maertens Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit too naive - the worker host name can be set via envvars and is especially useful when using different queues (short lived / prio / long-running tasks). The celery worker script in OF has:

QUEUE=${1:-${CELERY_WORKER_QUEUE:=celery}}
WORKER_NAME=${2:-${CELERY_WORKER_NAME:="${QUEUE}"@%n}}

which indeed defaults to celery@$host, but is not guaranteed to be that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Celery worker fix liveness check
4 participants