Stuck in setup #65
Comments
Hi, thank you for the issue 👍 This happens when the job resource is modified by another operator or in some other way. That is most likely the case if you see the diff hash log/event.
If the job resource is no longer the same as it was before, we delete the job and recreate it, because jobs can't be patched. The first issue happens when the setup job is found but the name of the setup container inside the job is not shopware-setup. You can't modify the name, because it's hard-coded, so I guess the issue is a timing issue between the API and the operator. We can fix this in the operator by ignoring fields for the hash generation. So if you can tell me what gets attached after the job is created, I can fix it. Annotations should be fine, but labels, for example, would trigger a recreate of the job.
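To make the "ignore fields for the hash generation" idea concrete, here is a minimal Python sketch. It is not the operator's code: the ignored field list, the JSON canonicalization, and the use of MD5 (picked only because it produces 32 hex characters, like the hash in the manifests in this thread) are all assumptions for illustration.

```python
import copy
import hashlib
import json

# Hypothetical ignore list; the operator's real list is not shown in this thread.
IGNORED_METADATA_FIELDS = (
    "annotations", "resourceVersion", "uid", "creationTimestamp", "generation",
)


def config_hash(job: dict, ignored=IGNORED_METADATA_FIELDS) -> str:
    """Hash a Job manifest after dropping fields that should not trigger a recreate."""
    trimmed = copy.deepcopy(job)
    metadata = trimmed.get("metadata", {})
    for field in ignored:
        metadata.pop(field, None)
    trimmed.pop("status", None)  # status is written by the cluster, never by the operator
    canonical = json.dumps(trimmed, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def needs_recreate(desired: dict, live: dict) -> bool:
    """True if the live Job differs from the desired one in a hashed field."""
    return config_hash(desired) != config_hash(live)


desired = {
    "metadata": {"name": "my-shop-setup", "labels": {"store": "my-shop", "type": "setup"}},
    "spec": {"backoffLimit": 6},
}

# An annotation attached after creation is ignored by the hash.
with_annotation = copy.deepcopy(desired)
with_annotation["metadata"]["annotations"] = {"batch.kubernetes.io/job-tracking": ""}

# An extra label (hypothetical) changes the hash and would trigger a recreate.
with_label = copy.deepcopy(desired)
with_label["metadata"]["labels"]["injected-by"] = "some-other-controller"

print(needs_recreate(desired, with_annotation))  # False
print(needs_recreate(desired, with_label))       # True
```

Under these assumptions, anything another controller adds to an ignored field leaves the hash stable, while a change to labels or the spec still forces the delete and recreate.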
Thanks for your reply. I got the definitions of the resources. I cannot see that any other operator would modify the job. It is deleted almost instantly after creation (approx. 4-5 sec). The created Pod is also terminated almost instantly as the Job is deleted.
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    batch.kubernetes.io/job-tracking: ""
    shopware.com/last-config-hash: 24b308ee1d48a04d07293a69ed85e4f4
  creationTimestamp: "2024-11-27T18:29:00Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-11-27T18:29:00Z"
  finalizers:
  - foregroundDeletion
  generation: 2
  labels:
    store: my-shop
    type: setup
  name: my-shop-setup
  namespace: my-shop
  ownerReferences:
  - apiVersion: shop.shopware.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Store
    name: my-shop
    uid: 0cd29409-5b25-4e5e-ac71-89e86a48dc7b
  resourceVersion: "513749357"
  uid: 87de84cf-7c20-42d9-b5b1-c20202fb4cf4
spec:
  backoffLimit: 6
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 87de84cf-7c20-42d9-b5b1-c20202fb4cf4
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 87de84cf-7c20-42d9-b5b1-c20202fb4cf4
        job-name: my-shop-setup
        store: my-shop
        type: setup
    spec:
      containers:
      ...
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-11-27T18:33:51Z"
  deletionGracePeriodSeconds: 30
  deletionTimestamp: "2024-11-27T18:34:21Z"
  generateName: my-shop-setup-
  labels:
    controller-uid: 2c455c57-04c6-4657-8d3a-ef259f660179
    job-name: my-shop-setup
    store: my-shop
    type: setup
  name: my-shop-setup-hd8rl
  namespace: my-shop
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: my-shop-setup
    uid: 2c455c57-04c6-4657-8d3a-ef259f660179
  resourceVersion: "513751839"
  uid: 22f56eb5-a237-4119-8f3a-094767a66f45
spec:
  containers:
  - args:
    - /setup
    command:
    - sh
    - -c
    env:
    ...
    image: ghcr.io/shopware/shopware-kubernetes:latest
    imagePullPolicy: IfNotPresent
    name: shopware-setup
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/caddy
      name: my-shop-caddy-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nd4gj
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ...
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  shareProcessNamespace: true
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: my-shop-caddy-config
    name: my-shop-caddy-config
  - name: kube-api-access-nd4gj
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-11-27T18:33:51Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-11-27T18:33:51Z"
    message: 'containers with unready status: [shopware-setup]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-11-27T18:33:51Z"
    message: 'containers with unready status: [shopware-setup]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-11-27T18:33:51Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: ghcr.io/shopware/shopware-kubernetes:latest
    imageID: ""
    lastState: {}
    name: shopware-setup
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  hostIP: ...
  phase: Pending
  qosClass: BestEffort
  startTime: "2024-11-27T18:33:51Z"
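The timestamps in the two manifests above already show how quickly the deletion is requested. A small Python sketch of that arithmetic follows; it assumes that `deletionTimestamp` is the deadline (request time plus `deletionGracePeriodSeconds`), which is my understanding of how Kubernetes fills in these fields, and the helper names are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse the second-resolution UTC timestamps used in the manifests."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def seconds_until_delete_request(metadata: dict):
    """Seconds between creation and the deletion request, or None if not being deleted."""
    if "deletionTimestamp" not in metadata:
        return None
    created = parse_ts(metadata["creationTimestamp"])
    deadline = parse_ts(metadata["deletionTimestamp"])
    # Assumption: deadline = request time + grace period.
    grace = timedelta(seconds=metadata.get("deletionGracePeriodSeconds") or 0)
    return (deadline - grace - created).total_seconds()


# Values copied from the Job and Pod manifests in this comment.
job_meta = {
    "creationTimestamp": "2024-11-27T18:29:00Z",
    "deletionTimestamp": "2024-11-27T18:29:00Z",
    "deletionGracePeriodSeconds": 0,
}
pod_meta = {
    "creationTimestamp": "2024-11-27T18:33:51Z",
    "deletionTimestamp": "2024-11-27T18:34:21Z",
    "deletionGracePeriodSeconds": 30,
}

print(seconds_until_delete_request(job_meta))  # 0.0
print(seconds_until_delete_request(pod_meta))  # 0.0
```

Both results are 0.0 at one-second resolution, so under that assumption the deletion is requested within the same second the object is created. The Job and the Pod come from different recreate cycles (the Pod's owner uid differs from the Job's uid), so this shows the pattern repeating.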
Updated shopware operator logs in the issue description
The operator is changing the setup job because the resource is modified in some way. I can't see anything that directly causes this problem, but I will focus on the sub-issue to make it visible. Sorry for that.
Just for clarification, a local cluster without the operator? Which chart are you referring to? I'm not able to find one where the operator isn't used. Thanks in advance for your help :)
Hi, we provide a Helm chart which you can use for testing. Can you give us some more information about the environment you are running the workload in?
Exactly this Helm chart is used to deploy the shop (incl. the operator), and it was set up according to the docs (see the values in the issue description). The environment is a self-hosted k3s. Furthermore, I want to use the S3 backend provided by rook-ceph, which is already used as the storage backend, as I don't want to deploy multiple services providing S3 (e.g. minio).
Thank you, I will try to recreate the issue in k3s.
Hi, I am still trying to recreate the issue. Are the secrets that are used for the store generated by other operators, or did you create them manually?
So I did manage to get the k3s setup running on my local machine and unfortunately it worked as expected. I used minio in this example but set the credentials via a test values YAML file:
After some debugging and testing, it turns out my first assumption was wrong: the job resource doesn't get changed, which is also not possible for most of its values.
If you change, for example, some env variables for the store resource, the hash is different, and that's when the job gets deleted and then created. Why this is happening here I can only guess. It is important that we are talking about the same Helm chart and the same operator. Operator image of the pod
One idea to see if something is changing the resource: you can watch the resources like this:
Make sure to scale the shopware operator deployment down to 0, because the operator changes the status of the resource from time to time. And if you run this command:
you should see different hashes each time the job gets created. This is not tested, because it's working locally for me, but it should confirm my suspicions.
Thanks for your support. I can provide the following answers:
The secret for the store is created manually.
I can confirm that the same image is being used.
When the operator is not running, there are no changes to the store resource.
The hash of the job is the same while it's being recreated multiple times:
OK, for me it's a bit strange, because the calculation of the hash looks good. So the CRD is not getting changed at all, which is good. We pushed a new image which logs the reason why the object will be patched by the operator:
If you change the deployment of the operator to this image you should get more logs like this one: One way to make sure that it is the operator deleting the job would be to remove the operator's permission to delete it. Then we should also see an error if the operator tries to delete it. The object
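For the permission test suggested above, here is a rough Python sketch of what "remove the delete permission" means in terms of RBAC rules. The rule list is hypothetical and is not taken from the operator's Helm chart; only the `apiGroups` / `resources` / `verbs` layout follows the standard Kubernetes RBAC rule schema.

```python
def without_delete(rules, resource="jobs"):
    """Return a copy of RBAC rules with the delete verbs removed for one resource.

    Limitations of this sketch: a rule listing several resources loses the
    delete verbs for all of them, and wildcard verbs ("*") are not expanded.
    """
    patched = []
    for rule in rules:
        new_rule = dict(rule)
        if resource in rule.get("resources", []):
            new_rule["verbs"] = [
                verb for verb in rule.get("verbs", [])
                if verb not in ("delete", "deletecollection")
            ]
        patched.append(new_rule)
    return patched


# Hypothetical rules for the operator's role, for illustration only.
operator_rules = [
    {"apiGroups": ["batch"], "resources": ["jobs"],
     "verbs": ["get", "list", "watch", "create", "delete"]},
    {"apiGroups": [""], "resources": ["secrets"],
     "verbs": ["get", "list", "watch"]},
]

patched_rules = without_delete(operator_rules)
print(patched_rules[0]["verbs"])  # ['get', 'list', 'watch', 'create']
```

With a role patched like this, a delete attempt by the operator should be rejected by the API server and surface as an error in its logs, which is the signal the test is looking for.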
I collected the following logs with the new image.
Ok found it, the
This is why it gets deleted and recreated. The Annotation
Can confirm that it works with the added annotation. (The annotation has to be added to the store CRD directly; it's currently missing in the Helm chart template.)
#68 will help with setting it only for the setup/migration job. This will be in the next release of the operator. Thanks for debugging this issue with me 👍
Steps to reproduce
Deployment of a Shop through shopware helm chart
What should happen?
The setup job should pass
What actually happened?
The setup job is deleted instantly after creation (and the pod killed), leaving the operator stuck in the setup stage. This repeats every 10 seconds.
Relevant log output
(Namespace, deployment name and URLs are changed)
Your custom resource