generate DRA job configs from a Jinja template #34010
Skipping CI for Draft Pull Request.
[ci-node-e2e-cgrpv1-crio-dra]
job_type = pr
description = Runs E2E node tests for Dynamic Resource Allocation beta features with CRI-O using cgroup v1
cluster = k8s-infra-prow-build
Is there a reason for not using the `eks-prow-build-cluster`?
If not, then `cluster` can go to `DEFAULT`.
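To illustrate the suggestion, here is a minimal sketch of how a `DEFAULT` section could supply `cluster` to every job. It assumes the `.conf` file is INI-style and read with Python's `configparser`; the `CONF_TEXT` excerpt, the `kind` job type, and the `job_settings` helper are hypothetical and only serve the example.

```python
import configparser

# Hypothetical excerpt of a job .conf file. Section and cluster names are
# taken from the discussion; the overall layout is an assumption.
CONF_TEXT = """
[DEFAULT]
cluster = eks-prow-build-cluster

[ci-node-e2e-cgrpv1-crio-dra]
job_type = node

[ci-kind-dra]
job_type = kind
cluster = k8s-infra-prow-build
"""

parser = configparser.ConfigParser()
parser.read_string(CONF_TEXT)


def job_settings(job_name):
    """Return the effective settings of one job, DEFAULT values included."""
    return dict(parser[job_name])


# The first job inherits cluster from DEFAULT, the second one overrides it.
print(job_settings("ci-node-e2e-cgrpv1-crio-dra"))
print(job_settings("ci-kind-dra"))
```

With this layout a job only needs its own `cluster` line when it deviates from the default, which keeps the per-job sections short.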
The only reason is that they are used this way in the current job. I'll get rid of the `cluster` variable and use `eks-prow-build-cluster` for all jobs.
BTW, there is a difference in the kind jobs:
@@ -80,20 +74,15 @@
command:
- runner.sh
args:
- - /bin/bash
+ - /bin/sh
- -xc
- - |
- set -ex
- make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test"
- curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
- kind build node-image --image=dra/node:latest .
- trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
- # Which DRA features exist can change over time.
- features=( $(grep '"DRA' pkg/features/kube_features.go | sed 's/.*"\(.*\)"/\1/') )
- echo "Enabling DRA feature(s): ${features[*]}."
- # Those additional features are not in kind.yaml, but they can be added at the end.
- kind create cluster --retain --config <(cat test/e2e/dra/kind.yaml; for feature in ${features}; do echo " ${feature}: true"; done) --image dra/node:latest
- KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=1h hack/ginkgo-e2e.sh -ginkgo.label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Alpha, Beta, DynamicResourceAllocation$(for feature in ${features}; do echo , ${feature}; done)} && !Flaky && !Slow"
+ - >
+ make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test" &&
+ curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind &&
+ kind build node-image --image=dra/node:latest . &&
+ trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT &&
+ kind create cluster --retain --config test/e2e/dra/kind.yaml --image dra/node:latest &&
+ KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=2h30m hack/ginkgo-e2e.sh -ginkgo.label-filter='Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky'
Is it possible to use the same arguments for both? If so, which one?
I unified that in #33993 with an if check:
test-infra/config/jobs/kubernetes/sig-node/dynamic-resource-allocation-canary.yaml
Lines 93 to 103 in 3942639
if ${with_all_features:-false}; then
  # Which DRA features exist can change over time.
  features=( $(grep '"DRA' pkg/features/kube_features.go | sed 's/.*"\(.*\)"/\1/') )
  echo "Enabling DRA feature(s): ${features[*]}."
  # Those additional features are not in kind.yaml, but they can be added at the end.
  kind create cluster --retain --config <(cat test/e2e/dra/kind.yaml; for feature in ${features}; do echo " ${feature}: true"; done) --image dra/node:latest
  KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=1h hack/ginkgo-e2e.sh -ginkgo.label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Alpha, Beta, DynamicResourceAllocation$(for feature in ${features}; do echo , ${feature}; done)} && !Flaky && !Slow"
else
  kind create cluster --retain --config test/e2e/dra/kind.yaml --image dra/node:latest
  KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=2h30m hack/ginkgo-e2e.sh -ginkgo.label-filter='Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky'
fi
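For readers less familiar with the `grep`/`sed` pipeline in the snippet above, the following sketch mimics the feature-name extraction in Python. The `SAMPLE` lines only imitate the style of `pkg/features/kube_features.go` and are an assumption, as is the helper name `dra_features`; the real file may look different.

```python
import re

# Hypothetical lines in the style of pkg/features/kube_features.go.
SAMPLE = '''
    DRAAdminAccess featuregate.Feature = "DRAAdminAccess"
    DynamicResourceAllocation featuregate.Feature = "DynamicResourceAllocation"
    DRAResourceClaimDeviceStatus featuregate.Feature = "DRAResourceClaimDeviceStatus"
'''


def dra_features(source):
    """Mimic the shell pipeline: grep for lines containing "DRA, then
    keep only the content of the last quoted string on each line."""
    features = []
    for line in source.splitlines():
        if '"DRA' in line:  # same filter as: grep '"DRA'
            # Same substitution as the sed expression: the greedy prefix is
            # dropped and the last quoted string is replaced by its content.
            features.append(re.sub(r'.*"(.*)"', r'\1', line))
    return features


features = dra_features(SAMPLE)
print("Enabling DRA feature(s): " + " ".join(features) + ".")
```

Note that the filter matches on the quoted string starting with `DRA`, so a feature named `DynamicResourceAllocation` is not picked up by it.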
Thanks, applied. PTAL.
# on a kind cluster with containerd updated to a version with CDI support.
#
# Compared to ci-kind-dra, this one enables all DRA-related features.
[ci-kind-dra-all]
Can we make it so that we have common settings for normal periodics, normal presubmits, and canary presubmits?
There's still going to be a lot of duplication if we have to have three copies of this section and the ones below.
The same applies to the actual `.jinja` template. The entries in the `periodics` and `presubmits` should be built from a single source.
Thank you. This makes sense. Will do.
Updated. Now gen.py generates 3 files: dynamic-resource-allocation-canary.yaml, dynamic-resource-allocation-pull.yaml and dynamic-resource-allocation-ci.yaml from dynamic-resource-allocation.conf and dynamic-resource-allocation.jinja
PTAL.
Looks very promising.
How to handle indentation was my biggest concern when thinking about how to use Jinja. I am not sure whether this is addressed here (need to check test results).
# limitations under the License.

.PHONY: generate
generate-jobs:
Doesn't match.
job_type = node
description = Runs E2E node tests for Dynamic Resource Allocation beta features with CRI-O using cgroup v1
testgrid_dashboards = sig-node-cri-o, sig-node-dynamic-resource-allocation
skip_report = false
Is there any job with `skip_report = true`? I don't think this needs to be configurable.
$ git grep -B5 'skip_report: true'
sig-node-presubmit.yaml- - name: pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
sig-node-presubmit.yaml- cluster: k8s-infra-prow-build
sig-node-presubmit.yaml- optional: true
sig-node-presubmit.yaml- always_run: false
sig-node-presubmit.yaml- run_if_changed: 'test/e2e/node/pod_resize.go|pkg/kubelet/kubelet.go|pkg/kubelet/kubelet_pods.go|pkg/kubelet/kuberuntime/kuberuntime_manager.go'
sig-node-presubmit.yaml: skip_report: true
I meant, "for our jobs". We should only make those things configurable which we need to be configurable - it'll be shorter and more readable.
done
testgrid_dashboards = sig-node-cri-o, sig-node-dynamic-resource-allocation
skip_report = false
image_config_file = /home/prow/go/src/k8s.io/test-infra/jobs/e2e_node/crio/latest/image-config-cgroupv1-serial.yaml
inject_ssh_public_key = true
Same here: this can depend on the job type in the template.
I don't think it can. Not all presubmit jobs have this. It depends on the distro/image, as far as I remember.
{%- if "containerd" in job_name %}
{%- set testgrid_dashboards = testgrid_dashboards + ", sig-node-containerd" %}
{%- endif %}
- name: {{job_name}}
So the indentation is the same for both periodics and presubmits?
The test bot seems to be stuck, but I suspect that a YAML linter would complain about that.
Yep, fortunately the indentation is the same for presubmits and periodics:
presubmits:
  kubernetes/kubernetes:
  - name: pull-kubernetes-e2e-containerd-gce
periodics:
  # This jobs runs e2e.test with a focus on tests for the Dynamic Resource Allocation feature (currently beta)
  # on a kind cluster with containerd updated to a version with CDI support.
  - name: ci-kind-dra
No, if the lists were indented in the canonical way, it would be:
periodics:
- name: ci-kind-dra
YAML doesn't care, but there are style checkers which might.
Well, I took that snippet from the existing YAML.
And CI doesn't complain about wrong indentation for this file.
Right. It runs yamllint, but that doesn't care, so we are good.
As a last resort we can reindent in gen.py if it's really needed. It will be a little bit ugly, though.
I suspect/hope that periodic and presubmit configs have the same indentation level on purpose.
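If reindenting ever became necessary, it could be done with the standard library alone. The sketch below is only an illustration under assumptions: `RENDERED_JOB` is a hypothetical rendered job entry and `reindent` is a hypothetical helper, not code from gen.py.

```python
import textwrap

# Hypothetical rendered job entry, written at column 0.
RENDERED_JOB = """\
- name: ci-kind-dra
  cluster: eks-prow-build-cluster
  decorate: true
"""


def reindent(block, levels, width=2):
    """Remove common leading whitespace, then shift the block right by
    levels * width spaces. Blank lines are left untouched."""
    return textwrap.indent(textwrap.dedent(block), " " * (levels * width))


# Same entry, once nested under a key and once at top level.
print("periodics:")
print(reindent(RENDERED_JOB, levels=1), end="")
print("periodics:")
print(reindent(RENDERED_JOB, levels=0), end="")
```

Keeping one indentation level for both job kinds in the template avoids this extra step entirely, which is why it is only a fallback.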
From a YAML perspective, the nesting level is different.
I don't remember anymore where, but there are other jobs where the indentation is different, which is very annoying when copy-pasting from presubmit to periodic or vice versa. That made me think that it's enforced. It's not, so it indeed makes much more sense to use the same indentation even if it's not "quite right" for periodics.
testgrid-tab-name: {{job_name}}
description: {{description}}
testgrid-alert-email: {{testgrid_alert_email}}
fork-per-release: "true"
Canaries shouldn't get forked.
done
@@ -0,0 +1,115 @@
{%- if beginning %}
Can this file be moved into a `templates` directory, as in kops?
When I look at the PR sidebar, I currently see four files with the identical `dynamic-resource-allocation...` as name. Even if we shorten that to `dra-`, keeping the source file separate would make it stand out more.
Sure, it can be moved. Should I move the .conf file as well?
Personally, I'd prefer a flat structure with shorter names, e.g.
dra.conf
dra.jinja
dra-canary.yaml
dra-pull.yaml
dra-ci.yaml
And I hope that this approach can be used for all sig-node jobs and the final list of files will be something like this:
jobs.conf
jobs.jinja
jobs-canary.yaml
jobs-pull.yaml
jobs-ci.yaml
@pohly @kannon92 @SergeyKanzhelev @haircommander
Thank you. After fixing review comments, I'm going to remove -pull and -ci yamls from this PR, so we can only test -canary. I personally like it. Using it would allow us to
WDYT guys?
/retest
@pohly @haircommander @kannon92 This PR is ready for review now.
Can we make this PR complete (= generates everything) and then merge the generated canary jobs in advance via a second PR? The advantage of this approach is:
@pohly I was going to do it in 3 steps:
Would it work for you this way?
I think two PRs as I had proposed is simpler. I'm not worried about breaking CI jobs: that has less impact than breaking a presubmit because only a few people will see the breakage.
There's a genuine unit test failure:
If someone modifies the template locally and then submits only the updated canary YAML, it is impossible for others to review or replicate how they were generated. It may also be harder to verify that the changes for the canary jobs then get applied as tested to the actual jobs. I think I prefer the approach with @elieser1101 can be our first
spec:
containers:
- image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241230-3006692a6f-master
- image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241218-d4b51bc3e8-master
Let's use the same image as on master.
- org: kubernetes
repo: kubernetes
base_ref: master
path_alias: k8s.io/kubernetes
These `extra_refs` are missing in the newly generated CI jobs.
make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test"
curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
kind build node-image --image=dra/node:latest .
trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
# Which DRA features exist can change over time.
Can we keep the comments?
make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test"
curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
kind build node-image --image=dra/node:latest .
trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
# Which DRA features exist can change over time.
features=( $(grep '"DRA' pkg/features/kube_features.go | sed 's/.*"\(.*\)"/\1/') )
echo "Enabling DRA feature(s): ${features[*]}."
And the debug output?
@@ -0,0 +1,120 @@
{%- if header %}{%- if kind == "ci" %}periodics:{%- else %}presubmits:
Can this `{%- if header %}` be dropped?
{%- endif %}
{%- if job_type == "node" %}
{%- endif %}
{%- if job_type == "node" %}
Redundant if check?
curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
kind build node-image --image=dra/node:latest .
trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
{%- if kind == "canary" %}
Why does this check "canary"?
This seems to overload the meaning of "canary":
- For canary PR jobs (= dra-canary.yaml).
- The "all features enabled" CI and presubmit jobs.
Let's use "canary" for "part of dra-canary.yaml" and something else for feature gates. How about "alpha"?
echo "Verifying generated jobs"
hack/run-in-python-container.sh \
python3 hack/generate-jobs.py config/jobs/kubernetes/sig-node/*.conf --only-verify
We can start with only working on "our" (= SIG Node) jobs here for now. But in a future PR this should get extended to other generated jobs.
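As a rough idea of what a verify-only mode can look like, here is a minimal sketch that compares freshly generated content with the file on disk instead of writing it. Only the `--only-verify` flag name comes from the script invocation above; the functions `verify` and `generate` and the file name used in the demo are hypothetical and not taken from `hack/generate-jobs.py`.

```python
import difflib
import pathlib
import tempfile


def verify(path, expected):
    """Return unified-diff lines between the file at path and the expected
    content. The list is empty when the file is up to date."""
    p = pathlib.Path(path)
    current = p.read_text() if p.exists() else ""
    return list(difflib.unified_diff(
        current.splitlines(keepends=True),
        expected.splitlines(keepends=True),
        fromfile=f"{path} (on disk)",
        tofile=f"{path} (generated)",
    ))


def generate(path, expected, only_verify=False):
    """Write expected to path, or only report drift when only_verify is set.
    Returns True when the file is up to date afterwards."""
    diff = verify(path, expected)
    if only_verify:
        print("".join(diff), end="")
        return not diff
    pathlib.Path(path).write_text(expected)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "dra-canary.yaml"
    content = "presubmits:\n  kubernetes/kubernetes:\n"
    print(generate(target, content, only_verify=True))  # file missing: drift
    print(generate(target, content))                    # writes the file
    print(generate(target, content, only_verify=True))  # now up to date
```

Because the check is independent of which `.conf` files are passed in, extending it to other generated jobs would mostly be a matter of widening the file glob.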
/cc @pohly @kannon92 @SergeyKanzhelev @haircommander