Windows conformance test failed with case "Service endpoints latency [It] should not be very high" #6892

Closed
wenyingd opened this issue Dec 31, 2024 · 1 comment · Fixed by #6906
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@wenyingd
Contributor

Describe the bug

The test case "Service endpoints latency [It] should not be very high" can fail because the required image "registry.k8s.io/e2e-test-images/pause:3.10" could not be pulled on the Windows node.

The following output was observed in the test logs:

[sig-network] Service endpoints latency [It] should not be very high [Conformance] [sig-network, Conformance]
k8s.io/kubernetes/test/e2e/network/service_latency.go:59

  Timeline >>
  STEP: Creating a kubernetes client @ 12/31/24 03:03:32.102
  I1231 03:03:32.102282 2213844 util.go:499] >>> kubeConfig: /var/lib/jenkins/kube.conf
  STEP: Building a namespace api object, basename svc-latency @ 12/31/24 03:03:32.103
  STEP: Waiting for a default service account to be provisioned in namespace @ 12/31/24 03:03:32.114
  STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 12/31/24 03:03:32.116
  I1231 03:03:32.118427 2213844 util.go:499] >>> kubeConfig: /var/lib/jenkins/kube.conf
  STEP: creating replication controller svc-latency-rc in namespace svc-latency-3125 @ 12/31/24 03:03:32.119
  I1231 03:03:32.124935 2213844 runners.go:193] Created replication controller with name: svc-latency-rc, namespace: svc-latency-3125, replica count: 1
  I1231 03:03:33.176231 2213844 runners.go:193] svc-latency-rc Pods: 1 out of 1 created, 0 running, 1 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
  I1231 03:03:34.176371 2213844 runners.go:193] svc-latency-rc Pods: 1 out of 1 created, 0 running, 1 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
  I1231 03:03:35.177395 2213844 runners.go:193] svc-latency-rc Pods: 1 out of 1 created, 0 running, 1 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
  I1231 03:03:36.177666 2213844 runners.go:193] svc-latency-rc Pods: 1 out of 1 created, 0 running, 1 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
  I1231 03:03:37.177915 2213844 runners.go:193] svc-latency-rc Pods: 1 out of 1 created, 0 running, 1 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
  I1231 03:03:38.178087 2213844 runners.go:193] svc-latency-rc Pods: 1 out of 1 created, 0 running, 1 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
  I1231 03:03:39.178259 2213844 runners.go:193] svc-latency-rc Pods: 1 out of 1 created, 0 running, 1 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
...
  I1231 03:08:33.261756 2213844 runners.go:193] svc-latency-rc Pods: 1 out of 1 created, 0 running, 1 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
  I1231 03:08:33.266231 2213844 runners.go:193] Pod svc-latency-rc-8t75d	a-tapmw-win-1	Pending	<nil>
  [FAILED] in [It] - k8s.io/kubernetes/test/e2e/network/service_latency.go:105 @ 12/31/24 03:08:33.266
  I1231 03:08:33.266640 2213844 helper.go:122] Waiting up to 7m0s for all (but 0) nodes to be ready
  STEP: dump namespace information after failure @ 12/31/24 03:08:33.27
  STEP: Collecting events from namespace "svc-latency-3125". @ 12/31/24 03:08:33.27
  STEP: Found 6 events. @ 12/31/24 03:08:33.273
  I1231 03:08:33.273346 2213844 dump.go:53] At 2024-12-31 03:03:32 +0000 UTC - event for svc-latency-rc: {replication-controller } SuccessfulCreate: Created pod: svc-latency-rc-8t75d
  I1231 03:08:33.273360 2213844 dump.go:53] At 2024-12-31 03:03:32 +0000 UTC - event for svc-latency-rc-8t75d: {default-scheduler } Scheduled: Successfully assigned svc-latency-3125/svc-latency-rc-8t75d to a-tapmw-win-1
  I1231 03:08:33.273370 2213844 dump.go:53] At 2024-12-31 03:03:32 +0000 UTC - event for svc-latency-rc-8t75d: {AntreaPodConfigurator } NetworkReady: Installed Pod network forwarding rules
  I1231 03:08:33.273380 2213844 dump.go:53] At 2024-12-31 03:03:33 +0000 UTC - event for svc-latency-rc-8t75d: {kubelet a-tapmw-win-1} Pulling: Pulling image "registry.k8s.io/e2e-test-images/pause:3.10"
  I1231 03:08:33.273389 2213844 dump.go:53] At 2024-12-31 03:03:34 +0000 UTC - event for svc-latency-rc-8t75d: {kubelet a-tapmw-win-1} Failed: Error: ErrImagePull
  I1231 03:08:33.273398 2213844 dump.go:53] At 2024-12-31 03:03:34 +0000 UTC - event for svc-latency-rc-8t75d: {kubelet a-tapmw-win-1} BackOff: Back-off pulling image "registry.k8s.io/e2e-test-images/pause:3.10"
  I1231 03:08:33.278485 2213844 resource.go:168] POD                   NODE           PHASE    GRACE  CONDITIONS
  I1231 03:08:33.278548 2213844 resource.go:175] svc-latency-rc-8t75d  a-tapmw-win-1  Pending         [{PodReadyToStartContainers True 0001-01-01 00:00:00 +0000 UTC 2024-12-31 03:03:34 +0000 UTC  } {Initialized True 0001-01-01 00:00:00 +0000 UTC 2024-12-31 03:03:32 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2024-12-31 03:03:32 +0000 UTC ContainersNotReady containers with unready status: [svc-latency-rc]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2024-12-31 03:03:32 +0000 UTC ContainersNotReady containers with unready status: [svc-latency-rc]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2024-12-31 03:03:32 +0000 UTC  }]
  I1231 03:08:33.278559 2213844 resource.go:178] 
  I1231 03:08:33.281216 2213844 dump.go:109] 

After checking the CI logs, we also observed these errors at the beginning of the run:

time="2024-11-07T23:37:02-08:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry.k8s.io/v2/e2e-test-images/agnhost/manifests/2.52\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: " host=registry.k8s.io
ctr: failed to resolve reference "registry.k8s.io/e2e-test-images/agnhost:2.52": failed to do request: Head "https://registry.k8s.io/v2/e2e-test-images/agnhost/manifests/2.52": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: 
time="2024-11-07T23:37:02-08:00" level=warning msg="DEPRECATION: The `tracing` property of `[plugins.\"io.containerd.internal.v1\".tracing]` is deprecated since containerd v1.6 and will be removed in containerd v2.0.Use OTEL environment variables instead: https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/"
time="2024-11-07T23:37:02-08:00" level=warning msg="DEPRECATION: The `mirrors` property of `[plugins.\"io.containerd.grpc.v1.cri\".registry]` is deprecated since containerd v1.5 and will be removed in containerd v2.0. Use `config_path` instead."
time="2024-11-07T23:37:02-08:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry.k8s.io/v2/e2e-test-images/jessie-dnsutils/manifests/1.5\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: " host=registry.k8s.io
ctr: failed to resolve reference "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.5": failed to do request: Head "https://registry.k8s.io/v2/e2e-test-images/jessie-dnsutils/manifests/1.5": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: 
time="2024-11-07T23:37:03-08:00" level=warning msg="DEPRECATION: The `tracing` property of `[plugins.\"io.containerd.internal.v1\".tracing]` is deprecated since containerd v1.6 and will be removed in containerd v2.0.Use OTEL environment variables instead: https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/"
time="2024-11-07T23:37:03-08:00" level=warning msg="DEPRECATION: The `mirrors` property of `[plugins.\"io.containerd.grpc.v1.cri\".registry]` is deprecated since containerd v1.5 and will be removed in containerd v2.0. Use `config_path` instead."
time="2024-11-07T23:37:03-08:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry.k8s.io/v2/e2e-test-images/nginx/manifests/1.14-2\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: " host=registry.k8s.io
ctr: failed to resolve reference "registry.k8s.io/e2e-test-images/nginx:1.14-2": failed to do request: Head "https://registry.k8s.io/v2/e2e-test-images/nginx/manifests/1.14-2": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: 
time="2024-11-07T23:37:03-08:00" level=warning msg="DEPRECATION: The `mirrors` property of `[plugins.\"io.containerd.grpc.v1.cri\".registry]` is deprecated since containerd v1.5 and will be removed in containerd v2.0. Use `config_path` instead."
time="2024-11-07T23:37:03-08:00" level=warning msg="DEPRECATION: The `tracing` property of `[plugins.\"io.containerd.internal.v1\".tracing]` is deprecated since containerd v1.6 and will be removed in containerd v2.0.Use OTEL environment variables instead: https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/"
time="2024-11-07T23:37:03-08:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.10\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: " host=registry.k8s.io
ctr: failed to resolve reference "registry.k8s.io/pause:3.10": failed to do request: Head "https://registry.k8s.io/v2/pause/manifests/3.10": tls: failed to verify certificate: x509: certificate has expired or is not yet valid:

links:
http://10.164.243.223/view/Windows/job/antrea-windows-conformance-for-pull-request/56/consoleFull
http://10.164.243.223/view/Windows/job/antrea-windows-conformance-for-pull-request/59/consoleFull

To Reproduce

Expected

Actual behavior

Versions:

Additional context

wenyingd added the kind/bug label Dec 31, 2024
@XinShuYang
Contributor

In the Windows CI script, we first pull each image from the specified repository and then retag it for use in conformance testing. Even if the initial pull fails, most images can still be pulled from the new repo during the test; the exception is registry.k8s.io/e2e-test-images/pause:3.10. Therefore, we need to prevent the first image pull from failing.
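
The pull-then-retag step could be made to fail fast, so that a broken pull stops the job before the conformance test spends five minutes waiting on a Pending Pod. The following is only a minimal sketch, not the actual CI script: the `ctr -n k8s.io images pull` and `ctr -n k8s.io images tag` invocations are an assumption based on the `ctr` errors in the logs, and `pull_and_retag`, `run_command`, and the image names passed to them are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit code without raising."""
    return subprocess.run(list(argv), check=False).returncode


def pull_and_retag(source: str, target: str, run: Runner = run_command) -> None:
    """Pull `source` and tag it as `target`; raise if either step fails.

    Assumption: images live in the containerd "k8s.io" namespace and are
    managed with the ctr CLI, as the ctr errors in the CI logs suggest.
    """
    pull = ["ctr", "-n", "k8s.io", "images", "pull", source]
    if run(pull) != 0:
        # Stop here instead of relying on the kubelet to pull later.
        raise RuntimeError(f"failed to pull {source}")

    tag = ["ctr", "-n", "k8s.io", "images", "tag", source, target]
    if run(tag) != 0:
        raise RuntimeError(f"failed to tag {source} as {target}")
```

Passing the runner as a parameter keeps the sketch testable without containerd installed; in a real script the default `run_command` would be used.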

After investigating, I found that the failure is caused by a Windows time synchronization issue: when the time in the Windows snapshot differs significantly from the current time, PowerShell time synchronization fails. With the node clock left at the stale snapshot time, TLS certificate verification against registry.k8s.io fails with the x509 error shown in the CI logs. As a short-term fix, updating the snapshot resolves the issue. For a permanent solution, we need a new patch that addresses the Windows time synchronization problem.
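
To illustrate why a stale clock produces "certificate has expired or is not yet valid", here is a small sketch of the validity-window comparison that certificate verification performs. The `ctr` log lines in this issue are stamped 2024-11-07 while the test itself ran on 12/31/24, which is consistent with a node clock stuck at snapshot time. The function names, the tolerance value, and the certificate validity window used in the usage note are hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta, timezone


def clock_skew(local_now: datetime, reference_now: datetime) -> timedelta:
    """Return how far the local clock is from a trusted reference clock."""
    return local_now - reference_now


def exceeds_tolerance(skew: timedelta, tolerance: timedelta) -> bool:
    """Return True when the absolute skew is larger than the tolerance."""
    return abs(skew) > tolerance


def certificate_time_status(
    now: datetime, not_before: datetime, not_after: datetime
) -> str:
    """Classify `now` against a certificate validity window."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"
```

For example, with a node clock of 2024-11-07 and a certificate whose (hypothetical) validity window starts on 2024-12-01, `certificate_time_status` classifies the certificate as not yet valid, even though it is valid at the real time of 2024-12-31. A pre-flight check in CI could compare the node clock against a trusted reference and abort when `exceeds_tolerance` is true.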
