Etcd Unstable - Container spawning multiple PIDs, Context Deadline Exceeded warnings in log continuously #16704
-
Seems to me that your cluster is getting overloaded.
I don't think this qualifies as a bug. It's probably best to move this to https://github.com/etcd-io/etcd/discussions
-
Based on my experience, I would guess this is an issue with ReadIndex requests being dropped during leader election. This was fixed in newer versions of etcd, so please upgrade to the newest v3.4 version.
-
@serathius - We have upgraded to etcd v3.5.6; however, the issue is still seen. Any ideas on this?
-
@lavacat - We have checked the metrics, and the observations are as follows: etcd_server_proposals_pending is mostly 0. We are also seeing this issue on a single-node cluster. Please let me know if any other metrics would help to understand the issue.
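For reference, a minimal sketch of the kind of metrics check involved, assuming the etcd metrics endpoint is reachable at http://127.0.0.1:2379/metrics without client TLS (adjust the URL/transport for your deployment); the metric names listed are just the ones commonly looked at for slow ReadIndex/apply warnings:

```go
// metricscheck.go - fetch a few etcd server metrics relevant to this issue.
// Assumes the metrics endpoint is reachable at http://127.0.0.1:2379/metrics
// without TLS; adjust the URL/transport for your deployment.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://127.0.0.1:2379/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Metrics commonly checked when ReadIndex/apply warnings show up.
	wanted := []string{
		"etcd_server_proposals_pending",
		"etcd_server_slow_read_indexes_total",
		"etcd_server_leader_changes_seen_total",
		"etcd_disk_wal_fsync_duration_seconds_sum",
		"etcd_disk_backend_commit_duration_seconds_sum",
	}

	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "#") {
			continue // skip HELP/TYPE comment lines
		}
		for _, name := range wanted {
			if strings.HasPrefix(line, name) {
				fmt.Println(line)
			}
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```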
-
Did you check the IO pressure (iostat -dmx 1) or the load average (/proc/loadavg)?
Since you have 80 CPUs, if the etcd server is under pressure somehow, the Go runtime might clone additional OS threads to serve the requests.
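A minimal sketch of that kind of check, assuming you can read /proc on the host; the PID argument is hypothetical and should be the host PID of the etcd process:

```go
// threadwatch.go - periodically print the 1-minute load average and the OS
// thread count of a given process, to correlate IO/CPU pressure with the
// high PID counts reported in `docker stats`.
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: threadwatch <etcd-pid>")
		os.Exit(1)
	}
	pid := os.Args[1]

	for {
		loadavg, err := os.ReadFile("/proc/loadavg")
		if err != nil {
			fmt.Fprintln(os.Stderr, "read loadavg:", err)
			os.Exit(1)
		}

		status, err := os.ReadFile("/proc/" + pid + "/status")
		if err != nil {
			fmt.Fprintln(os.Stderr, "read status:", err)
			os.Exit(1)
		}
		threads := "unknown"
		for _, line := range strings.Split(string(status), "\n") {
			if strings.HasPrefix(line, "Threads:") {
				threads = strings.TrimSpace(strings.TrimPrefix(line, "Threads:"))
			}
		}

		fmt.Printf("%s loadavg=%s threads=%s\n",
			time.Now().Format(time.RFC3339),
			strings.Fields(string(loadavg))[0], threads)
		time.Sleep(time.Second)
	}
}
```

Comparing the thread count reported here against the 50-70 PIDs seen in docker stats should show whether the extra PIDs are really Go runtime threads created under pressure.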
-
What happened?
We have multiple multi-node bare-metal clusters running etcd v3.4.3 where etcd instability is observed over long uptimes (after 20-30 days).
The instability starts with context deadline exceeded warnings in the etcd logs, as etcd takes a long time to respond to the kube-apiserver livez requests. This also results in the kube-apiserver restarting due to continuous liveness probe failures.
Jan 30 14:27:15 cap-5n-ibp-cz-1.aqurryeyna.com etcd[57471]: {"level":"warn","ts":"2023-01-30T14:27:15.261Z","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":12640906947059278029,"retry-timeout":"500ms"}
Jan 30 14:27:15 cap-5n-ibp-cz-1.aqurryeyna.com etcd[57471]: {"level":"warn","ts":"2023-01-30T14:27:15.761Z","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":12640906947059278029,"retry-timeout":"500ms"}
Jan 30 14:27:16 cap-5n-ibp-cz-1.aqurryeyna.com etcd[57471]: {"level":"warn","ts":"2023-01-30T14:27:16.261Z","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":12640906947059278029,"retry-timeout":"500ms"}
Jan 30 14:27:16 cap-5n-ibp-cz-1.aqurryeyna.com etcd[57471]: {"level":"warn","ts":"2023-01-30T14:27:16.760Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"1.999967528s","expected-duration":"101ms","prefix":"read-only range ","request":"key:\"/registry/health\" ","response":"","error":"context deadline exceeded"}
Within a few days the frequency of the context deadline exceeded warnings in the logs increases, along with etcd restarting multiple times. This leads to repeated etcd leader elections, and soon the cluster gets into a completely unstable state.
Once the cluster gets into this state, the kube-apiserver does not get a chance to recover because etcd is continuously restarting. To understand the issue we also checked the docker stats of the etcd containers on the cluster. We observed that the PIDs count is very high, around 50-70, while on a stable cluster it is around 10-15. It seems etcd keeps spawning multiple threads while trying to form a cluster, as it got into a no-leader state due to restarts on all nodes.
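For diagnosis, a minimal sketch that roughly mirrors the health check seen in the logs above, timing a linearizable read of /registry/health; the endpoint, the lack of TLS, and the 2s timeout are assumptions to adjust for your cluster:

```go
// healthprobe.go - time a linearizable read of /registry/health, roughly
// mirroring the kube-apiserver etcd health check seen in the logs above.
// Endpoint and the 2s timeout are assumptions; add TLS as needed.
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	start := time.Now()
	_, err = cli.Get(ctx, "/registry/health") // linearizable read -> triggers ReadIndex
	fmt.Printf("range /registry/health took %v, err=%v\n", time.Since(start), err)
}
```

If this probe regularly approaches the 2s timeout even on a single-node cluster, that would point more toward slow WAL fsync/backend commits or CPU starvation than toward leader elections alone.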
Below are further details on the cluster:
Setup Configuration
H/W of each node
What did you expect to happen?
The expectation is that etcd stays stable even over long uptimes.
How can we reproduce it (as minimally and precisely as possible)?
Running a multi-node etcd cluster for around 20-30 days should help reproduce the issue.
Etcd version
Etcd configuration
Etcd debug information