Excessive memory usage by k8s-dqlite #196
Comments
Hi there, Same here on my 3-node experimental cluster with 16 GB RAM per node: while running barely any workloads, the memory usage for dqlite is 6.9% of total memory, i.e. 1.1 GB for a 12 MB database. Commands used (note: by the time I ran this it was at 7.0%; 4th column of the output of
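As a quick sanity check of the figures reported above (the numbers are the ones from this comment, not new measurements), %MEM from `ps aux` times total RAM should match the observed resident size:

```python
# Sanity-check the reported figures: %MEM (4th column of `ps aux`)
# times total RAM should approximate the resident set size.
total_gib = 16.0   # RAM per node, as reported above
mem_pct = 6.9      # k8s-dqlite %MEM, as reported above
resident_gib = total_gib * mem_pct / 100
print(round(resident_gib, 1))  # → 1.1
```

So 6.9% of 16 GB is indeed about 1.1 GB, consistent with the report.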
@jcjveraa to be clear, my issue is not about high memory usage but about a continuous memory-usage increase, suggesting a memory leak.
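The distinction matters: a high but stable resident size points to a tuning problem, while a steadily climbing one suggests a leak. A small illustrative sketch (the `/proc` parsing and helper names are my own, not from this thread) that samples VmRSS and fits a slope to tell the two apart:

```python
# Hypothetical sketch: sample a process's VmRSS over time and fit a
# least-squares slope; a slope near zero means flat usage, a sustained
# positive slope suggests a leak. PID lookup and interval are up to you.

def rss_kib(pid):
    """Read VmRSS (resident set size, KiB) from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def growth_per_sample(samples):
    """Least-squares slope of the RSS series, in KiB per sample."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic example: flat usage vs. ~1 MiB of growth per sample
flat = [1_100_000] * 10
leaking = [1_100_000 + 1024 * i for i in range(10)]
print(growth_per_sample(flat))     # → 0.0
print(growth_per_sample(leaking))  # → 1024.0
```

Sampling every few minutes over a day or two should make the two patterns described in this thread clearly distinguishable.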
Yes, same here. In the meantime I've switched to k3s in high-availability (etcd) mode with the same workload, and memory consumption is completely flat. For me, "my workloads + microk8s" was certainly the trigger for the memory leak, and I suspect it lies in dqlite.
Hi @sbidoul, @jcjveraa and @svetlak0f, Thank you for reporting your issue with us. While the behavior you are observing looks like a memory leak, it is actually a consequence of a configuration setting in Dqlite. In a nutshell, it has to do with the number of transactions that Dqlite caches in memory. Would you be able to try a workaround for this issue? There is a tuning.yaml file which can be placed in the dqlite directory
Please restart the k8s-dqlite service afterwards, and let us know if this workaround helps you!
Hello @louiseschmidtgen, Thanks a lot for looking into this! When I try that procedure, I observe a significant surge in CPU usage: (1, 2, 3) marks the addition of tuning.yaml with
We are also seeing high memory usage, especially noticeable on very small clusters. The memory consumed by dqlite is much larger than the on-disk size of /var/snap/microk8s/current/var/kubernetes/backend: ~6.3 GB of memory versus far less on disk.
We are also seeing high CPU usage after setting trailing to a lower value.
Hi @sbidoul, Thanks for testing the workaround. Generally, when lowering trailing, the snapshot.threshold parameter needs to be adjusted as well, to about half (or a quarter) of the trailing value. Best regards,
Hi @louiseschmidtgen, Thank you very much for your help with this issue. I also tested this workaround, but unfortunately it does not seem to have resolved the problem.
Although I can see the continuous growth of VmRSS, earlier I compiled the k8s-dqlite part of hack/dynamic-dqlite.sh with ASan support and did not see any memory leaks at runtime (please refer to the ASan results for details: https://paste.ubuntu.com/p/bVmdHq9tVR/). Today, however, when I enabled ASan support for the sqlite3 part as well and compiled it with 'make dynamic', I saw these memory-leak warnings: https://paste.ubuntu.com/p/R55HsqhYTB/
But when I checked the code in /root/k8s-dqlite/hack/.build/dynamic/sqlite/tool/lemon.c, it contains the comment 'Just leak it', so it seems this leak is intentional.
Hi @zhhuabj, Thank you for helping us debug the issue. Lemon is the parser generator for SQLite; as the 'Just leak it' comment indicates, those leaks are intentional (the memory is reclaimed at process exit) and unrelated to this issue. We're continuing to look into a mitigation on our end. Thank you for your patience and contribution!
Hello @sbidoul, @jcjveraa, @jnugh, @zhhuabj and @hartyporpoise, Thank you all for your contributions to the issue. We will update the snapshot default configurations for the next release. Currently, the default snapshot configuration is a threshold of 1024 and a trailing of 8192, values that are quite large for small clusters. In the meantime, I would recommend setting smaller values in the tuning configuration, such as:
... Or trailing 1024 and threshold 512, or your own parameter combination. Note that setting only the trailing parameter sets the threshold to 0, which leads to the CPU issue mentioned in this issue. Ensure that any combination used for the tuning has trailing > threshold. See attached a sample of memory/CPU usage for different configurations on an idle microk8s cluster, recorded over 20 minutes. I would appreciate your feedback on the tuning configuration options as a mitigation for your issues. All the best,
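As a concrete illustration of the values suggested above, a tuning.yaml along these lines could be placed in the dqlite directory (the exact key names are an assumption based on this thread; verify them against the k8s-dqlite documentation for your release):

```yaml
# Hypothetical tuning.yaml sketch based on the values in this thread;
# key names are assumed, check your k8s-dqlite version.
snapshot:
  trailing: 1024   # must stay larger than threshold
  threshold: 512   # omitting this defaults it to 0 and triggers the CPU spike
```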
Hello,
For a few weeks now (sorry I cannot be more precise), I have noticed increasing memory utilization over time on dqlite nodes.
Memory usage on the node over a week looks like this, and I can attribute the increase in RAM use to the k8s-dqlite process.
The cluster has been on 1.29.x since early 2024, but the leak only started to manifest itself recently, presumably following an automatic minor snap update.
Is there anything I can do to help diagnose this?