
Write operations that immediately follow other write operations sometimes cause a disk I/O error, followed by loss of leadership and high latency #522

Open
fbrandherm opened this issue Jun 13, 2023 · 3 comments
Labels: Bug (Confirmed to be a bug), Incomplete (Waiting on more information from reporter)

Comments

@fbrandherm

I am using dqlite (version 1.14) for an internal project and observed some unexpected behavior in my benchmarks (on localhost): if I send write operations back to back as fast as possible (INSERT OR REPLACE INTO kv_table (KEY, VALUE) VALUES (?,?);, using request type 8 of the wire protocol), there are random latency spikes (see picture) that do not appear if I wait 1 ms between requests. These outlier requests return SQLite's "disk I/O error", and retrying the request returns "not leader" for some time afterwards. I suspect the bug triggers a leader election. The files are on a ramdisk, and I cannot reproduce the bug when the files are on an SSD, so it is probably timing-related.
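The reproduction described above is essentially a write loop with an optional gap between requests and a retry on failure. Below is a minimal C++ sketch of such a harness. It does not use any real dqlite client API: WriteFn is a hypothetical hook that sends one INSERT OR REPLACE through whatever client is in use, and WriteStatus is a hypothetical mapping of the server's replies ("disk I/O error", "not leader").

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical outcome of one write attempt; a real client would map the
// server's failure responses ("disk I/O error", "not leader") onto these.
enum class WriteStatus { Ok, DiskIOError, NotLeader };

// Hypothetical hook that sends one
// INSERT OR REPLACE INTO kv_table (KEY, VALUE) VALUES (?,?);
// request and reports how it went.
using WriteFn = std::function<WriteStatus(const std::string&, const std::string&)>;

struct WriteSample {
    std::chrono::microseconds latency{0};  // from first attempt until the loop stops
    int attempts = 0;                      // 1 means success on the first try
    bool saw_io_error = false;             // some attempt hit "disk I/O error"
    bool succeeded = false;                // false if max_attempts ran out
};

// Issues `count` writes, sleeping `gap` between them (zero = back to back),
// and retries each write immediately until it succeeds or max_attempts is hit.
std::vector<WriteSample> run_write_benchmark(const WriteFn& write, int count,
                                             std::chrono::microseconds gap,
                                             int max_attempts) {
    assert(count >= 0 && max_attempts > 0);
    std::vector<WriteSample> samples;
    samples.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        const std::string key = "key-" + std::to_string(i);
        WriteSample s;
        const auto start = std::chrono::steady_clock::now();
        while (s.attempts < max_attempts && !s.succeeded) {
            ++s.attempts;
            const WriteStatus st = write(key, "value");
            if (st == WriteStatus::Ok) s.succeeded = true;
            if (st == WriteStatus::DiskIOError) s.saw_io_error = true;
            // On NotLeader a real client would typically rediscover the leader here.
        }
        s.latency = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - start);
        samples.push_back(s);
        if (gap.count() > 0) std::this_thread::sleep_for(gap);
    }
    return samples;
}
```

Running it once with a zero gap and once with std::chrono::milliseconds{1} mirrors the two cases compared in the report; samples with saw_io_error set would be expected to line up with the latency spikes.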

[Image: dqlite-io-erros]
Regarding the plot: the blue dots are 100 write operations on node 1 and the red dots are 100 read operations on node 2 (the first red dot is a leadership transfer to node 2). The cluster had 3 voting nodes.
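To pick out the spikes in a run like the one plotted, one simple option is to flag requests whose latency exceeds some multiple of the median. The threshold rule and the name find_latency_spikes are illustrative choices, not something taken from the report:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of samples whose latency is more than `factor` times
// the median latency. A crude outlier rule, chosen only for illustration.
std::vector<std::size_t> find_latency_spikes(const std::vector<double>& latencies_ms,
                                             double factor) {
    assert(factor > 0.0);
    std::vector<std::size_t> spikes;
    if (latencies_ms.empty()) return spikes;

    // Median of a sorted copy (mean of the two middle values for even sizes).
    std::vector<double> sorted = latencies_ms;
    std::sort(sorted.begin(), sorted.end());
    const std::size_t n = sorted.size();
    const double median = (n % 2 == 1) ? sorted[n / 2]
                                       : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);

    for (std::size_t i = 0; i < latencies_ms.size(); ++i) {
        if (latencies_ms[i] > factor * median) spikes.push_back(i);
    }
    return spikes;
}
```

Using the median rather than the mean keeps a handful of very slow requests from inflating the threshold they are measured against.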

@MathieuBordere
Contributor

Can you share your code to reproduce this?

@fbrandherm
Author

Sorry, but I can't share the full code: it's a large project that uses dqlite as a backend behind a lot of other logic and isn't open source (yet). I'm sure the bug could be reproduced with much simpler code, but I won't have time to write a minimal reproduction until the end of the month. I should note, however, that my code uses a custom client implemented in C++, which could also make a difference.
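Since a custom client is one of the variables here, it may help to compare its framing of request type 8 (EXEC_SQL) against a reference layout. The sketch below encodes an EXEC_SQL request with text parameters. The layout is assumed from the dqlite wire-protocol documentation and should be checked against it: an 8-byte header (body length in 8-byte words as a little-endian uint32, type byte, schema byte, two unused bytes), then a uint64 database ID, the SQL as a null-terminated string padded to 8 bytes, and a schema-0 parameter tuple (count byte, one type byte per value, padding, then the values). The text type code 3 (SQLITE_TEXT) and all helper names are assumptions, and the sketch only covers 1 to 255 text parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends `v` as `nbytes` little-endian bytes.
static void put_le(std::vector<std::uint8_t>& buf, std::uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; ++i) buf.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Zero-pads `buf` to the next 8-byte (word) boundary.
static void pad_to_word(std::vector<std::uint8_t>& buf) {
    while (buf.size() % 8 != 0) buf.push_back(0);
}

// Null-terminated UTF-8 string, zero-padded to a word boundary.
static void put_text(std::vector<std::uint8_t>& buf, const std::string& s) {
    buf.insert(buf.end(), s.begin(), s.end());
    buf.push_back(0);
    pad_to_word(buf);
}

// Builds an EXEC_SQL (type 8) request whose parameters are all text values.
std::vector<std::uint8_t> encode_exec_sql(std::uint64_t db_id, const std::string& sql,
                                          const std::vector<std::string>& params) {
    assert(!params.empty() && params.size() <= 255);  // schema 0: one-byte count
    constexpr std::uint8_t kTypeText = 3;              // assumed SQLITE_TEXT code

    std::vector<std::uint8_t> body;
    put_le(body, db_id, 8);
    put_text(body, sql);
    body.push_back(static_cast<std::uint8_t>(params.size()));  // tuple: value count
    for (std::size_t i = 0; i < params.size(); ++i) body.push_back(kTypeText);
    pad_to_word(body);
    for (const auto& p : params) put_text(body, p);

    std::vector<std::uint8_t> msg;
    put_le(msg, body.size() / 8, 4);  // body length in 8-byte words
    msg.push_back(8);                 // request type: EXEC_SQL
    msg.push_back(0);                 // schema version 0
    put_le(msg, 0, 2);                // unused extra bytes
    msg.insert(msg.end(), body.begin(), body.end());
    return msg;
}
```

For the statement in the report this would be called as encode_exec_sql(db_id, "INSERT OR REPLACE INTO kv_table (KEY, VALUE) VALUES (?,?);", {key, value}), where db_id comes from a prior OPEN request.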

@MathieuBordere
Contributor

No problem, we'll try to reproduce this.

@MathieuBordere added the Bug and Incomplete labels Jun 22, 2023