-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an API for querying the last raft log entry #683
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #683 +/- ##
==========================================
+ Coverage 77.87% 77.89% +0.02%
==========================================
Files 197 197
Lines 27953 28066 +113
Branches 5534 5552 +18
==========================================
+ Hits 21768 21863 +95
- Misses 4346 4350 +4
- Partials 1839 1853 +14 ☔ View full report in Codecov by Sentry. |
47b7ea2
to
6546d54
Compare
Rather than fix the external libraft CI job, I am going to block this on a separate PR to make Other test failures seem to be nondeterministic and due to these numbers varying from test to test, which I need to dig into: |
d1f09c8
to
1b9e443
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this Cole, just a couple of questions.
test/integration/test_cluster.c
Outdated
* but we seem to get two of them pretty consistently in this test.) */ | ||
size_t fudge = n_records >= 2200 ? 6 : | ||
n_records >= 993 ? 3 : | ||
2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit brittle especially the part about the barriers which are non-deterministic which could lead to CI failures down the line. What about changing the assertion so that if n_records < 993 then, last_entry_index <= n_records + 3 && last_entry_index >= n_records
? (or something similar, the idea is to use an interval and not a single value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've modified this to use the calculated number just as a lower bound. Feels to me like writing something like last_entry_index <= n_records + CONSTANT
is not much better than picking an exact value because we don't have an explanation for the specific CONSTANT
; I would rather just have the assertion reflect our belief about the code i.e. "some number of barriers, we don't know how many".
src/raft/raft.c
Outdated
if (rv != 0) { | ||
return rv; | ||
} | ||
POST(ERGO(n_entries == 0, snapshot != NULL)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to crash when raft_io_describe_last_entry
is called and there are no entries in the raft log + snapshots, correct? If that is the case, should this be a crash or an empty return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks, this could potentially be used with the data directory of a node that never got the chance to write a log entry to disk (when bootstrapping we always write the entry with the initial configuration, but joiner nodes can start with nothing at all). In this case we should probably return (0, 0)
since that will compare less than the tuple for any node that does have entries on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in the latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to publish one comment :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Updated to reflect all review comments |
This enables better automation of dqlite_node_recover_ext by exposing enough information to determine which of several nodes has the most up-to-date log. Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
We should gracefully handle the case where there are no snapshots and no entries on disk. Signed-off-by: Cole Miller <[email protected]>
2115606
to
8f012e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Cole, I reviewed the last three commits and everything is there. I only added one comment to set an upper bound for the assertions with the extra records. Also, I remember we talked about the term number not increasing, is that what is happening here? (only one node, term is not increased)
Yep. |
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
This enables better automation of dqlite_node_recover_ext by exposing enough information to determine which of several nodes has the most up-to-date log. See canonical/go-dqlite#299. Exposing this from libdqlite is the first step, then we can add a wrapper API in go-dqlite.
The implementation in this PR is as simple as calling the
load
method of the uv raft_io object. This is wasteful---it loads all snapshots and entries into memory---but that seems acceptable to me, since it's no more expensive than restarting the node normally and the use-case is an exceptional failure condition (loss of quorum).Signed-off-by: Cole Miller [email protected]