-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start instrumenting existing code with state machines #678
Start instrumenting existing code with state machines #678
Conversation
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
This fires on all our invocations of vsnprintf, but only with clang (gcc makes an exception for variadic functions presumably for just this reason). I think the value of this lint for us is not worth its price in \#pragma verbosity. Signed-off-by: Cole Miller <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #678 +/- ##
==========================================
+ Coverage 74.21% 81.07% +6.86%
==========================================
Files 195 197 +2
Lines 27738 29164 +1426
Branches 2794 4066 +1272
==========================================
+ Hits 20585 23644 +3059
+ Misses 4827 3875 -952
+ Partials 2326 1645 -681 ☔ View full report in Codecov by Sentry. |
Overloading the err_after_request_alloc label resulted in a potential uninitialized use of `j`. Signed-off-by: Cole Miller <[email protected]>
Pictures! Leader's view of executing a write transaction in a one-node cluster ( From top to bottom, we have the Follower's view of handling an AppendEntries ( At the top is the main Finally, a different view of AppendEntries handling on the follower's side. This one uses the fixture ( Here, the follower truncated an entry from its log because of the AppendEntries. You see the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks good to me. Just minor changes are needed.
EXEC_NR, | ||
}; | ||
|
||
static const struct sm_conf exec_states[EXEC_NR] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wanted to leave this code here to show that it's possible to define states in a more compact way:
#define _S(name, flags, allowed) \
[name] = { \
.flags = flags, \
.name = #name, \
.allowed = allowed \
}
_S(INIT, SM_INITIAL, BITS(INIT, DONE)),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I experimented with switching the state machine definitions over to this style, and it works well for some of them, but once you have more than a few allowed transitions out of some state the lines get long enough to need breaking, and at that point you're more or less back to the beginning. I've pushed a commit that rewrites the definitions in this style, let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not pushing for this style, but it's my personal preference.
src/raft/log.c
Outdated
if (ref == NULL && !collision) { | ||
return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial logic looks a bit different. !collision
looks to be a new element in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, the two return values are unnecessarily hard to reason about so I switched to using a non-NULL sentinel for the collision case, which allows hewing more closely to the original code structure.
Signed-off-by: Cole Miller <[email protected]>
realloc does not promise to return NULL when the size argument is 0. Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
EXEC_NR, | ||
}; | ||
|
||
static const struct sm_conf exec_states[EXEC_NR] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not pushing for this style, but it's my personal preference.
m->is_locked = is_locked; | ||
m->id = ++id; | ||
m->pid = getpid(); | ||
*m = (struct sm){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks cool as it's easier to grep out initialization with grep "(struct sm){"
.
if (rv != 0) { | ||
goto err; | ||
index = start; | ||
for (unsigned i = 0; i < n; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the reason of removing logAppendCommands()
helper is just cleaner code or I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a cleanup, yes.
if (n > 0) { | ||
struct raft_entry *entries; | ||
|
||
/* Create a new array of entries holding the non-truncated | ||
* entries */ | ||
entries = raft_malloc(n * sizeof *entries); | ||
if (entries == NULL) { | ||
return RAFT_NOMEM; | ||
} | ||
memcpy(entries, io->entries, n * sizeof *io->entries); | ||
|
||
/* Release any truncated entry */ | ||
if (io->entries != NULL) { | ||
size_t i; | ||
for (i = n; i < io->n; i++) { | ||
raft_free(io->entries[i].buf.base); | ||
} | ||
raft_free(io->entries); | ||
} | ||
io->entries = entries; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that if { }
statement goes to ioFlushTruncate()
. I wasn't able to identify where else {}
statement goes in the new code. Could you clarify this?
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
Signed-off-by: Cole Miller <[email protected]>
This PR introduces several state machines to track and instrument the behavior of existing dqlite code. Specifically, we instrument:
leader__exec
)COMMAND
entry to the raft log (raft_apply
)replicationAppend
)log.c
)UvAppend
)UvTruncate
)There are lots more things we could instrument but I think this is a good starting point: it gives us enough visibility to follow a write transaction over its whole lifecycle and down to the disk I/O level, on both the follower and the leader. (A big missing piece is linking the histories across nodes; that's nontrivial because our raft messages don't include any kind of ID or room for extensibility, although we could fake something in the I/O fixture.)
The tracking of in-memory log entries is the trickiest part of this PR. I was initially uncertain whether to attach SMs to individual log entries at all, but I found this served as a convenient "hub" to connect other state machines together (e.g. bridging the
raft_apply
andUvAppend
state machines), plus it gives a foothold for tracking how long it takes to apply each entry.I also added state machines for the append and truncate requests in the raft I/O fixture (
raft/fixture.c
). This was necessary because the code inraft/replication.c
now assumes that the I/O backend makes an SM available to callsm_relate
with. The SMs in the fixture are not copies of the ones in the uv I/O backend, but simpler ones.Finally, I made a few fixes and additions to the sm code.