-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lagging follower commit after message drops #138
Comments
Isn't this check in place to ensure that followers' logs don't regress? Do we have any other safeguards for that? We don't check that during |
Yes. Currently, we rely on the fact that the follower's log is in sync with the leader's up to My proposal is: teach the follower to understand that it matches the leader's log. Then this safety check can be done at the follower end on receiving |
The test added in #137 demonstrates that a short network blip / message drop can lead to delaying the follower to learn that entries in its log are committed up to
HeartbeatInterval + 3/2 * RTT
. This is suboptimal, and can be reduced toHeartbeatInterval + 1/2 * RTT
.The reason why it takes an extra roundtrip is that the leader cuts the commit index sent to the follower at
Progress.Match
, i.e. it never sends a commit index exceeding the known match size of the follower's log. This is to ensure that the follower does not crash at receiving a commit index greater than its log size.However, the follower can safely process out-of-bounds commit indices by simply (with a caveat: see below) cutting them at the log size on its end. In fact, the follower already does so for the commit indices it receives via
MsgApp
messages. When sendingMsgApp
, the leader does not cut the commit index, correspondingly.Fix
Fixing this is a matter of:
(2) is true after the follower received at least one successful
MsgApp
from this leader. From that point, the follower's log is guaranteed to be a prefix of the leader's log, and it is safe to assume that any index <=Commit
is committed.Issue
However, simply doing (1) is unsafe in mixed-version clusters (during a rolling upgrade). If there are followers running old code, they will crash upon seeing a high commit index. To fix the problem completely, there must be a safe migration. Only after all the nodes are running the new code, it is safe to do (1).
Action Items
Alternative Fix
The same delay reduction can be achieved if the leader sends an empty
MsgApp
after theHeartbeatInterval
. We already have periodic emptyMsgApp
sends in a throttled state, although they are currently hooked in toMsgHeartbeatResp
messages (to ensureMsgApp
flow is conditional to some connectivity with the follower), which in this case will result in the same+1 RTT
delay.The text was updated successfully, but these errors were encountered: