-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always set expiry on server resource in heartbeats #5008
Conversation
e630208
to
c35db3c
Compare
lib/srv/heartbeat.go
Outdated
@@ -313,6 +313,11 @@ func (h *Heartbeat) fetch() error { | |||
h.reset(HeartbeatStateInit) | |||
return trace.Wrap(err) | |||
} | |||
// Always set server expiry, no server resource should linger forever after | |||
// deletion. | |||
if server.Expiry().IsZero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be logic internal to services.Server
? Why not add this to CheckAndSetDefaults
for a services.Server
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only skimmed through the implementations of GetServerInfo
but some do return references to internal values while others generate a new value instead, so there's a chance that this will race as the heartbeat's Run is executed in a separate goroutine.
Also, it looks like only the lib/kube/proxy#TLSServer.GetServerInfo
does not update the TTL - should it be done there instead since it has more context and can protect the state if necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point @a-palchikov, didn't think about GetServerInfo
returning data with references to shared memory.
I tried making a deep-copy on the heartbeat side, but for some reason proto.Clone
fails.
I tried setting the expiry in services.Server.CheckAndSetDefaults
, but it broke a bunch of tests and also resulted in logs of plumbing changes to pass clockwork.Clock
in.
Changed the PR to only update GetServerInfo
in lib/kube/proxy
for now and set Expiry
like all the other servers do. At some point, I do want to centralize this logic, since all servers use the same constant TTL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried making a deep-copy on the heartbeat side, but for some reason proto.Clone fails.
This might be related to this work. Which error did you get from proto.Clone
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error was also reflect-related, but different.
I don't have it on-hand right now unfortunately.
c35db3c
to
adf8f27
Compare
Without this, deleted kube_services linger in the backend and show up as obsolete kubernetes clusters in tsh. Ideally, this TTL logic should be enforced centrally, but I'd like to fix the bug first, and do a larger refactoring later.
adf8f27
to
feb4ef0
Compare
Without this, deleted kube_services linger in the backend and show up as obsolete kubernetes clusters in tsh. Ideally, this TTL logic should be enforced centrally, but I'd like to fix the bug first, and do a larger refactoring later.
All servers (ssh/kube/app/db) should get cleaned up some time after
deletion. If a server implements
GetServerInfo
without populatingExpiry
, default expiry toServerTTL
in the heartbeat code.If we don't clean up deleted server objects, things like kube routing
will try to reach obsolete endpoints and users will forever see deleted
kube clusters in
tsh
.