Feat[BMQ]: Add an admin command to remove a domain #541
I think the only substantial change we need to make is to clear the domain resolver cache in the second phase as well (since loading the domain will repopulate the cache).
Great!
@@ -73,6 +73,14 @@ struct CommandDefinition {
"Clear the domain resolution cache entry of the optionally specified "
"'domain', or clear all domain resolution cache entries if 'ALL' is "
"specified."},
{"DOMAINS REMOVE <domain> [finalize]",
"Remove a domain with an optional keyword 'finalize'",
I think this description should reveal less of the implementation:
Remove a domain from the cluster. If the keyword 'FINALIZE' is not supplied, perform the first phase of domain deletion, failing if there is an open queue on this domain and blocking any open queue requests subsequent to the command being issued. After the first phase of domain deletion, it is safe to remove the domain configuration file from disk. If the keyword 'FINALIZE' is specified, perform the second phase of domain deletion, failing if the domain has not had the first phase of domain deletion performed. After the second phase of domain deletion, it is safe to create a new domain reusing the same name and to connect to it.
Open questions from the above:
- Your text here reads like it is undefined behavior if the second pass occurs without the first pass. If this is true, I think we should change it to just fail in that case.
- Reading this written out, I think we need to clean the domain resolver cache and config provider cache in the second phase, not the first. If someone connects to a domain that has been deleted in the first pass, doesn't the broker need to go through the domain resolver to know what Domain object to talk to, so it can find out the domain has been deleted and reject the request? There will still be a tiny race condition, but I think clearing these caches right before you delete the domain object is the correct behavior.
- We do check the state of the domain to make sure we only continue the second round when the first is completed.
- Good catch!
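To make the semantics discussed in this thread concrete, here is a minimal sketch of the two-phase removal as a small state machine. It is not the broker's implementation: `ToyDomainRegistry`, `Phase`, `Status`, and every method name are hypothetical stand-ins chosen for illustration. It only models the rules stated above: the first phase fails while a queue is open and blocks later open-queue requests, and the second phase fails unless the first phase has completed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative names only; none of these are BlazingMQ identifiers.
enum class Phase { Active, FirstPhaseDone };

enum class Status { Ok, UnknownDomain, HasOpenQueues, Rejected, FirstPhaseNotDone };

struct ToyDomain {
    Phase phase      = Phase::Active;
    int   openQueues = 0;
};

class ToyDomainRegistry {
    std::map<std::string, ToyDomain> d_domains;

  public:
    // Register a domain, as if its configuration had just been loaded.
    Status create(const std::string& name)
    {
        return d_domains.emplace(name, ToyDomain()).second ? Status::Ok
                                                           : Status::Rejected;
    }

    // Open-queue requests are rejected once the first phase has run.
    Status openQueue(const std::string& name)
    {
        auto it = d_domains.find(name);
        if (it == d_domains.end()) return Status::UnknownDomain;
        if (it->second.phase != Phase::Active) return Status::Rejected;
        ++it->second.openQueues;
        return Status::Ok;
    }

    Status closeQueue(const std::string& name)
    {
        auto it = d_domains.find(name);
        if (it == d_domains.end()) return Status::UnknownDomain;
        assert(it->second.openQueues > 0 && "closeQueue without openQueue");
        --it->second.openQueues;
        return Status::Ok;
    }

    // First phase: fail if any queue is open, otherwise block new opens.
    Status removeFirstPhase(const std::string& name)
    {
        auto it = d_domains.find(name);
        if (it == d_domains.end()) return Status::UnknownDomain;
        if (it->second.openQueues > 0) return Status::HasOpenQueues;
        it->second.phase = Phase::FirstPhaseDone;
        return Status::Ok;
    }

    // Second phase: fail unless the first phase completed, then drop the
    // domain so that the name can be reused.
    Status removeFinalize(const std::string& name)
    {
        auto it = d_domains.find(name);
        if (it == d_domains.end()) return Status::UnknownDomain;
        if (it->second.phase != Phase::FirstPhaseDone) {
            return Status::FirstPhaseNotDone;
        }
        d_domains.erase(it);
        return Status::Ok;
    }

    bool contains(const std::string& name) const
    {
        return d_domains.count(name) != 0;
    }
};
```

In this toy model, failing the second phase explicitly (rather than leaving it undefined) matches the first open question above; the cache handling from the second question is deliberately left out here.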
DomainSp domainSp;

if (0 != locateOrCreateDomain(&domainSp, name)) {
We talked about this, but just to record it: the reason we do this here is the lazy loading of domains that BlazingMQ does. We don't want to fail if a user creates a domain configuration file, happens not to connect, and then issues this admin command, or creates a domain configuration file, uses the domain, purges the domain, and restarts the broker before issuing this admin command.
But, this will load the domain into the resolver cache, so we do need to clear that on the second pass.
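The interaction described in these two comments can be sketched as follows. This is a toy model under stated assumptions, not the broker's code: `ToyBroker`, its `resolverCache` and `domains` members, and its methods are hypothetical, and `locateOrCreate` only imitates the side effect attributed to `locateOrCreateDomain` above (lazily loading a domain also populates the resolver cache). The point is that the cache entry has to be cleared in the finalize step, immediately before the domain object is dropped.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy stand-in for the broker; all names here are hypothetical.
struct ToyBroker {
    std::set<std::string>       resolverCache;  // stand-in for the resolver cache
    std::map<std::string, bool> domains;        // name -> "first phase done" flag

    // Lazily load the domain. As a side effect, the name lands in the cache.
    bool& locateOrCreate(const std::string& name)
    {
        resolverCache.insert(name);
        return domains[name];  // value-initialized to false when absent
    }

    // First phase: works even if nobody ever connected to the domain,
    // because the domain is loaded on demand.
    void removeFirstPhase(const std::string& name)
    {
        locateOrCreate(name) = true;
    }

    // Second phase: clear the cache entry right before dropping the domain.
    bool removeFinalize(const std::string& name)
    {
        auto it = domains.find(name);
        if (it == domains.end() || !it->second) {
            return false;  // first phase has not been performed
        }
        resolverCache.erase(name);
        domains.erase(it);
        assert(resolverCache.count(name) == 0);
        return true;
    }
};
```

If the cache were cleared only in the first phase, the on-demand load in this model would leave a stale entry behind after the domain is gone.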
Signed-off-by: Emelia Lei <[email protected]>
implement the first pass where the config file of the domain is still on the disk, and the admin command can only be sent to leader/primary Signed-off-by: Emelia Lei <[email protected]>
Signed-off-by: Emelia Lei <[email protected]>
Remove the domain object in the second pass Signed-off-by: Emelia Lei <[email protected]>
…d pass. If an open queue request is issued right after the first pass, the domain will be loaded into domainResolver and configProvider. We need to move cache cleaning to right before the removal of the domain object. Signed-off-by: Emelia Lei <[email protected]>
I think we should add a test case or two just to verify the change in that final commit. Then it should be good to go.
The existing admin command "DOMAINS DOMAIN <name> PURGE" only adds purge records to the journal files for the queues belonging to this domain. This may cause a race condition in which a user can connect to a queue that has been purged.
This PR adds a new admin command "DOMAINS DOMAIN <name> REMOVE" to:
domainResolver and configProvider
With a temporary second pass "DOMAINS DOMAIN <name> REMOVE FINALIZE" to: