Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup should not start before event handlers have finished #60

Open
guusdk opened this issue Sep 6, 2021 · 1 comment
Open

Cleanup should not start before event handlers have finished #60

guusdk opened this issue Sep 6, 2021 · 1 comment

Comments

@guusdk
Copy link
Member

guusdk commented Sep 6, 2021

When the plugin detects that a remote cluster node leaves the cluster, it fires off an event to reflect this (which invokes leftCluster(nodeId)) on all listeners. The plugin then cleans up things. The corresponding code can mostly be found in

// Trigger event that a node left the cluster
ClusterManager.fireLeftCluster(nodeID.toByteArray());
// Clean up directed presences sent from entities hosted in the leaving node to local entities
// Clean up directed presences sent to entities hosted in the leaving node from local entities
cleanupDirectedPresences(nodeID);
if (!seniorClusterMember && isSeniorClusterMember()) {
seniorClusterMember = true;
ClusterManager.fireMarkedAsSeniorClusterMember();
}
cleanupNode(nodeID);
// Remove traces of directed presences sent from local entities to handlers that no longer exist.
// At this point c2s sessions are gone from the routing table so we can identify expired sessions
XMPPServer.getInstance().getPresenceUpdateHandler().removedExpiredPresences();

There is a race condition here, that is caused by the invocation of the event listeners (in the first line) to be asynchronous. It causes the execution of that invocation to be parallel to the execution of the cleanup. This can cause problems for implementations of the event listener, if they operate on the same data. As the clean-up modifies pretty low-level state (routing table, session manager), the chance of indirect interference is pretty high.

It would probably be good to separate the two, to make sure that there's no interference. I assume (but haven not verified) that the existing order of execution (first event listeners, then cleanup) is OK (although maybe it's worth investigating if that needs reversal). In any case, the first should have had time to finish execution before the second is started. Some kind of callback implementation could be used to achieve this.

@guusdk
Copy link
Member Author

guusdk commented Sep 8, 2021

This probably isn't needed if we fix #50 as the callback would be used to invoke code that has by then be migrated to Openfire.

evdherberg pushed a commit to evdherberg/openfire-hazelcast-plugin that referenced this issue Oct 21, 2021
…ber join until the new member has finished cleaning up old data.
evdherberg pushed a commit to evdherberg/openfire-hazelcast-plugin that referenced this issue Oct 21, 2021
…members of new member join until the new member has finished cleaning up old data.
guusdk pushed a commit that referenced this issue Oct 21, 2021
…ntil the new member has finished cleaning up old data.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant