[Membership] Use an expander graph to improve eviction speed when multiple hosts fail simultaneously #9301
This PR changes how silos select which other silos to monitor. The current scheme arranges all silos into a hash ring, and each silo monitors the `NumProbedSilos` subsequent silos in the ring. This works well when there are very few simultaneous failures, but when multiple silos fail simultaneously, the current scheme can be slow to detect the failures. To understand why, consider this example ring:

![image](https://private-user-images.githubusercontent.com/203839/407846100-618796b3-77cd-4ec9-9656-6cbed8cc59ec.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDYwODYsIm5iZiI6MTczOTA0NTc4NiwicGF0aCI6Ii8yMDM4MzkvNDA3ODQ2MTAwLTYxODc5NmIzLTc3Y2QtNGVjOS05NjU2LTZjYmVkOGNjNTllYy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQyMDE2MjZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00ODdlOTg4MTdiNGM2MmFhZTM5M2JhYzgxOTQ0ZTc2NTkxOGE1YzcyMTM2NWViYWUwZjE2OWQ5YTkwOTgxNTRhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.ik7RVPXdnel2JZ-vvEKdawIcBgf_qbn5UWqBm783QUs)

The 3 red silos, A, B, and C, have failed but have not been evicted yet. If two votes are required to evict a silo, then before C can be evicted either A or B must be evicted first. Note that for each silo, there exists a silo whose monitored set differs by a single silo. We can improve detection in this correlated failure scenario by selecting monitored silos using an expander graph instead.
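
The ring scenario can be reproduced with a small sketch. This is illustrative Python, not code from this PR: the ring contents, the choice of 3 probed silos per monitor, and the helper names `ring_monitors` and `live_monitors` are assumptions; only the two-vote eviction threshold comes from the description above.

```python
def ring_monitors(ring, num_probed):
    """Each silo monitors the next `num_probed` silos in ring order."""
    n = len(ring)
    return {
        silo: [ring[(i + step) % n] for step in range(1, num_probed + 1)]
        for i, silo in enumerate(ring)
    }


def live_monitors(target, monitors, failed):
    """Monitors of `target` that are still alive and can vote to evict it."""
    return [
        m for m, targets in monitors.items()
        if target in targets and m not in failed
    ]


# Hypothetical 8-silo ring in which A, B, and C are adjacent and have failed.
ring = ["X", "Y", "A", "B", "C", "D", "E", "F"]
failed = {"A", "B", "C"}
votes_required = 2

monitors = ring_monitors(ring, num_probed=3)
for silo in sorted(failed):
    voters = live_monitors(silo, monitors, failed)
    print(silo, voters, "evictable" if len(voters) >= votes_required else "blocked")
# A ['X', 'Y', 'F'] evictable
# B ['X', 'Y'] evictable
# C ['Y'] blocked
```

Under these assumptions C has a single live monitor, so it stays in the cluster until A or B is evicted and the ring closes up around the gap.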
This PR implements that approach by probabilistically constructing an expander graph. This minimizes overlap in monitoring sets between any two silos, thus helping to avoid cases where one failed silo must be evicted before another failed silo will be monitored by enough silos to have it evicted.
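
One way to build such a graph probabilistically is the K-ring construction described in the Rapid paper: order the silos on several pseudo-random rings and have each silo monitor its successor on every ring. The sketch below is a hedged Python illustration of that idea, not this PR's implementation. The function names, the SHA-256 seeding, and the duplicate-skipping step are all assumptions made for the example.

```python
import hashlib


def ring_position(silo: str, seed: int) -> int:
    """Pseudo-random but deterministic position of a silo on ring `seed`."""
    digest = hashlib.sha256(f"{seed}:{silo}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def expander_monitors(silos, k):
    """Each silo monitors one successor on each of k differently ordered rings.

    If the successor on a ring is already monitored, walk forward to the next
    unmonitored silo (an assumption of this sketch) so that every silo ends up
    with min(k, len(silos) - 1) distinct targets.
    """
    targets = {silo: [] for silo in silos}
    n = len(silos)
    for seed in range(k):
        ring = sorted(silos, key=lambda s: ring_position(s, seed))
        for i, silo in enumerate(ring):
            for step in range(1, n):
                candidate = ring[(i + step) % n]
                if candidate not in targets[silo]:
                    targets[silo].append(candidate)
                    break
    return targets


# Hypothetical cluster of 20 silos, each monitoring 3 others.
silos = [f"silo{i}" for i in range(20)]
graph = expander_monitors(silos, k=3)

# Largest overlap between the monitored sets of any two distinct silos.
max_overlap = max(
    len(set(graph[a]) & set(graph[b]))
    for a in silos for b in silos if a < b
)
print("max pairwise overlap:", max_overlap)
```

Because every ring uses a different ordering, two silos that are neighbors on one ring are unlikely to be neighbors on the others, which is what keeps their monitored sets from overlapping heavily.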
The idea to use an expander graph is taken from "Stable and Consistent Membership at Scale with Rapid" by Lalith Suresh et al.:
https://www.usenix.org/conference/atc18/presentation/suresh