-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: hash collision leads to inconsistent sharding #5
Comments
Do you have a test that confirms this? If so, could you add it to consistent_test.go? Thanks! |
Here are a few colliding strings: "abear|17", "solidiform|0" I won't be able to do a pull request until some time next week. Here is a simple script to find a bunch more if you care
|
Below is a Go test that currently fails. @or-else, are you working on a pull request or would it be helpful if I made one?
|
Hi @edsrzf I thought about adding a test along the lines of what you suggested but then decided against it. It would not make the code more reliable because the test would be too brittle. Change the "|" to ":" or 0..20 to A..T and it would stop failing. A more reliable test is to have the hash function substituted with a dummy one which returns just 4 or 8 different values, like
But then what's the point of testing for it when it's easier to get rid of the problem all together either by getting rid of map[] or by using md5 or sha1 as a hash function. Collisions would still be possible in the latter case but extremely unlikely. Original Tom White's implementation also had this problem then everybody copied it. |
That's a fair criticism, but I still think it's worth having a test, even if my test above is too brittle. |
@or-else it sounds like you are proposing a substantial rewrite of this package. Since in its current state, it has been working fine for us for a long time, we're not going to spend any time rewriting it. If you'd like to submit a pull request with changes and show that the performance and memory benchmarks are equal or better, I'd be happy to merge it in. I chose crc32 because it is fast. I just looked quickly at Brad Fitzpatrick's consistenthash implementation in github.com/golang/groupcache. He uses crc32 and a map. It looks quite similar. While collisions are possible, based on our qualitative observations, the distribution is consistent enough for our needs. And I agree with @edsrzf that a new test should be included with your changes to show that you fixed an issue. Thanks. |
@patrickxb, the package is just 250 lines. It's not a big deal. Regarding groupcache and bradfitz, no one is immune from making mistakes. Tom White introduced the original bug then everybody copied it. Regarding your observations, the probability of a collision in crc32 is on the scale of 1e-7 for 10 nodes. The fact that you have not hit the bug is not surprising. Nonetheless it's there. |
I agree with @or-else on this one. |
A hash collision is possible at /consistent.go#L76. That means the consistency of results is dependent on the order of Add() calls.
I think the easiest fix is to get rid of Consistent.circle altogether and just use .sortedHashes to keep something like struct {hash uint32, key string}.
search() would have to be updated to compare not just the hashes but key values too.
Regardless of this bug, Consistent.circle adds unnecessary complexity.
The text was updated successfully, but these errors were encountered: