Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

constructive collisions #6

Open
bftjoe opened this issue Dec 11, 2024 · 6 comments
Open

constructive collisions #6

bftjoe opened this issue Dec 11, 2024 · 6 comments

Comments

@bftjoe
Copy link

bftjoe commented Dec 11, 2024

Please add a mode where hash collisions are allowed if two different keys have the same value stored in the table.

@renzibei
Copy link
Owner

Hi, thanks for your suggestion. The behavior of handling duplicate keys isn’t explicitly guaranteed by the standard for std::unordered_map; in practice, most implementations simply take the first occurrence. For this perfect hash map, handling duplicates at constructor function would slow down the build phase and increase memory usage.

If you need custom duplicate handling, consider preprocessing your input data to remove or merge duplicates before constructing the table. Alternatively, you can insert elements one by one using insert(first, last) function, as it’s essentially a convenience wrapper over single insertions, allowing you to resolve duplicates externally as needed.

@bftjoe
Copy link
Author

bftjoe commented Dec 19, 2024

Hi, I meant different keys that map to the same value, not the same keys.

@renzibei
Copy link
Owner

Hi, I’m not entirely sure I understand what you mean. If you’re suggesting that we allow different keys to map to the same slot (i.e., introduce collisions), that runs counter to the whole purpose of this project. The goal here is to create a perfect hash map, which by definition avoids collisions. If we allowed collisions, we’d have to handle them like a normal hash map, losing the advantages of a perfect hashing approach. Could you clarify your use case or the rationale behind this request?

@bftjoe
Copy link
Author

bftjoe commented Dec 20, 2024

The key collisions would only be allowed if the stored value was the same, thus saving both time (as in reducing the need to rehash) and space while preserving the perfect hash (unless the stored value was changed later).

@renzibei
Copy link
Owner

Thanks for clarifying. If I understand your suggestion correctly, you’d like different keys that hash to the same slot—and have identical values—to share that single slot. While that might sound straightforward, it would actually require redesigning large parts of the data structure due to the following reasons:

Storing Multiple Keys

Even if two keys share the same value, we still need both keys available for lookups. Our current design stores exactly one key-value pair per slot, and changing that would mean creating an entirely new scheme that accommodates multiple keys for a single slot.

Comparing Values

To confirm whether two values are the same, we’d need to compare them. That’s not part of our current API requirements, nor does the standard library mandate that all value types be comparable. Adding value-comparison logic would significantly alter our design.

Cache Friendliness & Space Usage

Our layout is optimized for storing one pair per slot in a compact array. If we allow an unknown number of keys in any single slot, it’s not obvious how to maintain both the perfect-hash efficiency and cache-friendliness.

Because of these issues, it’s not a direction we plan to pursue. Our goal is to provide a perfect hash solution without collisions or extra comparisons. If collisions need to be managed—or if value comparisons are required—there are more traditional hash table implementations that handle those scenarios well.

Thanks again for the feedback.

@bftjoe
Copy link
Author

bftjoe commented Dec 21, 2024

Hi, I was assuming that if this separate mode was enabled, the user would be responsible for handling if the mapped values were to suddenly change for the keys which map to the same slot, not the library. This would enable the support to be trivial on the library side. I see you are not interested, which is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants