constructive collisions #6

bftjoe · 2024-12-11T22:58:47Z

Please add a mode where hash collisions are allowed if two different keys have the same value stored in the table.

renzibei · 2024-12-19T10:01:28Z

Hi, thanks for your suggestion. The behavior of handling duplicate keys isn’t explicitly guaranteed by the standard for std::unordered_map; in practice, most implementations simply take the first occurrence. For this perfect hash map, handling duplicates at constructor function would slow down the build phase and increase memory usage.

If you need custom duplicate handling, consider preprocessing your input data to remove or merge duplicates before constructing the table. Alternatively, you can insert elements one by one using insert(first, last) function, as it’s essentially a convenience wrapper over single insertions, allowing you to resolve duplicates externally as needed.

bftjoe · 2024-12-19T22:34:25Z

Hi, I meant different keys that map to the same value, not the same keys.

renzibei · 2024-12-20T09:21:38Z

Hi, I’m not entirely sure I understand what you mean. If you’re suggesting that we allow different keys to map to the same slot (i.e., introduce collisions), that runs counter to the whole purpose of this project. The goal here is to create a perfect hash map, which by definition avoids collisions. If we allowed collisions, we’d have to handle them like a normal hash map, losing the advantages of a perfect hashing approach. Could you clarify your use case or the rationale behind this request?

bftjoe · 2024-12-20T10:48:04Z

The key collisions would only be allowed if the stored value was the same, thus saving both time (as in reducing the need to rehash) and space while preserving the perfect hash (unless the stored value was changed later).

renzibei · 2024-12-21T07:29:36Z

Thanks for clarifying. If I understand your suggestion correctly, you’d like different keys that hash to the same slot—and have identical values—to share that single slot. While that might sound straightforward, it would actually require redesigning large parts of the data structure due to the following reasons:

Storing Multiple Keys

Even if two keys share the same value, we still need both keys available for lookups. Our current design stores exactly one key-value pair per slot, and changing that would mean creating an entirely new scheme that accommodates multiple keys for a single slot.

Comparing Values

To confirm whether two values are the same, we’d need to compare them. That’s not part of our current API requirements, nor does the standard library mandate that all value types be comparable. Adding value-comparison logic would significantly alter our design.

Cache Friendliness & Space Usage

Our layout is optimized for storing one pair per slot in a compact array. If we allow an unknown number of keys in any single slot, it’s not obvious how to maintain both the perfect-hash efficiency and cache-friendliness.

Because of these issues, it’s not a direction we plan to pursue. Our goal is to provide a perfect hash solution without collisions or extra comparisons. If collisions need to be managed—or if value comparisons are required—there are more traditional hash table implementations that handle those scenarios well.

Thanks again for the feedback.

bftjoe · 2024-12-21T13:51:27Z

Hi, I was assuming that if this separate mode was enabled, the user would be responsible for handling if the mapped values were to suddenly change for the keys which map to the same slot, not the library. This would enable the support to be trivial on the library side. I see you are not interested, which is fine.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

constructive collisions #6

constructive collisions #6

bftjoe commented Dec 11, 2024

renzibei commented Dec 19, 2024

bftjoe commented Dec 19, 2024

renzibei commented Dec 20, 2024

bftjoe commented Dec 20, 2024 •

edited

Loading

renzibei commented Dec 21, 2024

bftjoe commented Dec 21, 2024 •

edited

Loading

constructive collisions #6

constructive collisions #6

Comments

bftjoe commented Dec 11, 2024

renzibei commented Dec 19, 2024

bftjoe commented Dec 19, 2024

renzibei commented Dec 20, 2024

bftjoe commented Dec 20, 2024 • edited Loading

renzibei commented Dec 21, 2024

Storing Multiple Keys

Comparing Values

Cache Friendliness & Space Usage

bftjoe commented Dec 21, 2024 • edited Loading

bftjoe commented Dec 20, 2024 •

edited

Loading

bftjoe commented Dec 21, 2024 •

edited

Loading