Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thoughts on hashing #149

Open
ChrisJefferson opened this issue Oct 25, 2022 · 0 comments
Open

Thoughts on hashing #149

ChrisJefferson opened this issue Oct 25, 2022 · 0 comments

Comments

@ChrisJefferson
Copy link
Member

I have been thinking about hashing, and wanted to write down some thoughts on the topic. Comments are welcome, but not expected!

There are, broadly speaking two types of hash functions we might want:

  1. A GenericHash which, given any GAP variable, can calculate a hash function for it.

  2. ChooseHash, a way of getting a highly efficient hash function for a particular type of data (this might fall back on Hash when it cannot make a more clever decision).

I think Hash is always required, I (for example) sometimes hash things like rec(points := 6, parts := [1,2]), which I don't think would ever be handled well by an "efficient hash function".

There are other parts of GAP which implement ChooseHash, and we have an example (old) PR. There is an implementation of GenericHash, but it can't currently be extended.

In an ideal world, I think we want the following:

  • An easy way to add new hash functions, which will by default be added to both GenericHash and ChooseHash, but which can be only added to one or the other in special cases.

The main problem arises with GenericHash, and if a new overload is added once a hash object already exists which changes the result for an existing object. This would break these hash containers.

I am unsure if it is reasonable to forbid adding GenericHash overloads which change the hash of an object which can already be hashed -- in particular this might get broken if two packages added overloads which could hash the same object.

Therefore, there are 2(ish) "reasonable" ways to fix this.

  1. We could provide some efficient way of detecting if an operation has changed -- a hash function could store this, and force rehashing the container if a new method is installed.

  2. We could provide a way to clone an operation (I assume this would be done efficiently, so if the operation never changed only one clone would ever be made, and repeatedly handed out) -- then a hash contain could clone GenericHash when constructed, and never change hash functions during it's lifetime. This could create strange effects of two hash containers being able to hash different objects, depending on when they were created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant