-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Simplify/improve Hash::PartialEq impls and remove constant_time_eq crate #417
Comments
Given that the description of Now you might be inclined to point out that the generated AVX (or even SSE2) instructions only use constant-time-safe bit operations. However, switch the godbolt target to armv8 and you will see lots of |
The requirement for a constant-time comparison here mostly concerns I'm not surprised that |
Hello. I'm currently working on a small project for hashing (and subsequently verifying) entire file directories using blake3. As part of validating hashed file results, I need to use Hash::eq quite a lot, and while trying to micro-optimize everything I discovered that the current Hash::eq is pretty bad, and I think it can be better. As far as I can tell, constant_time_eq doesn't do anything special, but if there's a specific quality of it's comparison that blake3 relies on that makes a proposal like this unfeasible than I'm sorry to be a pain. Here's the asm currently generated by blake3. It seems like it's doing what's fundamentally a pretty simple operation in quite a complicated way.
All that Hash::eq really needs to do (again, as far as I understand) is ensure all 256 bits of one Hash are equal to all 256 bits of another, so I wrote a naive AVX-based eq() to do so.
Then I dug into the rust lib and discovered that they have a specialized PartialEq implementation for same-size arrays. It uses internal intrinsics to tell the compiler to directly compare bytes, which generates an even better eq().
It's also extremely simple to implement, with no conditional compilation, external crates, or hand-written asm required, since it just calls the built-in eq() and defers to the rust compiler to generate asm based on the target arch. In this case I was targeting x86-64-v3, but this works equally well for v2 (see godbolt link below) and v1.
The only potential issue I can think of with this approach is alignment. From everything I've been able to find online and from what little testing I've done, modern CPUs (I use a relatively old R7 2700 for testing) with AVX and greater don't seem to care about aligned vs. unaligned all that much. But older CPUs and non x86 cores might perform poorly. If this proposal were to be implemented, it might be beneficial to make Hash #[repr(align(32))], but I'm not sure if this could be considered a backwards-compatible change per semver. Since people using blake3 who have internal Structs holding Hash instances may see Struct sizes change, depending on how Struct members are arranged. Not sure how you'd want to handle that or if it's even worth considering but figured I'd mention it. It may be enough to leave alignment alone and let the compiler make it's own decisions based on the target arch.
Here's the godbolt if you're interested in seeing how different platforms respond to the change. Note that x86-64-v2 and v1 generate two additional loads when Hash isn't manually aligned to 32-bytes.
If this is something you'd be fine with I can make a PR including comments explaining usage of rust-provided eq() over alternatives.
The text was updated successfully, but these errors were encountered: