Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Stable Div Rem #32

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Add Stable Div Rem #32

wants to merge 8 commits into from

Conversation

NCGThompson
Copy link
Contributor

closes #11

@NCGThompson NCGThompson marked this pull request as ready for review November 13, 2023 04:29
@NCGThompson
Copy link
Contributor Author

You may notice that I included a compiler hint for in some of the divide by zero checks. e.g.:

https://github.com/NCGThompson/ethnum-rs/blob/295dd6d0a0743026c3ca593e750b0268b65e4357/src/uint.rs#L372-L386

Inequality operators (e.g. <=, >) are overloaded with the cmp method, which has a custom implementation based on the lexicographical comparison of (u128, u128):

https://github.com/NCGThompson/ethnum-rs/blob/295dd6d0a0743026c3ca593e750b0268b65e4357/src/uint/cmp.rs#L18-L20

The eq method, on the other hand, is derived automatically. It performs the eq method on each of U256's fields, which is a single array of u128s.

In LLVM, the lexicographical comparison is translated to two icmps, while the eq method is translated to a bcmp. The bcmp is very fast, and can take advantage of the SIMD registers. When compared to a constant 0 on x86_64 with 256-bit registers, bcmp translates to test ymmm, ymmm. icmp is not optimized the same way.

However, the compiler seems to inconsistently elide redundant bcmps, even if they are compiled in the same module. For example,

if a == 0 {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    panic!();
}

compiles to a single bcmp with all optimizations on. Also, the compiler doesn't understand that the result of the bcmp indicates the result of the icmp of each of the u128s. Notably, the / expressions used in udivmod4 use icmp, so

if a > u128::MAX {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    panic!();
}
_ = U256::new(3) / a;

compiles to a 256-bit bcmp and a 128-bit icmp each with their own panic statement, even if udivmod4 is inlined and optimized.

So, you may ask, why not use a custom implementation of eq that includes the compiler hint? I definitely think that is worth considering, but the effect varies based on the context as well as the opt settings. I don't know if it will be a pessimization in some cases. Interestingly, if lto is not set to "fat", then it makes the biggest difference when comparing a 256-bit to a 128-bit and the hint is placed directly in the self.eq(128) rather than a custom self.eq(Self).

src/uint.rs Outdated Show resolved Hide resolved
Copy link
Owner

@nlordell nlordell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Thanks so much for this, just a small nit on the alignment.

@NCGThompson NCGThompson marked this pull request as draft November 17, 2023 16:43
@NCGThompson
Copy link
Contributor Author

Fixed the alignment. The other change I made was to the docs and doc tests of some methods. One could argue it was better before, but this way makes it more consistent with std.

@NCGThompson NCGThompson marked this pull request as ready for review November 17, 2023 18:58
@NCGThompson
Copy link
Contributor Author

I found an error. In the std library, overflow is a guaranteed panic even without overflow checks. I made this depend on debug assertions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add {checked_,overflowing_,wrapping_,}div_rem
2 participants