-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise LoadMaskBits for uint8_t on SVE #2453
Optimise LoadMaskBits for uint8_t on SVE #2453
Conversation
This technique is used by Arm and is known to work for GCC and Clang See https://gitlab.arm.com/networking/ral/-/blob/7ea3a5f7a03e7a3dd1fecd898db9802d4cd36e89/src/intrinsics.h#L422
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Thanks for the pull request! Are you able to sign a CLA? This is a nice improvement. I thought the only documented way was PMOV (SVE 2.1), can you point me to any related documentation? I did see this but it doesn't specifically mention predicate regs:
Also, it looks like GCC failed to handle this until about 2021. Please add a check for For clang also, it seems like version 19 is required, here's a godbolt for testing. |
We're currently looking into the company's CLA. I'm not convinced this method is strictly documented but has been used in Arm's public code and does appear to work. It looks to compile into an ldr instruction that directly loads into the predicate register. I believe the PMOV instruction coming in SVE2.1 will formalise this into a proper intrinsic. Whilst the above bug report lists this being fixed in GCC 12.0, godbolt appears to be happy to compile down to 10.2. I've stuck to GCC 12.0 for now. |
Sounds good. Thanks for looking into the CLA and checking versions. |
@jan-wassenberg I have been informed that our organisation has indeed signed the CLA, but we are still failing the CLA check on the PRs. We have attempted to query this using the email address for CLA submissions, however we are receiving no response. Would it be possible to flag this internally? |
Thanks for following up! I just checked the email address and it's "wil******er@cambridgeconsultants.com>". Oh, we had a similar issue with Azim's series of recent patches. What does https://cla.developers.google.com/clas say for you? I manually confirmed that your corporate CLA is indeed signed. This can take a couple of days to propagate to the automated check. In the meantime, I can approve this PR manually. |
SVE can load a uint8_t pointer directly into a mask through casting.
This technique is used by Arm and is known to work for GCC and Clang.
See https://gitlab.arm.com/networking/ral/-/blob/7ea3a5f7a03e7a3dd1fecd898db9802d4cd36e89/src/intrinsics.h#L422