In theory, the 128-bit routines in Blandwidth are written so that they can use 2 read ports and 2 write ports on every cycle where the hardware provides them. Clang produces assembly that appears to do this properly, but MSVC produces really bizarre assembly that does register shuffles in the middle of the inner loop. I have not investigated this yet, but it would appear that the (higher) 128-bit op bandwidth reported by the Clang build is probably the accurate one, and the 128-bit op bandwidth reported by the MSVC build is probably wrong.
- Casey