x86 sse2/avx2: rewrite loongarch immediate operand shift instruction #1247
3afebc4 to 1555b14
Looks good to me.
@jinboson @HecaiYuan Thanks. Do you know why the CI didn't catch this already?
Hi, @mr-c. In the CI tests, for example the test for simde_mm_slli_epi16(), the second (shift) parameter is passed as an immediate value, like below:
However, we found that some projects (such as aom) pass variables as the second shift parameter, like below:
In that case, before this PR, the LoongArch implementation of simde_mm_slli_epi16() would fail at compile time. It seems other architectures have already run into this problem; see #905 (comment).
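The two snippets referred to in the comment above did not survive in this thread. As a rough illustration only (this is neither the original CI test nor the aom code), the difference between the two call styles might look like the sketch below. `vec8x16` and `model_slli_epi16` are hypothetical scalar stand-ins for `simde__m128i` and `simde_mm_slli_epi16`, so the example compiles without SIMDe.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector of eight 16-bit lanes. */
typedef struct { uint16_t lane[8]; } vec8x16;

/* Scalar model of a 16-bit left shift. Counts outside 0..15 clear every lane:
 * x86 zeroes the result for counts above 15, and treating negative counts
 * the same way is a choice made for this model. */
static vec8x16 model_slli_epi16(vec8x16 a, int count) {
  vec8x16 r;
  for (int i = 0; i < 8; i++) {
    r.lane[i] = (count < 0 || count > 15) ? 0 : (uint16_t)(a.lane[i] << count);
  }
  return r;
}

/* Style used by the CI tests: the shift count is a literal. */
static vec8x16 shift_by_literal(vec8x16 a) {
  return model_slli_epi16(a, 3);
}

/* Style seen in some projects: the shift count is only known at run time. */
static vec8x16 shift_by_variable(vec8x16 a, int bits) {
  return model_slli_epi16(a, bits);
}
```

With an immediate-only intrinsic behind the wrapper, only the first style can compile, which would explain why tests written in that style never exposed the problem.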
simde/x86/sse2.h
Outdated
@@ -6163,7 +6163,7 @@ simde_mm_sll_epi16 (simde__m128i a, simde__m128i count) {
 #elif defined(SIMDE_ARM_NEON_A32V7_NATIVE)
 r_.neon_u16 = vshlq_u16(a_.neon_u16, vdupq_n_s16(HEDLEY_STATIC_CAST(int16_t, count_.u64[0])));
 #elif defined(SIMDE_LOONGARCH_LSX_NATIVE)
-r_.lsx_i64 = __lsx_vslli_h(a_.lsx_i64, count_.u64[0]);
+r_.lsx_i64 = __lsx_vsll_h(a_.lsx_i64, __lsx_vreplgr2vr_h(count_.u64[0]));
Move this implementation up to line 6160; otherwise the code path will fall into the #if defined(SIMDE_VECTOR_SUBSCRIPT_SCALAR) branch first.
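The ordering concern can be sketched with a small self-contained example. The `DEMO_*` macros and both functions are hypothetical and only mimic the shape of the `#if`/`#elif` chain; they are not SIMDe code. In such a chain the first branch whose condition holds wins, so a native branch placed after a generic one is never compiled when both macros are defined.

```c
/* Hypothetical feature macros; in SIMDe the real ones come from the
 * build configuration and the detected target. */
#define DEMO_VECTOR_SUBSCRIPT_SCALAR 1
#define DEMO_LOONGARCH_LSX_NATIVE 1

enum demo_path { DEMO_PATH_PORTABLE, DEMO_PATH_VECTOR_SUBSCRIPT, DEMO_PATH_LSX };

/* Generic branch tested first: the LSX branch is unreachable here. */
static enum demo_path path_generic_first(void) {
#if defined(DEMO_VECTOR_SUBSCRIPT_SCALAR)
  return DEMO_PATH_VECTOR_SUBSCRIPT;
#elif defined(DEMO_LOONGARCH_LSX_NATIVE)
  return DEMO_PATH_LSX;
#else
  return DEMO_PATH_PORTABLE;
#endif
}

/* Native branch moved to the top, as the review comment requests. */
static enum demo_path path_native_first(void) {
#if defined(DEMO_LOONGARCH_LSX_NATIVE)
  return DEMO_PATH_LSX;
#elif defined(DEMO_VECTOR_SUBSCRIPT_SCALAR)
  return DEMO_PATH_VECTOR_SUBSCRIPT;
#else
  return DEMO_PATH_PORTABLE;
#endif
}
```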
simde/x86/sse2.h
Outdated
@@ -6199,7 +6199,7 @@ simde_mm_sll_epi32 (simde__m128i a, simde__m128i count) {
 #elif defined(SIMDE_ARM_NEON_A32V7_NATIVE)
 r_.neon_u32 = vshlq_u32(a_.neon_u32, vdupq_n_s32(HEDLEY_STATIC_CAST(int32_t, count_.u64[0])));
 #elif defined(SIMDE_LOONGARCH_LSX_NATIVE)
-r_.lsx_i64 = __lsx_vslli_w(a_.lsx_i64, count_.u64[0]);
+r_.lsx_i64 = __lsx_vsll_w(a_.lsx_i64, __lsx_vreplgr2vr_w(count_.u64[0]));
Move this implementation up to line 6196; otherwise the code path will fall into the #if defined(SIMDE_VECTOR_SUBSCRIPT_SCALAR) branch first.
I haven't looked at SIMDe recently but I feel that I should chip in on this.
That's really a dilemma, thank you for setting me straight.
If one really must allow a variable shift count, I can see two possible workarounds.
But, for me, the real question is whether SIMDe should accept the invalid use of an intrinsic.
Hi, @mr-c. This is ready for review. Is there anything else needed to move this forward?
Hi, @mr-c. Do you have any suggestions regarding this issue? What should we do next?
I am discussing with the compiler team whether the slli instructions can accept variables as parameters, and whether the accepted range of immediate values can be extended (b: 8, h: 16, w: 32).
x86 intrinsics such as _mm_srli_epi8() accept either a variable or an immediate as the second parameter, but the corresponding LoongArch intrinsics such as __lsx_vsrli_b accept only an immediate as the second parameter, so we need to rewrite them to avoid compilation errors.
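A minimal sketch of the rewrite for the 16-bit left shift follows. It assumes a GCC-style LoongArch toolchain that defines `__loongarch_sx` and provides `<lsxintrin.h>`. `shift_left_u16x8` is a hypothetical helper rather than a SIMDe function, and the explicit zeroing for counts above 15 is an assumption added here to mirror x86 behaviour. The intrinsics `__lsx_vsll_h` and `__lsx_vreplgr2vr_h` are the ones used in the diff hunks of this PR. On other targets the scalar fallback runs instead.

```c
#include <stdint.h>
#include <string.h>

#if defined(__loongarch_sx)
#include <lsxintrin.h>
#endif

/* Hypothetical helper: shift eight 16-bit lanes left by a run-time count. */
static void shift_left_u16x8(const uint16_t in[8], unsigned count, uint16_t out[8]) {
  if (count > 15) {
    /* x86 semantics: a count above 15 produces all-zero lanes. */
    memset(out, 0, 8 * sizeof(uint16_t));
    return;
  }
#if defined(__loongarch_sx)
  /* __lsx_vslli_h(v, count) would be rejected here because count is not
   * an immediate. Broadcast it into a vector and use the
   * vector-by-vector shift instead. */
  __m128i v, r;
  memcpy(&v, in, sizeof(v));
  r = __lsx_vsll_h(v, __lsx_vreplgr2vr_h((int)count));
  memcpy(out, &r, sizeof(r));
#else
  /* Portable fallback with the same result for counts 0..15. */
  for (int i = 0; i < 8; i++) {
    out[i] = (uint16_t)(in[i] << count);
  }
#endif
}
```

The broadcast costs one extra instruction compared with the immediate form, which is the trade-off for accepting a count that is not known at compile time.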
This can be closed now; see #1263 for details. Thank you, everyone.