v0.7.4

@mr-c mr-c released this 05 May 05:35
· 310 commits to master since this release
0c26988

SIMDe 0.7.4

Summary

  • Minimum meson version is now 0.54
  • 40 new NEON families implemented, SVE API implementation started (14 families)
  • Initial support for x86 F16C API
  • Initial support for MIPS MSA API
  • Initial support for Arm Scalable Vector Extensions (SVE) API
  • Initial support for WASM SIMD128 API
  • Initial support for the E2K (Elbrus) architecture
  • Many fixes for MSVC, which is now compiled in CI using /ARCH:AVX, /ARCH:AVX2, and /ARCH:AVX512

X86

There are a total of 7470 SIMD functions on x86, 2971 (39.77%) of which have been implemented in SIMDe so far.
Of the 5270 functions currently in AVX-512, SIMDe implements 1439 (27.31%).

Newly added function families

Additions to existing families

  • AVX512F: 579 additional, 856 total of 2660 (31.80%)
  • AVX512BW: 178 additional, 335 total of 828 (40.46%)
  • AVX512DQ: 77 additional, 111 total of 399 (27.82%)
  • AVX512_VBMI: 9 additional, 30 total of 30 💯
  • KNCNI: 113 additional, 114 total of 595 (19.16%)
  • VPCLMULQDQ: 1 additional, 2 total of 2 💯

Neon

SIMDe currently implements 3745 out of 6670 (56.15%) NEON functions. If you don't count 16-bit floats and poly types, it's 3745 / 4969 (75.37%).

Newly added families

  • addhn
  • bcax
  • cage
  • cmla
  • cmla_rot90
  • cmla_rot180
  • cmla_rot270
  • fma
  • fma_lane
  • fma_n
  • ld2
  • ld4_lane
  • mlal_high_n
  • mlal_lane
  • mls_n
  • mlsl_high_n
  • mlsl_lane
  • mull_lane
  • qdmulh_lane
  • qdmulh_n
  • qrdmulh_lane
  • qrshrn_n
  • qrshrun_n
  • qshlu_n
  • qshrn_n
  • qshrun_n
  • recpe
  • recps
  • rshrn_n
  • rsqrte
  • rsqrts
  • shll_n
  • shrn_n
  • sqadd
  • sri_n
  • st2
  • st2_lane
  • st3_lane
  • st4_lane
  • subhn
  • subl_high
  • xar
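
For readers unfamiliar with the two SHA-3 related families in this list, here is a minimal scalar sketch of what a single lane of `xar` and `bcax` computes, based on the Arm instruction descriptions. The helper names are hypothetical and are not SIMDe's own code; the real intrinsics operate on whole 128-bit vectors.

```c
#include <stdint.h>

/* Hypothetical helper (not a SIMDe function): one 64-bit lane of XAR,
 * i.e. (a XOR b) rotated right by imm bits, with imm in 0..63. */
static uint64_t xar_u64_lane(uint64_t a, uint64_t b, unsigned imm) {
  uint64_t x = a ^ b;
  imm &= 63u;
  /* Guard imm == 0 so we never shift a 64-bit value by 64 bits. */
  return imm ? (x >> imm) | (x << (64u - imm)) : x;
}

/* Hypothetical helper (not a SIMDe function): one 8-bit lane of BCAX,
 * i.e. a XOR (b AND NOT c). */
static uint8_t bcax_u8_lane(uint8_t a, uint8_t b, uint8_t c) {
  return (uint8_t)(a ^ (b & (uint8_t)~c));
}
```

A portable fallback for these families can apply the same per-lane logic across each element of the vector.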

MSA

Overall, SIMDe implements 40 of 533 (7.50%) functions from MSA.

Details

Implementation of Arm intrinsics

NEON

SVE Intrinsics

WASM intrinsics

x86 intrinsics

SSE*

AVX

  • avx: work around missing _mm256_{load,store}u_m128{,i,d} on LCC a3a39e2 @nemequ
  • avx: try to detect prior inclusion of AVX header and handle it e8b7a2e @nemequ
  • avx, avx512/cmp: properly handle NaN in _mm{,256,512}_cmp_{ps,pd,ss,sd} 491d3fa @nemequ
  • avx: use internal symbols in clang fallbacks for cmp_ps/pd functions 35b86b7 @nemequ
  • avx: work around incorrect maskload/store definitions on clang < 3.8 a9313de @nemequ
  • avx: add native calls for _mm256_insertf128_{pd,ps,si256} bab30bb @LaurentThomas
  • avx: add test for simde_mm256_permute2f128{_pd,_si256} 04a0497 @mr-c
  • avx{,2}: fix maskload illegal mem access 39f723e @k-dominik
  • avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc 698bc2e @mr-c
  • avx{,2}: some intrinsics are missing from older MSVC versions bb274b8 @mr-c
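
The NaN fix for the `cmp` functions above concerns the difference between ordered and unordered predicates. The following is a rough scalar sketch of that distinction, not SIMDe's implementation. The helper names are hypothetical, each is named after the predicate constant it loosely models, and the results are simplified to `bool` where the real intrinsics produce all-ones or all-zeros lane masks. The signaling versus quiet distinction is ignored.

```c
#include <math.h>
#include <stdbool.h>

/* Ordered predicates are false whenever either operand is NaN. */
static bool cmp_eq_oq(float a, float b) { return a == b; }  /* models _CMP_EQ_OQ */
static bool cmp_lt_os(float a, float b) { return a < b; }   /* models _CMP_LT_OS */

/* Unordered predicates are true whenever either operand is NaN. */
static bool cmp_neq_uq(float a, float b) { return a != b; }   /* models _CMP_NEQ_UQ */
static bool cmp_nlt_us(float a, float b) { return !(a < b); } /* models _CMP_NLT_US */

/* ORD / UNORD only test for the presence of NaN. */
static bool cmp_ord_q(float a, float b)   { return !isnan(a) && !isnan(b); } /* models _CMP_ORD_Q */
static bool cmp_unord_q(float a, float b) { return isnan(a) || isnan(b); }   /* models _CMP_UNORD_Q */
```

Note that "not less than" is not the same as "greater than or equal" once NaN is involved: `cmp_nlt_us(NAN, 1.0f)` is true, while an ordered `>=` would be false. This sketch assumes IEEE 754 comparison semantics, so options such as `-ffast-math` may break it.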

AVX2

AVX512

GFNI

  • gfni: improve ARM NEON implementation a99a3ec @rosbif
  • gfni: add ARM, PPC and WASM implementations of gf2p8mul intrinsics 61126b3 @rosbif
  • gfni: add cast to work around -Wimplicit-int-conversion warning d066a1c @nemequ
  • gfni: remove unintentional dependency on vector extensions bdfa828 @nemequ
  • gfni: work around clang bug #50932 7d4beba @nemequ
  • gfni: work around error with vec_bperm on clang-10 on POWER 8620bd0 @nemequ
  • gfni: replace vec_and and vec_xor with & and ^ on z/arch f5577dc @nemequ
  • gfni: add many x86, ARM, z/Arch, PPC and WASM implementations 97eb961 @rosbif
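
As background for the `gf2p8mul` work above, here is a minimal scalar sketch of what one byte lane of that operation computes: multiplication in GF(2^8) reduced by the polynomial x^8 + x^4 + x^3 + x + 1, as documented for the instruction. The function name is hypothetical and the loop is the textbook shift-and-add method, not the vectorised approach used in the commits listed here.

```c
#include <stdint.h>

/* Hypothetical helper (not a SIMDe function): multiply two bytes in GF(2^8)
 * with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). */
static uint8_t gf2p8mul_byte(uint8_t a, uint8_t b) {
  uint8_t result = 0;
  for (int i = 0; i < 8; i++) {
    if (b & 1u) {
      result ^= a;               /* add (XOR) the current multiple of a */
    }
    uint8_t carry = (uint8_t)(a & 0x80u);
    a = (uint8_t)(a << 1);       /* multiply a by x */
    if (carry) {
      a ^= 0x1Bu;                /* reduce modulo the polynomial */
    }
    b >>= 1;
  }
  return result;
}
```

The well-known example from the AES specification, 0x57 times 0x83 giving 0xC1, is a convenient sanity check for any port of this routine.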

XOP

  • xop: fix NEON implementation of maccs functions to use NEON types 6ecc0e3 @nemequ

F16C

  • f16c: initial implementation 62c1087 @nemequ
  • f16c: use __ARM_FEATURE_FP16_VECTOR_ARITHMETIC to detect Arm support eaeac09 @nemequ
  • msvc 2022: enable F16C if AVX2 present a66cbb0 @mr-c
  • f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph 5d2b53d @mr-c
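
To illustrate what the F16C conversions involve, here is a minimal scalar sketch of the half-to-single direction, which is exact and needs no rounding. The reverse direction is where the rounding mode noted above matters. The function name is hypothetical and this is not SIMDe's implementation; it assumes IEEE 754 binary16 input bits and a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a SIMDe function): convert IEEE 754 binary16
 * bits to a float. Every half value is exactly representable as a float. */
static float half_bits_to_float(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t exp  = (uint32_t)(h >> 10) & 0x1Fu;
  uint32_t man  = (uint32_t)h & 0x3FFu;
  uint32_t bits;

  if (exp == 0) {
    if (man == 0) {
      bits = sign;                               /* signed zero */
    } else {
      /* Subnormal half: shift until the implicit leading 1 appears. */
      uint32_t shift = 0;
      while ((man & 0x400u) == 0) {
        man <<= 1;
        shift++;
      }
      man &= 0x3FFu;
      bits = sign | ((127u - 15u + 1u - shift) << 23) | (man << 13);
    }
  } else if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (man << 13);     /* infinity or NaN */
  } else {
    bits = sign | ((exp + (127u - 15u)) << 23) | (man << 13); /* normal */
  }

  float f;
  memcpy(&f, &bits, sizeof f);                   /* bit cast without aliasing issues */
  return f;
}
```

For example, the bit pattern 0x3C00 converts to 1.0f and 0x7BFF converts to 65504.0f, the largest finite half value.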

FMA

SVML

MIPS MSA intrinsics

Arch support

arm64

z/Arch

Altivec

  • sse, sse2: generate to/from altivec functions for SSE/SSE2 types. dd3ff53 @nemequ
  • docker: power9-clang ignore deprecated-altivec-src-compat warnings b70f1a2 @mr-c
  • sse4.1: PPC AltiVec has no vec_splat_s64 debbf73 @rosbif
  • arch: fix SIMDE_ARCH_POWER_ALTIVEC_CHECK to include AltiVec check 8534e64 @nemequ
  • simd128: add AltiVec implementations of any/all_true a3b2630 @nemequ

e2k (Elbrus)

Power

  • gcc power: bugs 1007[012] fixed in GCC 12.1 c23208d @mr-c
  • gcc power: vec_cpsgn argument reversal fixed in 12.0 296362c @mr-c

Testing with Docker/Podman & CI

Appveyor

Azure

Circle CI

Cirrus CI

Local testing with Docker/Podman

Drone.io

Currently non-functional. Jobs queue, but are eventually killed before they start running. Assistance fixing that is welcome!

GitHub Actions

  • gh-actions: add some bionic-era GCC builds ccdd24b @nemequ
  • gh-actions: add several clang builds e4b4646 @nemequ
  • gh-actions: temporarily disable emscripten build 71ea291 @nemequ
  • codeql: analyze the merge commit d3a40e1 @mr-c
  • gh-actions: automatically detect whether to use SDE bb69b54 @nemequ
  • gh-actions: disable clang-3.9 build 7fcb64d @nemequ
  • gh-actions: use ctest to run CMake tests so we can output on failure 03f6ebe @nemequ
  • gh-actions: try commit message witohut quotes on implementation-status 3f81cac @nemequ
  • gh-actions: add action to update the implementation-status repo 333f077 @nemequ
  • gh-actions: use -O2 instead of -O3 on emscripten 636f145 @nemequ
  • gh actions: Add Windows ARM64 CI f12fd00 @tommyvct
  • gh-actions: only run mSVC Arm checks on msvc-arm branch 3d8a516 @nemequ
  • gh-actions: switch emscripten build to Meson bde2cb1 @nemequ
  • gh-actions: ubuntu-16.04 has been retired, migrate to ubuntu-18.04 6d0c65c @mr-c
  • gh-actions: pin to macos-10.15 instead of -latest d64de8c @mr-c
  • ga-actions: trim flags for icx/icpc 201dcdb @mr-c
  • gh-actions, circleci: debian testing gcc: -Wno-error=stringop-overread af24d0c @mr-c
  • gh-actions, docker: turn off emscripten's -Wunsafe-buffer-usage for the tests 3caf71d @mr-c
  • gh-actions: test using Intel® oneAPI DPC++/C++ Compiler instead of ICC df144ff @mr-c
  • gh-actions: Ubuntu 22.04 + system meson dd0b662 @mr-c
  • gh-actions: Update codecov to v3 for Node 16 support bd7f8df @wrv
  • gh-actions: Update macos build to 11 c30a29b @wrv
  • gh-actions: Comment out Ubuntu 18.04 build as will be unsupported in April 2023 6cefe47 @wrv
  • gh-actions: Update to actions/checkout@v3 to avoid Node 12 warning 511b5b7 @wrv
  • gh-actions: add -fp-model precise for icx/icpx 7ec32ff @wrv
  • gh-actions: update OSSAR action versions a1a63ac @wrv
  • gh-actions: cancel workflows if there is a newer commit 8c56459 @mr-c
  • gh-actions: test with gcc-12 f6db95d @mr-c
  • gh-actions: remove GCC 4.7 build 3997b8f @nemequ
  • gh-actions: add action to push to the simde-no-tests repository 1b4647f @milot-mirdita
  • gh-actions: move push-to-no-tests.yml into the right directory. 7fbb9c9 @nemequ
  • gh-actions: give up on getting commit ID in message for status repo 05ecb5d @nemequ
  • gh-actions: add missing jobs property ddd453a @nemequ
  • gh-actions, docker: add -fno-lax-vector-conversions to clang flags ccdfca9 @nemequ
  • gh-actions: add -ffast-math builds for GCC and clang de616e7 @nemequ
  • gh-actions: resume testing on aarch64 4d1639a @mr-c
  • gh-actions: cross-build & test powepc64le, s390x (later) f0f3d09 @mr-c
  • gh-actions: sleef: no ccache due to -march=native c709922 @mr-c
  • gh-actions: use ccache to speed up builds 73dddb7 @mr-c
  • gh-actions: clang 1[45]; gcc 12 on riscv64 with qemu e5c02d4 @mr-c
  • gh-actions: Resume running the mscv arm tests on all branches 782d816 @mr-c
  • gh-actions: Emscripten: temporarily only run "native" tests 1b6bde7 @mr-c
  • gh-actions: actionlint/shellcheck inspired cleanups 8182065 @mr-c
  • gh-actions qemu: resuming build+test on s390x cb6a0da @mr-c
  • gh-actions: drop cmake for meson. 9d69cff @mr-c

Travis

Currently non-functional, partially replaced by the s390x qemu GitHub Actions build.
See #903 for the status of POWER9 (ppc64le).

Netlify

Currently broken

Semaphore CI

Currently failing for old GCC-5

  • semaphore CI: fix test execution by using mason 1b05684 @mr-c

Misc

New Contributors

Full Changelog: v0.7.2...v0.7.4