Notes on SIMD programming
s-trinh edited this page Jan 17, 2019 · 1 revision
Current state of intrinsics code in ViSP:
- only x86 SSE intrinsics are used (no AVX, AVX2, ARM NEON, ...)
- SSE headers must be included in the `.cpp` file, guarded by preprocessor checks that detect at compile time whether the compiler supports generating the corresponding intrinsics code:
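```cpp
// SSE2: GCC/Clang define __SSE2__ (e.g. with -msse2); MSVC always targets SSE2 on x64 (_M_X64)
// and on 32-bit x86 with /arch:SSE2 or higher (_M_IX86_FP >= 2)
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define VISP_HAVE_SSE2 1

// SSE3: MSVC does not define __SSE3__, so its version (1500 = Visual Studio 2008) is checked instead
#  if defined __SSE3__ || (defined _MSC_VER && _MSC_VER >= 1500)
#    include <pmmintrin.h>
#    define VISP_HAVE_SSE3 1
#  endif

// SSSE3: same approach as SSE3
#  if defined __SSSE3__ || (defined _MSC_VER && _MSC_VER >= 1500)
#    include <tmmintrin.h>
#    define VISP_HAVE_SSSE3 1
#  endif
#endif
```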
- use the CMake options to enable SSE2 / SSE3 / SSSE3; this adds the necessary compiler flags (e.g. `-msse2`)
- use `vpCPUFeatures::checkSSE2()` to check at run time whether the CPU supports the SSE2 instruction set. This is necessary to avoid issues when, for example, ViSP is built with SSSE3 support but runs on a computer that does not support SSSE3
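The three points above fit together as follows: a compile-time guard, a run-time CPU check, and a scalar fallback. This is a minimal sketch under stated assumptions. `cpuHasSSE2()` and `addArrays()` are hypothetical names. `cpuHasSSE2()` stands in for `vpCPUFeatures::checkSSE2()` so the example does not depend on ViSP, and it relies on the GCC/Clang builtin `__builtin_cpu_supports`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compile-time detection: same SSE2 check as the snippet above.
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define VISP_HAVE_SSE2 1
#endif

// Hypothetical stand-in for vpCPUFeatures::checkSSE2() (assumption, not ViSP code).
static bool cpuHasSSE2()
{
#if defined(VISP_HAVE_SSE2) && (defined(__GNUC__) || defined(__clang__))
  return __builtin_cpu_supports("sse2") != 0;
#elif defined(VISP_HAVE_SSE2)
  return true; // assumption: the build already targets SSE2 (e.g. MSVC x64)
#else
  return false;
#endif
}

// Hypothetical helper: element-wise c[i] = a[i] + b[i] for i in [0, n).
static void addArrays(const std::int32_t *a, const std::int32_t *b, std::int32_t *c, std::size_t n)
{
  assert(n == 0 || (a != nullptr && b != nullptr && c != nullptr));
  std::size_t i = 0;
#if defined(VISP_HAVE_SSE2)
  // Run-time check: the binary may run on a CPU older than the build target.
  if (cpuHasSSE2()) {
    // SSE2 integer add, 4 x int32 per iteration; unaligned loads/stores.
    for (; i + 4 <= n; i += 4) {
      const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
      const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
      _mm_storeu_si128(reinterpret_cast<__m128i *>(c + i), _mm_add_epi32(va, vb));
    }
  }
#endif
  // Scalar loop: processes the remaining tail, or everything when SSE2 is unavailable.
  for (; i < n; ++i) {
    c[i] = a[i] + b[i];
  }
}
```

Compiling the whole translation unit with a flag such as `-mssse3` has a side effect: the compiler is then free to emit those instructions outside the guarded block too. The run-time check therefore only protects the hand-written intrinsics. This is one motivation for the separate compilation units discussed below.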
AVX2 has been available since the Intel Haswell architecture (2013). The correct way to support AVX2, AVX512, ... would be:
- SSE and AVX2 code must be placed in separate compilation units
- source files that contain SSE code are compiled with SSE flags only (e.g. `-msse2`), and source files that contain AVX2 code with the AVX2 flag (e.g. `-mavx2`, or `/arch:AVX2` for MSVC); see the CPU dispatcher topics
- when packaging ViSP for Linux distributions, the best approach is to have (see also):
  - one option to enable baseline intrinsics (e.g. SSE2 or SSE3): both regular source files and files that contain SSE code get the SSE flags added
  - one option to add dispatched intrinsics (e.g. AVX2, AVX512, ...): only source files that contain AVX2 code get the `-mavx2` flag added
  - this way, we assume a minimum target of SSE2- or SSE3-capable CPUs. Source files with no intrinsics code are also compiled with the `-msse2` or `-msse3` flags, so the compiler may generate SSE code even where no SSE intrinsics are written (see for instance this example with the `-O3` or `-march=native` compiler flags)
  - users with a recent CPU will be able to benefit from code written with AVX2 intrinsics
- beware of the SSE-AVX transition penalty when mixing SSE and AVX code
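The following is a minimal sketch of a CPU dispatcher, written under stated assumptions. All function names are hypothetical. In a real build, `sumAVX2()` would live in its own `.cpp` file compiled with `-mavx2` (or `/arch:AVX2`). To keep the example in one file, it instead swaps in the GCC/Clang `target("avx2")` function attribute, which is a different technique from separate compilation units. It also uses `__builtin_cpu_supports("avx2")` for the run-time check. The kernel is selected once and cached in a function pointer. `_mm256_zeroupper()` is written out explicitly to show where the SSE-AVX transition is handled, even though compilers usually insert it on their own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar kernel: always compiled, used as the fallback.
static std::int32_t sumScalar(const std::int32_t *x, std::size_t n)
{
  std::int32_t s = 0;
  for (std::size_t i = 0; i < n; ++i) {
    s += x[i];
  }
  return s;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#  include <immintrin.h>
#  define HAVE_AVX2_KERNEL 1

// AVX2 kernel. Assumption: the target attribute replaces a separate -mavx2 compilation unit.
__attribute__((target("avx2")))
static std::int32_t sumAVX2(const std::int32_t *x, std::size_t n)
{
  __m256i acc = _mm256_setzero_si256();
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    // _mm256_add_epi32 is an AVX2 instruction (8 x int32 per iteration).
    acc = _mm256_add_epi32(acc, _mm256_loadu_si256(reinterpret_cast<const __m256i *>(x + i)));
  }
  std::int32_t lanes[8];
  _mm256_storeu_si256(reinterpret_cast<__m256i *>(lanes), acc);
  // Clear the upper YMM halves before returning to code that may use legacy SSE
  // encodings (avoids the SSE-AVX transition penalty; usually compiler-inserted).
  _mm256_zeroupper();
  std::int32_t s = 0;
  for (int k = 0; k < 8; ++k) {
    s += lanes[k];
  }
  for (; i < n; ++i) {
    s += x[i];
  }
  return s;
}
#endif

using SumFn = std::int32_t (*)(const std::int32_t *, std::size_t);

// Pick the best kernel supported by the CPU the program is running on.
static SumFn selectSum()
{
#if defined(HAVE_AVX2_KERNEL)
  if (__builtin_cpu_supports("avx2")) {
    return sumAVX2;
  }
#endif
  return sumScalar;
}

// Entry point: the kernel is chosen on the first call
// (function-local static initialization is thread-safe since C++11).
static std::int32_t sumDispatched(const std::int32_t *x, std::size_t n)
{
  assert(n == 0 || x != nullptr);
  static const SumFn kernel = selectSum();
  return kernel(x, n);
}
```

Callers only ever see `sumDispatched()`. The scalar path remains the reference implementation, and the AVX2 path is used only on CPUs that report AVX2 support.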
Some additional references: