- cmake 3.22
- python 3.10
- gcc 12.2.0
- aocc 4.0
# Configure for GCC build
# Default Native Build
$ cmake -D CMAKE_C_COMPILER=gcc -S <source_dir> -B <build_dir>
# Cross Compiling AVX2 binary on AVX512 machine
$ cmake -D CMAKE_C_COMPILER=gcc -D ALMEM_ARCH=avx2 -S <source_dir> -B <build_dir>
# Cross Compiling AVX512 binary on AVX2 machine
$ cmake -D CMAKE_C_COMPILER=gcc -D ALMEM_ARCH=avx512 -S <source_dir> -B <build_dir>
# Enabling Tunable Parameters
$ cmake -D CMAKE_C_COMPILER=gcc -D ENABLE_TUNABLES=Y -S <source_dir> -B <build_dir>
# Configure for AOCC (Clang) build
# Default Native Build
$ cmake -D CMAKE_C_COMPILER=clang -S <source_dir> -B <build_dir>
# Cross Compiling AVX2 binary on AVX512 machine
$ cmake -D CMAKE_C_COMPILER=clang -D ALMEM_ARCH=avx2 -S <source_dir> -B <build_dir>
# Cross Compiling AVX512 binary on AVX2 machine
$ cmake -D CMAKE_C_COMPILER=clang -D ALMEM_ARCH=avx512 -S <source_dir> -B <build_dir>
# Enabling Tunable Parameters
$ cmake -D CMAKE_C_COMPILER=clang -D ENABLE_TUNABLES=Y -S <source_dir> -B <build_dir>
# Build
$ cmake --build <build_dir>
# Install
$ cmake --install <build_dir>
## For a custom install path, run configure with "CMAKE_INSTALL_PREFIX"
Both the shared library 'libaocl-libmem.so' and the static library 'libaocl-libmem.a' are installed under the '<build_dir>/lib/' path.
To enable logging, configure the build as follows:
$ cmake -D ENABLE_LOGGING=Y -S <source_dir> -B <build_dir>
Logs will be stored in the "/tmp/libmem.log" file.
Enable debugging logs by uncommenting the following line in "CMakeLists.txt" in the root directory.
debugging logs: add_definitions(-DLOG_LEVEL=4)
Run the application by preloading the shared library 'libaocl-libmem.so' generated by the above build procedure.
$ LD_PRELOAD=<path to build/lib/libaocl-libmem.so> <executable> <params>
WARNING: Do not load or run the AVX512 library on a non-AVX512 machine. Running AVX512 code on a non-AVX512 machine will crash the application (invalid instructions).
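One way to guard against the crash described in the warning is to check the CPU flags before preloading an AVX512 build. The sketch below assumes a Linux system that exposes flags in /proc/cpuinfo; `has_cpu_flag` is a hypothetical helper name, and `avx512f` (the AVX512 foundation flag) is used as an assumed proxy, since the exact AVX512 subsets the library needs are not stated here.

```shell
# Hypothetical helper: succeeds if the given CPU flag appears on the first
# "flags" line of a cpuinfo-style file (defaults to /proc/cpuinfo on Linux).
has_cpu_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # -m1: stop after the first "flags" line; -w: match the flag as a whole word.
    grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
}

# Example use (the library path is a placeholder):
#   if has_cpu_flag avx512f; then
#       LD_PRELOAD=/path/to/libaocl-libmem.so ./my_app
#   else
#       echo "CPU lacks avx512f; use a native or avx2 build instead" >&2
#   fi
```

The optional second argument exists only so the check can be exercised against a saved cpuinfo file.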
The library chooses the best-fit implementation for the underlying ZEN microarchitecture.
There are two tunables that will be parsed by libmem.
LIBMEM_OPERATION :- instruction based on alignment and cacheability
LIBMEM_THRESHOLD :- the threshold for ERMS and Non-Temporal instructions
The library will choose the implementation based on the tuned parameter at run time.
Setting the LIBMEM_OPERATION tunable lets you choose an implementation, which is a combination of move instructions and the alignment of the source and destination addresses.
LIBMEM_OPERATION format: <operations>,<source_alignment>,<destination_alignment>
<operations> = [avx2|avx512|erms]
<source_alignment> = [b|w|d|q|x|y|n]
<destination_alignment> = [b|w|d|q|x|y|n]
e.g.: To use only AVX2-based move operations with both source and destination addresses unaligned:
LD_PRELOAD=<build/lib/libaocl-libmem.so> LIBMEM_OPERATION=avx2,b,b <executable>
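A malformed value is easy to miss, so it can help to check the string against the format listed above before launching the application. The following is a minimal sketch: `valid_libmem_operation` is a hypothetical helper, and it checks only the format as documented in this section, not whatever parsing the library itself performs.

```shell
# Hypothetical helper: succeeds if the argument matches
# <operations>,<source_alignment>,<destination_alignment> as documented above.
valid_libmem_operation() {
    case $1 in
        avx2,[bwdqxyn],[bwdqxyn]|avx512,[bwdqxyn],[bwdqxyn]|erms,[bwdqxyn],[bwdqxyn])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example use:
#   valid_libmem_operation "avx2,b,b" && echo "format looks valid"
```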
Setting the LIBMEM_THRESHOLD tunable lets you configure the threshold values for the supported instruction sets.
LIBMEM_THRESHOLD format: <repmov_start_threshold>,<repmov_stop_threshold>,<nt_start_threshold>,<nt_stop_threshold>
<repmov_start_threshold> = [0, +ve integers]
<repmov_stop_threshold> = [0, +ve integers, -1]
<nt_start_threshold> = [0, +ve integers]
<nt_stop_threshold> = [0, +ve integers, -1]
Make sure to provide valid start and stop values for each range. To extend a range to the maximum length, pass "-1" as the stop threshold.
e.g.: To use REP MOVE instructions for a range of 1KB to 2KB and non-temporal instructions for a range of 512KB and above:
LD_PRELOAD=<build/lib/libaocl-libmem.so> LIBMEM_THRESHOLD=1024,2048,524288,-1 <executable>
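The values in the example are byte counts (1KB = 1024, 2KB = 2048, 512KB = 524288). As a rough sketch, the string can be assembled from sizes given in KB so the arithmetic is not done by hand; `kb_to_bytes` and `libmem_threshold_kb` are hypothetical helper names, not part of the library.

```shell
# Convert a size in KB to bytes; "-1" (maximum length) passes through unchanged.
kb_to_bytes() {
    if [ "$1" = "-1" ]; then
        printf '%s\n' "-1"
    else
        printf '%s\n' "$(( $1 * 1024 ))"
    fi
}

# Build a LIBMEM_THRESHOLD value from four sizes in KB, in the order
# repmov_start, repmov_stop, nt_start, nt_stop.
libmem_threshold_kb() {
    printf '%s,%s,%s,%s\n' \
        "$(kb_to_bytes "$1")" "$(kb_to_bytes "$2")" \
        "$(kb_to_bytes "$3")" "$(kb_to_bytes "$4")"
}

# Example use (the library path is a placeholder):
#   LIBMEM_THRESHOLD=$(libmem_threshold_kb 1 2 512 -1) \
#       LD_PRELOAD=/path/to/libaocl-libmem.so ./my_app
```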
Refer to the User Guide (docs/User_Guide.md) for detailed tuning of the parameters.