Adjust scalar/vector math function handling #3357
Merged
+304
−240
Proposed changes
Part A:
The sincos function is used both inside and outside OpenMP offload regions.
When it is used inside a target region, the generic call signature defined in glibc is expected.
We need to be sure such calls compile and link on both host and device.
On the other hand, a vendor may provide optimized functions that require call redirection, such as amd_sincos.
If the redirection is done in an inline function, it pollutes the compilation of the target region. For this reason, this PR splits the math functions into two namespaces.
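A minimal sketch of the two-namespace idea, not the code in this PR: the namespace names `device_math` and `host_math`, the macro `DEMO_USE_VENDOR_SINCOS`, and the symbol `vendor_sincos` are all hypothetical placeholders. The point is only that the version visible to target regions never sees the vendor redirection.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical namespace for code that may be compiled inside an offload
// region: it relies only on the generic std::sin/std::cos, with no vendor
// redirection, so it can be built for both host and device.
namespace device_math
{
inline void sincos(double a, double* s, double* c)
{
  *s = std::sin(a);
  *c = std::cos(a);
}
} // namespace device_math

// Hypothetical host-only namespace: the place where a vendor-optimized
// routine may be substituted without affecting target-region compilation.
namespace host_math
{
#if defined(DEMO_USE_VENDOR_SINCOS)
// Placeholder declaration standing in for a vendor symbol (assumption).
extern "C" void vendor_sincos(double a, double* s, double* c);
inline void sincos(double a, double* s, double* c) { vendor_sincos(a, s, c); }
#else
// Fallback when no vendor library is requested.
inline void sincos(double a, double* s, double* c)
{
  *s = std::sin(a);
  *c = std::cos(a);
}
#endif
} // namespace host_math
```

Host-only call sites would use `host_math::sincos`, while anything reachable from a target region would use `device_math::sincos`.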
Part B:
While doing the above work, I reviewed all of the scalar and vector math handling. To reduce entanglement, QMC_MATH_VENDOR is introduced with providers GENERIC, INTEL_VML, AMD_LIBM, and IBM_MASS. ENABLE_MASS has been removed. When MKL is detected in BLAS/LAPACK, INTEL_VML will be picked as in the past. But if GENERIC is requested, the use of VML can be skipped. In general, nothing changes for users, but we have more precise control.
Another note: I explored AMD LibM, but it is slower than Clang inlined functions. The status of its vector functions is also not clear (see amd/aocl-libm-ose#8), so it is added only for experiments and should be avoided for production use.
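As a rough illustration of how a vendor selection could reach the source code, here is a sketch of a vector sincos wrapper with a generic fallback. It assumes the build system turns the chosen provider into a preprocessor macro; the macro `DEMO_MATH_VENDOR_INTEL_VML` and the names `demo_vec::vendor_name` and `demo_vec::sincos_array` are hypothetical and are not taken from this PR. Only the Intel VML call `vdSinCos` is a real library function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

#if defined(DEMO_MATH_VENDOR_INTEL_VML)
#include <mkl_vml.h> // Intel MKL vector math, provides vdSinCos
#endif

namespace demo_vec
{
// Report which provider this translation unit was built for.
inline std::string vendor_name()
{
#if defined(DEMO_MATH_VENDOR_INTEL_VML)
  return "INTEL_VML";
#else
  return "GENERIC";
#endif
}

// Compute s[i] = sin(a[i]) and c[i] = cos(a[i]) for i in [0, n).
inline void sincos_array(const double* a, double* s, double* c, std::size_t n)
{
  assert(n == 0 || (a != nullptr && s != nullptr && c != nullptr));
#if defined(DEMO_MATH_VENDOR_INTEL_VML)
  // Vendor path: one library call for the whole array.
  vdSinCos(static_cast<MKL_INT>(n), a, s, c);
#else
  // Generic path: a plain loop the compiler may vectorize or inline.
  for (std::size_t i = 0; i < n; ++i)
  {
    s[i] = std::sin(a[i]);
    c[i] = std::cos(a[i]);
  }
#endif
}
} // namespace demo_vec
```

Keeping the provider choice behind a single wrapper means call sites do not change when a different vendor is requested.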
Part C:
Added a unit test for StructFact. It is very useful for quickly checking sincos performance.
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
epyc-server, dell-laptop, summit
Checklist