WIP: Instruction set detection/dispatch #16

drbenmorgan · 2018-06-25T16:29:07Z

This is a small WIP on tools and examples for coding and packaging instruction-set-specific code (SIMD etc.). At present, it simply implements:

A shell script to query the system (macOS/Linux only at present) and print the SIMD instructions supported by the host.
A small C++14 program that does the same.

I'm requesting an initial review now to solicit comments on the remaining items:

How to format SIMD flags for portability, e.g. macOS sysctl gives "SSE4.1", Linux /proc/cpuinfo gives "sse4_1"?
Demonstration of runtime dispatch/fat binaries. I think this is useful, but also needs documentation on performance penalties, and some actual benchmarks.

Let me know what you think.
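On the flag-formatting question, one possible approach is to normalise every reported name to a single canonical spelling before comparing or printing. The sketch below is only an illustration: `normalize_simd_flag` is a hypothetical helper, not part of this PR, and the choice of the lower-case, underscore-separated Linux spelling as the canonical form is an assumption. The two aliases cover cases where the platforms differ by more than punctuation (macOS reports AVX as "AVX1.0", Linux reports SSE3 as "pni"); a real table would need more entries.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: map a platform-specific spelling such as "SSE4.1"
// (macOS sysctl) or "sse4_1" (Linux /proc/cpuinfo) onto one canonical form.
// The canonical form chosen here (lower case, '_' instead of '.') is an
// assumption made for illustration.
std::string normalize_simd_flag(std::string flag) {
  // Lower-case every character and replace '.' with '_'.
  std::transform(flag.begin(), flag.end(), flag.begin(), [](unsigned char c) {
    if (c == '.') return '_';
    return static_cast<char>(std::tolower(c));
  });

  // Illustrative alias table for names that differ by more than punctuation.
  static const std::map<std::string, std::string> aliases = {
      {"avx1_0", "avx"},  // macOS spelling of AVX
      {"pni", "sse3"},    // Linux spelling of SSE3
  };
  const auto it = aliases.find(flag);
  if (it != aliases.end()) {
    return it->second;
  }
  return flag;
}
```

With this, both `normalize_simd_flag("SSE4.1")` and `normalize_simd_flag("sse4_1")` yield the same string, so the shell script and the C++ program could agree on one output format.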

Vendor VectorClass v1.25 code for instruction set detection from upstream library: http://www.agner.org/optimize/#vectorclass Implement minimal use of interface to print an integer representing the highest instruction set provided by host system. Add a basic CMake script to build program, and extend README to document its use.

Vendor VectorClass v1.28 code for instruction set detection from upstream library: http://www.agner.org/optimize/#vectorclass Implement minimal use of interface to print an integer representing the highest instruction set provided by host system. Add a basic CMake script to build program, and extend README to document its use.
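For illustration, the integer printed by the program could be turned into a readable name with a small lookup. This is a sketch, not the code in this PR: `instrset_name` is a hypothetical helper, and the level-to-name table is an assumption based on the level list in the comments of VCL's `instrset.h`; it should be checked against the vendored version, since older releases may stop at a lower maximum level.

```cpp
#include <string>

// Hypothetical helper: translate the integer level returned by VCL's
// instruction set detection into a human-readable name.
// ASSUMPTION: the table follows the level list documented in instrset.h
// (0 = 80386 ... 10 = AVX512VL/BW/DQ); verify against the vendored copy.
std::string instrset_name(int level) {
  static const char* const names[] = {
      "80386",  "SSE",    "SSE2", "SSE3", "SSSE3",  "SSE4.1",
      "SSE4.2", "AVX",    "AVX2", "AVX512F", "AVX512VL/BW/DQ"};
  constexpr int count = static_cast<int>(sizeof(names) / sizeof(names[0]));

  // Anything outside the known range is reported as unknown rather than
  // guessed at.
  if (level < 0 || level >= count) {
    return "unknown";
  }
  return names[level];
}
```

A caller would pass the detected level straight through, e.g. printing `instrset_name(level)` alongside the raw integer.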

drbenmorgan · 2018-06-25T16:30:32Z

@amadio I couldn't add you as a reviewer, but your feedback would be very welcome here in light of the overlap with VecCore!

Implement dumb program to print message when SIMD preprocessor macros like __SSE__ are defined. Compile the program into several exes, distinguished by different values for the -march or -m flags. Document behaviour and ability to compile "Illegal instruction" code. Briefly outline "dispatch by configuration management" method.
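A minimal sketch of what such a macro-reporting program might look like follows. It is not the code from this commit: `compiled_simd_sets` is a hypothetical name, and the macro list is limited to ones that GCC and Clang predefine when the corresponding `-m`/`-march` option is active. The result describes what the compiler was allowed to emit, not what the host CPU supports, which is why a binary built with a higher `-march` can still die with "Illegal instruction" on an older machine.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: list the SIMD instruction sets the compiler was
// permitted to use for THIS translation unit, judged from predefined
// macros. This reflects build flags only, not the capabilities of the host.
std::vector<std::string> compiled_simd_sets() {
  std::vector<std::string> sets;
#ifdef __SSE__
  sets.push_back("SSE");
#endif
#ifdef __SSE2__
  sets.push_back("SSE2");
#endif
#ifdef __SSE3__
  sets.push_back("SSE3");
#endif
#ifdef __SSSE3__
  sets.push_back("SSSE3");
#endif
#ifdef __SSE4_1__
  sets.push_back("SSE4.1");
#endif
#ifdef __SSE4_2__
  sets.push_back("SSE4.2");
#endif
#ifdef __AVX__
  sets.push_back("AVX");
#endif
#ifdef __AVX2__
  sets.push_back("AVX2");
#endif
#ifdef __AVX512F__
  sets.push_back("AVX512F");
#endif
  return sets;
}
```

Building the same source several times with different flags (for example `-msse4.2` versus `-mavx2`) and printing the returned list from `main` shows how the set of defined macros changes with the flags.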

amadio · 2018-06-26T08:02:23Z

Hi @drbenmorgan, interesting project. However, I don't understand the objective that well. Do you want to query SIMD properties of a machine to add proper build flags in the build system? Or do you want to have some way for testing at runtime what is supported to call the right code? I will go through the code with more time and add specific comments later.

For your reference, I gave a talk for the vectorization working group of the IXPUG a while ago, and you can check out the slides here. The IXPUG has lots of resources for this sort of thing. There is also another project made by a Gentoo dev that does part of what you are doing here. It's meant to detect what SIMD is supported by the CPU, so you can add the proper configuration to Portage. It currently supports Intel and ARM CPUs. I think the way it's implemented there is simpler than what is in VCL.

drbenmorgan · 2018-06-26T08:29:52Z

Hi @amadio,

> Hi @drbenmorgan, interesting project. However, I don't understand the objective that well. Do you want to query SIMD properties of a machine to add proper build flags in the build system? Or do you want to have some way for testing at runtime what is supported to call the right code? I will go through the code with more time and add specific comments later.

It's the latter more than the former. Given that we'd like to distribute binary packages, and these may run on a range of CPU families, what techniques are available to ensure the "compatible and most performant" code is run on a client CPU?

> For your reference, I gave a talk for the vectorization working group of the IXPUG a while ago, and you can check out the slides here. The IXPUG has lots of resources for this sort of thing. There is also another project made by a Gentoo dev that does part of what you are doing here. It's meant to detect what SIMD is supported by the CPU, so you can add the proper configuration to Portage. It currently supports Intel and ARM CPUs. I think the way it's implemented there is simpler than what is in VCL.

Thanks, those are very useful! I think this PR as it stands, though, is more focussed on runtime than build time, and the former could be addressed separately (indeed, part of the project would be to not be smart about selecting flags!).

amadio · 2018-06-26T08:55:46Z

If your intent is to do runtime checks for SIMD features, I think that implementing something like the intrinsic _may_i_use_cpu_feature from ICC in a way that works for all compilers would be the best way to go. Also, we should map the CPU features to the ones used there. I think that in some places you simplified AVX512 support, which has different versions (e.g. KNL, Skylake) with different subsets supported.
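As a rough illustration of a compiler-portable runtime query, the sketch below wraps the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` behind a small enum. It is an assumption-laden sketch, not an implementation of ICC's intrinsic: `CpuFeature` and `host_supports` are hypothetical names, only a handful of features are listed, the AVX-512 subsets mentioned above are not broken out, and on other compilers or non-x86 targets the function simply reports `false`.

```cpp
// Hypothetical feature enum; a real interface would likely mirror the
// feature list used by the ICC intrinsic, including AVX-512 subsets.
enum class CpuFeature { SSE2, SSE4_1, SSE4_2, AVX, AVX2, AVX512F };

// Hypothetical wrapper: ask at runtime whether the host CPU supports a
// feature. Uses GCC/Clang builtins on x86; conservatively returns false
// everywhere else (an assumption made to keep the sketch portable).
bool host_supports(CpuFeature feature) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();  // safe to call more than once
  switch (feature) {
    case CpuFeature::SSE2:    return __builtin_cpu_supports("sse2") != 0;
    case CpuFeature::SSE4_1:  return __builtin_cpu_supports("sse4.1") != 0;
    case CpuFeature::SSE4_2:  return __builtin_cpu_supports("sse4.2") != 0;
    case CpuFeature::AVX:     return __builtin_cpu_supports("avx") != 0;
    case CpuFeature::AVX2:    return __builtin_cpu_supports("avx2") != 0;
    case CpuFeature::AVX512F: return __builtin_cpu_supports("avx512f") != 0;
  }
  return false;
#else
  (void)feature;  // no portable query available in this sketch
  return false;
#endif
}
```

Calling code would then branch on, say, `host_supports(CpuFeature::AVX2)` before entering an AVX2-specific code path.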

As for selecting flags, if you want a multi-arch binary, you have to select them anyway, so a mechanism needs to be in place for it. Vc has a system to compile for multiple architectures, which may be worth a look.
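To make the multi-arch idea concrete, here is a minimal sketch of runtime dispatch inside a single binary. It is not how Vc or this PR does it: instead of compiling separate files with different `-march` flags, it uses the GCC/Clang `target` function attribute to build one function with AVX2 enabled, then picks an implementation once through a function pointer. All names (`sum_generic`, `sum_avx2`, `select_sum`, `sum`, `HAVE_X86_TARGET_DISPATCH`) are hypothetical, and on other compilers or architectures only the generic path is built.

```cpp
#include <cstddef>

// Hypothetical guard: per-function targets and the CPU builtins are only
// assumed to be available with GCC/Clang on x86.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_TARGET_DISPATCH 1
#endif

// Baseline version, compiled with whatever flags the build uses.
float sum_generic(const float* x, std::size_t n) {
  float s = 0.0f;
  for (std::size_t i = 0; i < n; ++i) s += x[i];
  return s;
}

#ifdef HAVE_X86_TARGET_DISPATCH
// Same source, but the compiler may emit AVX2 instructions here. It must
// only be called after checking that the host supports AVX2.
__attribute__((target("avx2")))
float sum_avx2(const float* x, std::size_t n) {
  float s = 0.0f;
  for (std::size_t i = 0; i < n; ++i) s += x[i];
  return s;
}
#endif

using SumFn = float (*)(const float*, std::size_t);

// Choose the most capable implementation the host can actually run.
SumFn select_sum() {
#ifdef HAVE_X86_TARGET_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
  return sum_generic;
}

// Public entry point: the selection runs once, on first call.
float sum(const float* x, std::size_t n) {
  static const SumFn impl = select_sum();
  return impl(x, n);
}
```

The indirect call through `impl` is the kind of overhead that the benchmarks mentioned in the PR description would need to quantify; for small kernels it can outweigh the gain from the wider instruction set.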

drbenmorgan · 2018-06-26T09:12:44Z

@amadio I think I oversold the intent of this PR, so I'll make a few changes to clarify the very limited nature of its aim as a minimal demo (but I agree with your points long term!).

drbenmorgan added 6 commits June 24, 2018 17:40

Implement minimal CPU capabilities query

4bca5c0

Use /proc/cpuinfo (Linux), sysctl (macOS) to print list of available capabilities of host CPU.

Implement initial CLI for ist-detect script

ceb2443

Add method to filter and print just the SIMD capabilities from the full CPU caps listing. Add CLI arguments to make script a friendlier program for querying all or just SIMD capabilities. Implement usage/help arguments/functions.

Update minimal C/C++ SIMD detection program

ca163a2

Make it print out supported SIMD sets in human readable form.

Merge branch 'instruction-set-detection' of github.com:drbenmorgan/pa…

fa94ae4

…ckaging into instruction-set-detection

drbenmorgan added enhancement help wanted labels Jun 25, 2018

drbenmorgan requested a review from graeme-a-stewart June 25, 2018 16:29

drbenmorgan added 2 commits June 25, 2018 19:14

Build VCL as a static library

76c1142
