WIP: Instruction set detection/dispatch #16
Use /proc/cpuinfo (Linux), sysctl (macOS) to print list of available capabilities of host CPU.
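As a rough illustration of this first step, the sketch below reads the capability list from `/proc/cpuinfo` on Linux and from `sysctl` on macOS. The function names are hypothetical and are not taken from the PR. The macOS branch assumes an Intel Mac, where the `machdep.cpu.features` key is available.

```python
import platform
import subprocess


def parse_cpuinfo_flags(text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text.

    x86 kernels label the line 'flags'; ARM kernels use 'Features'.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def host_capabilities():
    """Best-effort capability listing for the host CPU (hypothetical helper)."""
    system = platform.system()
    if system == "Linux":
        with open("/proc/cpuinfo") as handle:
            return parse_cpuinfo_flags(handle.read())
    if system == "Darwin":
        # Assumption: Intel macOS, where this sysctl key lists CPU features.
        result = subprocess.run(
            ["sysctl", "-n", "machdep.cpu.features"],
            capture_output=True,
            text=True,
            check=False,
        )
        return set(result.stdout.split())
    return set()


if __name__ == "__main__":
    print(" ".join(sorted(host_capabilities())))
```

Keeping the parsing separate from the file and process access makes the text handling easy to test on any platform.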
Vendor VectorClass v1.25 code for instruction set detection from upstream library: http://www.agner.org/optimize/#vectorclass Implement minimal use of interface to print an integer representing the highest instruction set provided by host system. Add a basic CMake script to build program, and extend README to document its use.
Vendor VectorClass v1.28 code for instruction set detection from upstream library: http://www.agner.org/optimize/#vectorclass Implement minimal use of interface to print an integer representing the highest instruction set provided by host system. Add a basic CMake script to build program, and extend README to document its use.
Add method to filter and print just the SIMD capabilities from the full CPU caps listing. Add CLI arguments to make script a friendlier program for querying all or just SIMD capabilities. Implement usage/help arguments/functions.
Make it print out supported SIMD sets in human readable form.
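A minimal sketch of what filtering to SIMD capabilities and printing them in readable form could look like. The table and function name are illustrative assumptions, not the PR's code. The keys are flag names as Linux reports them in `/proc/cpuinfo` (for example `pni` for SSE3), and the table is deliberately incomplete.

```python
# Hypothetical mapping from Linux /proc/cpuinfo flag names to readable names,
# listed from oldest to newest instruction set. Not exhaustive.
SIMD_NAMES = {
    "sse": "SSE",
    "sse2": "SSE2",
    "pni": "SSE3",
    "ssse3": "SSSE3",
    "sse4_1": "SSE4.1",
    "sse4_2": "SSE4.2",
    "avx": "AVX",
    "avx2": "AVX2",
    "avx512f": "AVX-512F",
}


def simd_capabilities(flags):
    """Return readable names of the SIMD sets present in `flags`, in table order."""
    return [name for flag, name in SIMD_NAMES.items() if flag in flags]


if __name__ == "__main__":
    example_flags = {"fpu", "sse", "sse2", "pni", "avx"}
    print(", ".join(simd_capabilities(example_flags)))
```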
…ckaging into instruction-set-detection
@amadio I couldn't add you as a reviewer, but your feedback would be very welcome here in light of the overlap with VecCore!
Implement dumb program to print message when SIMD preprocessor macros like __SSE__ are defined. Compile the program into several exes, distinguished by different values for the -march or -m flags. Document behaviour and ability to compile "Illegal instruction" code. Briefly outline "dispatch by configuration management" method.
Hi @drbenmorgan, interesting project. However, I don't understand the objective that well. Do you want to query SIMD properties of a machine to add proper build flags in the build system? Or do you want to have some way for testing at runtime what is supported to call the right code? I will go through the code with more time and add specific comments later.
For your reference, I gave a talk for the vectorization working group of the IXPUG a while ago, and you can check out the slides here. The IXPUG has lots of resources for this sort of thing.
There is also another project made by a Gentoo dev that does part of what you are doing here. It's meant to detect what SIMD is supported by the CPU, so you can add the proper configuration to Portage. It currently supports Intel and ARM CPUs. I think the way it's implemented there is simpler than what is in VCL.
Hi @amadio,
It's the latter more than the former. Given that we'd like to distribute binary packages and these may run on a range of CPU families, what techniques are available to ensure the "compatible and most performant" code is run on a client CPU?
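One way to picture "compatible and most performant" selection at runtime is a lookup over candidate implementations ordered from fastest to most portable. The sketch below only illustrates that idea. The function names and the candidate list are hypothetical, and a real binary package would dispatch to compiled code rather than Python functions.

```python
def select_implementation(flags, candidates):
    """Return the first implementation whose required flags are all present.

    `candidates` is a list of (required_flags, implementation) pairs ordered
    from most to least performant. An empty requirement set acts as the
    portable fallback.
    """
    for required, implementation in candidates:
        if set(required) <= set(flags):
            return implementation
    raise RuntimeError("no compatible implementation for this CPU")


# Hypothetical stand-ins for builds targeting different instruction sets.
def sum_avx2(values):
    return sum(values)


def sum_sse2(values):
    return sum(values)


def sum_generic(values):
    return sum(values)


CANDIDATES = [
    ({"avx2"}, sum_avx2),
    ({"sse2"}, sum_sse2),
    (set(), sum_generic),
]

if __name__ == "__main__":
    chosen = select_implementation({"sse", "sse2"}, CANDIDATES)
    print(chosen.__name__)
```

The ordering of the candidate list carries the performance preference, so the compatibility check itself stays a simple subset test.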
Thanks, those are very useful! I think this PR as it stands though is more focussed on runtime than build time, and the latter could be addressed separately (indeed, part of the project would be to not be smart about selecting flags!).
If your intent is to do runtime checks for SIMD features, I think that implementing something like the intrinsic
As for selecting flags, if you want a multi-arch binary, you have to select them anyway, so a mechanism needs to be in place for it. Vc has a system to compile for multiple architectures, may be worth having a look.
@amadio I think I oversold the intent of this PR, so I'll make a few changes to clarify the very limited nature of its aim as a minimal demo (but I agree with your points long term!)
This is a small WIP on tools/examples on coding/packaging of instruction set specific code (SIMD etc). At present, it simply implements:
I'm requesting an initial review now to solicit comments on the remaining items:
sysctl gives "SSE4.1", Linux /proc/cpuinfo gives "sse4_1"?
Let me know what you think.
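On the naming mismatch between the two sources, one possible approach is to normalise every name to a single spelling before comparing. The sketch below is an assumption about how that could be done, not something the PR implements. It lowercases, replaces dots with underscores, and uses a small hypothetical alias table for names that differ beyond punctuation.

```python
# Hypothetical aliases for names that differ between platforms beyond case
# and punctuation, e.g. Linux reports SSE3 as "pni" and macOS reports AVX
# as "AVX1.0". Not exhaustive.
ALIASES = {
    "pni": "sse3",
    "avx1_0": "avx",
}


def normalize_flag(name):
    """Map a platform-specific capability name to one canonical spelling."""
    key = name.strip().lower().replace(".", "_")
    return ALIASES.get(key, key)


if __name__ == "__main__":
    for raw in ("SSE4.1", "sse4_1", "AVX1.0", "pni"):
        print(raw, "->", normalize_flag(raw))
```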