Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmake: add USE_ARCH_INTRINSICS and USE_COMPILER_INTRINSICS CMake options #1149

Merged
merged 9 commits into from
Jun 20, 2024

Conversation

illwieckz
Copy link
Member

@illwieckz illwieckz commented May 15, 2024

This introduces the USE_ARCH_INTRINSICS CMake option. It is enabled by default.

Disabling it is meant to disable custom asm code and usage of intrinsincs functions for the target platform in the Dæmon code base, it may also be used by games built with the Dæmon common code base.

It is not meant to disable asm or intrinsincs usage in third-party libraries.

It is not meant to prevent the compiler to use such intrinsics in its optimization passes.

It is not meant to disable the compiler flags we set to tell the compiler to try to use such intrinsics in its optimization passes. For this, one should disable USE_CPU_RECOMMENDED_FEATURES instead.

For obvious reason the asm code in the BREAKPOINT() implementation is not meant to be disabled by USE_ARCH_INTRINSICS, the alternate code is to do nothing anyway.

The macro syntax is: DAEMON_ARCH_INTRINSICS_(architecture)[_extension]

Examples:

  • DAEMON_ARCH_INTRINSICS_i686: i686 specific code, including asm code.
  • DAEMON_ARCH_INTRINSICS_i686_sse: i686 SSE specific code.
  • DAEMON_ARCH_INTRINSICS_i686_sse2: i686 SSE2 specific code.

If a platform inherits feature from an parent platform, the parent platform name is used. For example on amd64, the definition enabling SSE code is DAEMON_ARCH_INTRINSICS_i686_sse, enabling SSE code on both i686 with SSE and amd64 platforms, and both DAEMON_ARCH_INTRINSICS_amd64 and DAEMON_ARCH_INTRINSICS_i686 are available.

This also introduces USE_COMPILER_BUILTINS CMake option. It is enabled by default.

Disabling it is meant to test the unknown compiler code.

@illwieckz
Copy link
Member Author

@illwieckz
Copy link
Member Author

This will make possible to test the non-SSE code and things like that in CI or in a local build.

@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch 4 times, most recently from d649d3f to 9622187 Compare May 15, 2024 10:15
Copy link
Member

@slipher slipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about compiler-provided functions that are designed to get intrinsics to be used, but are provided for architectures without them as well? The functions used to implement CountTrailingZerores, __builtin_ctz* and BitScanForward* are like this. Also a lot of other things beginning with __builtin_.

src/common/Platform.h Outdated Show resolved Hide resolved
@illwieckz
Copy link
Member Author

What about compiler-provided functions that are designed to get intrinsics to be used, but are provided for architectures without them as well? The functions used to implement CountTrailingZerores, __builtin_ctz* and BitScanForward* are like this. Also a lot of other things beginning with __builtin_.

I don't really know about them. Is the question about do we want to disable them or not?

The purpose of this PR is to be able to test all our code base, I consider a compiler-provided function not within scope of our test bed, the same way it's not our responsibility to check all alternate code GLM may implement for a single function. So this option is not to configure how compiler-provided functions behave. We should trust those compiler-provided functions.

What I want is to prevent code dying or rotting because we always compile the optimized code we wrote.

@slipher
Copy link
Member

slipher commented May 15, 2024

Disabling such functions would help to test all our code. I provided a platform-independent version of CountTrailingZeroes, but I don't think it's possible to actually use it on any supported compiler. So disabling those functions would help to test it. On the other hand, one might argue that things not used on any supported compiler should be nuked.

@illwieckz
Copy link
Member Author

illwieckz commented May 15, 2024

About compiler-provided functions, the same need will surface if we do something like:

#ifdef __GNUC__
compilerFunction();
#else
genericCode();
#endif

But maybe we would prefer another CMake option and compiler define.

@illwieckz
Copy link
Member Author

illwieckz commented May 15, 2024

So reading the source, yes it looks like we can benefit from some USE_COMPILER_BUILTINS option to disable.

On the other hand, exotic compilers are more rare than exotic platforms, almost everything can be built using a GCC or Clang-based compiler today. Clang is becoming the Chromium of compilers… with Clang pretending to be GCC like Chromium itself pretending to be Mozilla…

@illwieckz
Copy link
Member Author

We can do that:

diff --git a/src/common/Compiler.h b/src/common/Compiler.h
index c3649c5a7..592ee8076 100644
--- a/src/common/Compiler.h
+++ b/src/common/Compiler.h
@@ -40,7 +40,7 @@ int CountTrailingZeroes(unsigned long x);
 int CountTrailingZeroes(unsigned long long x);
 
 // GCC and Clang
-#ifdef __GNUC__
+#if defined(DAEMON_USE_COMPILER_BUILTINS) && defined( __GNUC__ )
 
 // Emit a nice warning when a function is used
 #define DEPRECATED __attribute__((__deprecated__))
@@ -133,7 +133,7 @@ inline int CountTrailingZeroes(unsigned long x) { return __builtin_ctzl(x); }
 inline int CountTrailingZeroes(unsigned long long x) { return __builtin_ctzll(x); }
 
 // Microsoft Visual C++
-#elif defined( _MSC_VER )
+#elif defined(DAEMON_USE_COMPILER_BUILTINS) && defined( _MSC_VER )
 
 // Disable some warnings
 #pragma warning(disable : 4100) // unreferenced formal parameter

@illwieckz
Copy link
Member Author

I can do a commit for that.

@slipher
Copy link
Member

slipher commented May 15, 2024

So this option is not to configure how compiler-provided functions behave. We should trust those compiler-provided functions.

The two cases seem pretty similar to me. Yes, we trust the intrinsic/compiler-provided function. We want to be able to test both the version of our code with the intrinsic/compiler-provided function and the version without the intrinsic/compiler-provided function.

About compiler-provided functions, the same need will surface if we do something like:

#ifdef __GNUC__
compilerFunction();
#else
genericCode();
#endif

But maybe we would prefer another CMake option and compiler define.

The problem with a "generic compiler" option would be that the program is likely to not actually build or work, since some of the compiler-specific things are required. For example, you may get an error about a function not returning a value when NORETURN or UNREACHABLE is not implemented. At runtime things may break if ALIGNED is not implemented.

A "generic compiler" option may end up being a near superset of disabling intrinsics. SSE-family intrinsics are a rare case where there is some sort of standard syntax across compilers. I think most other things will be platform-specific, e.g. inline assembly.

There is not really a bright line between "true" intrinsics, and functions designed to be implemented with intrinsics. An optimizing compiler may remove an intrinsic function and implement it in a different way. For example your compiler replaced the _mm_store_ps followed by scalar addition with an all-SSE implementation. Maybe some compilers would implement _mm_* functions even on a non-SSE platform. I read of a library that implements them using ARM Neon instructions. An intrinsic is just a compiler-provided function that is not implemented on some platforms.

For these reasons I favor including __builtin_ functions, etc., under the umbrella of USE_CPU_INTRINSICS.

@illwieckz
Copy link
Member Author

For these reasons I favor including __builtin_ functions, etc., under the umbrella of USE_CPU_INTRINSICS.

OK that makes sense.

I read of a library that implements them using ARM Neon instructions.

Yes there is at least sse2neon and SIMD Everywhere (that does more than ARM, it can even be used for WebAssembly).

I used sse2neon with an old version of Embree used by LuxMark3 and I used SIMDe to with LuxMark3 to port LuxMark3 on ARM.

I may evaluate SIMDe for Dæmon on ARM64 at some point (we already require NEON).

@illwieckz illwieckz changed the title cmake: add USE_CPU_INTRINSICS CMake option and rewrite the definitions cmake: add USE_INTRINSICS CMake option and rewrite the definitions May 16, 2024
…tomization code

- move code using compiler intrinsics to a dedicated place
- group compiler customization code
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch 2 times, most recently from ddd7eb8 to aaca763 Compare May 16, 2024 11:58
@illwieckz
Copy link
Member Author

Interesting, disabling arch builtins break that unit test:

/home/vsts/work/1/s/src/common/cm/unittest.cpp:168: Failure
Expected equality of these values:
  tr.fraction
    Which is: 0.19213901
  0.1921389

Log here.

@illwieckz
Copy link
Member Author

illwieckz commented May 16, 2024

So, there is now:

  • USE_ARCH_INTRINSICS which is meant to enable asm code, SSE and things like CountTrailingZeroes.
    This make possible to test alternate code for when the architecture is unknown.
  • USE_COMPILER_BUILTINS which is meant to disable specific keywords using compiler-specific builtins like UNREACHABLE and things like that.
    This make possible to test the alternate code for when the compiler is unknown.

I rewrote a bit parts of Compiler.h, for example defining C++ features according to C++ version definitions instead of compiler definitions.

In the end, while not being the purpose of this PR, the compilation of the server and the TTY client with C++11 is working again. The graphical client makes use of C++14 features and there is no plan to change that (it's fine). But all of this means the code is better written and is more robust, being more compatible with older or unknown compilers.

@illwieckz
Copy link
Member Author

illwieckz commented May 16, 2024

I don't know why but the appveyor build fails on some curious errors…

engine/renderer/tr_types.h(142,15): error C2220:
  the following warning is treated as an error
  (compiling source file common\cm\cm_load.cpp) [build\engine-lib.vcxproj]
engine/renderer/tr_types.h(142,15): warning C4324:
  'refBone_t': structure was padded due to alignment specifier
  (compiling source file common\cm\cm_load.cpp) [C:\projects\daemon\build\engine-lib.vcxproj]

See https://ci.appveyor.com/project/DolceTriade/daemon/builds/49827605/job/tgqld13lrg696s61

@illwieckz illwieckz changed the title cmake: add USE_INTRINSICS CMake option and rewrite the definitions cmake: add USE_ARCH_INTRINSICS and USE_COMPILER_BUILTINS CMake options May 16, 2024
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch from aaca763 to 18a7772 Compare May 16, 2024 12:42
@illwieckz
Copy link
Member Author

Well, there was a more meaningful error after that:

common\FileSystem.cpp(309,22):
  error C2694: 'const char *FS::minizip_category_impl::name(void) const':
  overriding virtual function has less restrictive exception specification than base class virtual member function 'const char *std:

MSVC doesn't properly report __cplusplus by default to keep compatibility with broken legacy code:

@illwieckz
Copy link
Member Author

I've fixed the problem related to __cplusplus, but the padding error still remains.

@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch from 074ba03 to b216773 Compare May 16, 2024 13:20
@illwieckz illwieckz changed the title cmake: add USE_ARCH_INTRINSICS and USE_COMPILER_INTRINSINCS CMake options cmake: add USE_ARCH_INTRINSICS and USE_COMPILER_INTRINSICS CMake options May 25, 2024
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch from 02d4d8a to cc4226b Compare May 25, 2024 11:59
@illwieckz
Copy link
Member Author

I removed the HACK including xmmintrin.h in non-SSE MSVC contexts now that the build doesn't fail thanks to:

src/common/Platform.h Outdated Show resolved Hide resolved
src/common/Compiler.h Outdated Show resolved Hide resolved
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch 5 times, most recently from 75290fd to 74aae51 Compare May 25, 2024 19:58
@illwieckz
Copy link
Member Author

So there is now:

  • USE_ARCH_INTRINSICS, for enabling custom code making use of SSE and asm and things like that.
  • USE_COMPILER_INTRINSICS, for enabling custom CountTrailingZeroes and things like that.
  • USE_COMPILER_CUSTOMIZATION, for enabling custom compiler attributes and operators.

They are all enabled by default and marked as advanced options (not displayed by default to the user in ccmake).

One can disable them either to test non-SSE code, to test code not making use of compiler builtins like __builtin_ctz, or the unknown compiler preprocessor code path.

@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch from 74aae51 to f1736e2 Compare May 26, 2024 00:02
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch 3 times, most recently from d80f4ad to 3b7fafb Compare May 28, 2024 09:53
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch 2 times, most recently from ce2b19c to ac8cf02 Compare May 28, 2024 11:11
@illwieckz
Copy link
Member Author

I reworded a bit the way fallback definitions are done: when there is a known implementation the define is set, and at the end the fallback is set if nothing defined it before. It avoids copy-pasting the fallback definitions for every condition with the risk to miss some.

I also reworded this:

int CountTrailingZeroes(unsigned int x);
int CountTrailingZeroes(unsigned long x);
int CountTrailingZeroes(unsigned long long x);
inline int CountTrailingZeroes(unsigned int x) { return __builtin_ctz(x); }
inline int CountTrailingZeroes(unsigned long x) { return __builtin_ctzl(x); }
inline int CountTrailingZeroes(unsigned long long x) { return __builtin_ctzll(x); }

Into this:

inline int CountTrailingZeroes(unsigned int x) { return __builtin_ctz(x); }
inline int CountTrailingZeroes(unsigned long x) { return __builtin_ctzl(x); }
inline int CountTrailingZeroes(unsigned long long x) { return __builtin_ctzll(x); }

I also removed the useless warning on missing BREAKPOINT implementation, we never did anything with it and we silence it for NaCl since forever.

I believe I have nothing to add within the scope of that PR.

Alongside the addition of CMake options, the way things are done are now much more meaningful.

There can be other things done later like evaluating if things are still needed, but I think it's better this way as a first step, and future clean-up commits may even be simpler.

I noticed the server and tty still build with c++11 (and run properly after that) so for now I see no reason to drop the related fallbacks.

This looks ready to me.

@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch from ac8cf02 to 1e530ef Compare May 28, 2024 13:54
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch from 1e530ef to 9837dd8 Compare June 15, 2024 11:27
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch 2 times, most recently from ecd69b5 to f022768 Compare June 20, 2024 04:14
Copy link
Member

@slipher slipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/common/Compiler.h Outdated Show resolved Hide resolved
@illwieckz illwieckz force-pushed the illwieckz/use-intrinsics branch 2 times, most recently from 7952958 to 3cbe309 Compare June 20, 2024 08:52
@illwieckz illwieckz merged commit da2d24b into master Jun 20, 2024
9 checks passed
@illwieckz illwieckz deleted the illwieckz/use-intrinsics branch June 20, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants