Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss ALPAKA_ASSERT_OFFLOAD #2186

Closed
fwyzard opened this issue Oct 31, 2023 · 5 comments · Fixed by #2199
Closed

Discuss ALPAKA_ASSERT_OFFLOAD #2186

fwyzard opened this issue Oct 31, 2023 · 5 comments · Fixed by #2199
Labels
State:Duplicate duplicate issue or pull-request (link main issue!) Type:Enhancement Type:Question Type:Refactoring

Comments

@fwyzard
Copy link
Contributor

fwyzard commented Oct 31, 2023

I find the current implementation of ALPAKA_ASSERT_OFFLOAD very confusing.

It's currently defined as

#if defined(ALPAKA_DEBUG_OFFLOAD_ASSUME_HOST) || defined(SYCL_EXT_ONEAPI_ASSERT)
#    define ALPAKA_ASSERT_OFFLOAD(EXPRESSION) ALPAKA_ASSERT(EXPRESSION)
#elif defined __AMDGCN__ && (!defined NDEBUG)
#    define ALPAKA_ASSERT_OFFLOAD(EXPRESSION)                                         \
        do                                                                            \
        {                                                                             \
            if(!(EXPRESSION))                                                         \
                __builtin_trap();                                                     \
        } while(false)
#else
#    define ALPAKA_ASSERT_OFFLOAD(EXPRESSION)                                         \
        do                                                                            \
        {                                                                             \
        } while(false)
#endif

My understanding is that

  • while building CPU code - including code for CPU-based accelerators - ALPAKA_ASSERT_OFFLOAD() becomes assert(), that is controlled by NDEBUG;
  • while building CUDA code, ALPAKA_ASSERT_OFFLOAD() is always disabled;
  • while building HIP/ROCm code, ALPAKA_ASSERT_OFFLOAD() is controlled by NDEBUG;
  • while building SYCL/oneAPI code, ALPAKA_ASSERT_OFFLOAD() becomes assert(), that is controlled by NDEBUG.

However, if ALPAKA_DEBUG_OFFLOAD_ASSUME_HOST is defined, ALPAKA_ASSERT_OFFLOAD() always becomes assert(), and is controlled by NDEBUG.

I find two aspects problematic:

  • ALPAKA_ASSERT_OFFLOAD() is disabled by default for CUDA targets, and it can only be enabled by setting ALPAKA_DEBUG_OFFLOAD_ASSUME_HOST;
  • ALPAKA_ASSERT_OFFLOAD() cannot be disabled for GPU targets without setting NDEBUG, which disabled asserts also for the standard CPU compilation.

In the CMS software we are using a somewhat different approach:

  • assert() are always enabled for CPU code (because we never define NDEBUG);
  • assert() are disabled by default in CUDA code, and can be enabled by setting the GPU_DEBUG macro.

The rationale is that in host code assert() are cheap enough and the benefit from the error checking outweighs the minor performance hit we may have, while in CUDA code the impact can be significant1 so we enable assert() only while debugging.

As we are porting our code from CUDA to Alpaka, we are looking to implement a similar functionality:

  • assert() are always enabled for CPU code (because we never define NDEBUG);
  • assert() are disabled by default in CUDA, HIP/ROCm and SYCL/oneAPI code, and can be enabled setting an appropriate preprocessor macro.

Is this an approach that would be useful in general for Alpaka ?

If so, we can make a PR to rework ALPAKA_ASSERT_OFFLOAD(); otherwise we can implement it in CMSSW.

Footnotes

  1. this is due in part due to the extra checks and trap() functionality, in part due to the impact on register usage caused by potentially printing messages.

@psychocoderHPC
Copy link
Member

IMO the strange behavior is a BUG and should be fixed, the macro should be guarded by NDEBUG

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 3, 2023

If it is guarded only by NDEBUG, should it replace ALPAKA_ASSERT ?

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 4, 2023

Looks like I'm a repetitive person... this is mostly a duplicate of #2001 .

@fwyzard fwyzard added State:Duplicate duplicate issue or pull-request (link main issue!) Type:Enhancement Type:Question Type:Refactoring labels Nov 4, 2023
@psychocoderHPC
Copy link
Member

If it is guarded only by NDEBUG, should it replace ALPAKA_ASSERT ?

I think ALPAKA_ASSERT_OFFLOAD should be removed and ALPAKA_ASSERT should be used instead. The offload version was a workaround from when we added OpenAcc and OpenMP offload. IMO the OFFLOAD version is not required anymore.

psychocoderHPC added a commit to psychocoderHPC/alpaka that referenced this issue Dec 1, 2023
fix alpaka-group#2186
fix alpaka-group#2001

Provide a device side assert `ALPAKA_ASSERT_ACC` which can be disabled
by defining `ALPAKA_DISABLE_ASSERT_ACC` in the C++ code or by the CMake
option `alpaka_ASSERT_ACC_ENABLE`
@psychocoderHPC
Copy link
Member

I opened #2199 to solve the issue.

psychocoderHPC added a commit to psychocoderHPC/alpaka that referenced this issue Dec 19, 2023
fix alpaka-group#2186
fix alpaka-group#2001

Provide a device side assert `ALPAKA_ASSERT_ACC` which can be disabled
by defining `ALPAKA_DISABLE_ASSERT_ACC` the C++ code or by the CMake
option `alpaka_ASSERT_ACC_ENABLE`.
For CPU devices or host side code the assert behaves
like`ALPAKA_ASSERT`.

Co-authored-by: Andrea Bocci <[email protected]>
psychocoderHPC added a commit to psychocoderHPC/alpaka that referenced this issue Dec 20, 2023
fix alpaka-group#2186
fix alpaka-group#2001

Provide a device side assert `ALPAKA_ASSERT_ACC` which can be disabled
by defining `ALPAKA_DISABLE_ASSERT_ACC` the C++ code or by the CMake
option `alpaka_ASSERT_ACC_ENABLE`.
For CPU devices or host side code the assert behaves
like`ALPAKA_ASSERT`.

Co-authored-by: Andrea Bocci <[email protected]>
bernhardmgruber pushed a commit that referenced this issue Dec 20, 2023
fix #2186
fix #2001

Provide a device side assert `ALPAKA_ASSERT_ACC` which can be disabled
by defining `ALPAKA_DISABLE_ASSERT_ACC` the C++ code or by the CMake
option `alpaka_ASSERT_ACC_ENABLE`.
For CPU devices or host side code the assert behaves
like`ALPAKA_ASSERT`.

Co-authored-by: Andrea Bocci <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
State:Duplicate duplicate issue or pull-request (link main issue!) Type:Enhancement Type:Question Type:Refactoring
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants