Turn TEST_[HALF|BF]_T into function-style macros and fix some tests #3608

Open
wants to merge 2 commits into
base: main

@bernhardmgruber bernhardmgruber commented Jan 30, 2025

This is pulled out from #3384 and paves the way for some future changes. The TEST_HALF_T and TEST_BF_T macros are changed to function-style macros, and some tests are fixed that checked with #ifdef TEST_HALF_T but forgot to include the header that defines the macros.
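
To illustrate why a function-style macro helps here, the following is a minimal sketch, not the actual CCCL test code. The `#define` stands in for the real helper header, and `half_tests_enabled` is a hypothetical name used only for this example. With `#ifdef TEST_HALF_T`, a missing include silently disables the guarded tests; with `#if TEST_HALF_T()`, compilers generally reject the directive when the macro is undefined, so the missing include is noticed at build time.

```cpp
// Hypothetical stand-in for the test helper header that defines the macro.
// Deleting this line turns the `#if` below into a preprocessor error on
// common compilers, whereas an `#ifdef TEST_HALF_T` check would simply
// skip the guarded branch without any diagnostic.
#define TEST_HALF_T() 1

// Hypothetical helper that reports whether the half-precision branch
// was compiled in.
constexpr bool half_tests_enabled()
{
#if TEST_HALF_T()
  return true;
#else
  return false;
#endif
}
```

Defining the macro to `0` instead of leaving it undefined keeps the "disabled" case explicit and distinguishable from a forgotten include.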

Comment on lines 33 to 34
# if defined(_CCCL_HAS_NVFP16) && defined(_LIBCUDACXX_HAS_NVFP16)
# define TEST_HALF_T() 1

_CCCL_HAS_NVFP16 only tells us that the CTK headers are present, but _LIBCUDACXX_HAS_NVFP16 is needed to know that libcu++ also provides traits, limits, etc. on host and device.
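
The two-condition check described in this comment can be sketched as follows. This is an illustration under assumptions: `DEMO_CTK_HAS_FP16` and `DEMO_LIBCUXX_HAS_FP16` are hypothetical stand-ins for `_CCCL_HAS_NVFP16` and `_LIBCUDACXX_HAS_NVFP16` (the real macros come from the library headers and should not be defined by user code), and the `0` fallback branch is assumed, since the quoted diff only shows the `1` case.

```cpp
// Stand-in for "the CTK FP16 headers are present".
#define DEMO_CTK_HAS_FP16 1
// The stand-in for "libcu++ provides traits, limits, etc." is deliberately
// left undefined to model a toolchain where only the CTK headers exist.

#if defined(DEMO_CTK_HAS_FP16) && defined(DEMO_LIBCUXX_HAS_FP16)
#  define DEMO_TEST_HALF_T() 1
#else
#  define DEMO_TEST_HALF_T() 0 // assumed fallback, not shown in the diff
#endif

// Hypothetical helper exposing the macro value to C++ code.
constexpr int demo_half_enabled()
{
  return DEMO_TEST_HALF_T();
}
```

In this configuration the half-precision tests stay disabled even though the CTK headers are available, which is the behavior the comment argues for.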

@bernhardmgruber bernhardmgruber force-pushed the fix_test_fp16_macros branch 2 times, most recently from 269dfbe to 183e65c on January 31, 2025 10:46

🟨 CI finished in 1h 12m: Pass: 96%/89 | Total: 1d 11h | Avg: 24m 05s | Max: 54m 19s | Hits: 411%/10018
  • 🟨 cub: Pass: 93%/44 | Total: 1d 04h | Avg: 39m 02s | Max: 54m 19s | Hits: 538%/2634

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  92%/42  | Total:  1d 02h | Avg: 38m 29s | Max: 52m 09s | Hits: 538%/2634  
      🟩 arm64              Pass: 100%/2   | Total:  1h 41m | Avg: 50m 56s | Max: 54m 19s
    🔍 ctk: 12.0 🔍
      🔍 12.0               Pass:  40%/5   | Total:  3h 25m | Avg: 41m 06s | Max: 47m 22s
      🟩 12.5               Pass: 100%/2   | Total:  1h 38m | Avg: 49m 04s | Max: 49m 16s
      🟩 12.6               Pass: 100%/37  | Total: 23h 34m | Avg: 38m 13s | Max: 54m 19s | Hits: 538%/2634  
    🔍 cudacxx: nvcc12.0 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 43m | Avg: 51m 47s | Max: 52m 09s
      🔍 nvcc12.0           Pass:  40%/5   | Total:  3h 25m | Avg: 41m 06s | Max: 47m 22s
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 38m | Avg: 49m 04s | Max: 49m 16s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 21h 50m | Avg: 37m 27s | Max: 54m 19s | Hits: 538%/2634  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 47s | Max: 52m 09s
      🔍 nvcc               Pass:  92%/42  | Total:  1d 02h | Avg: 38m 26s | Max: 54m 19s | Hits: 538%/2634  
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 44m 25s | Avg: 22m 12s | Max: 24m 03s
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 39m | Avg: 27m 28s | Max: 45m 31s
      🔍 v100               Pass:  91%/34  | Total:  1d 00h | Avg: 42m 45s | Max: 54m 19s | Hits: 538%/2634  
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  91%/37  | Total:  1d 02h | Avg: 42m 17s | Max: 54m 19s | Hits: 538%/2634  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 55s | Avg: 21m 55s | Max: 21m 55s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 11s | Avg: 17m 11s | Max: 17m 11s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 27s | Max: 24m 05s
      🟩 TestGPU            Pass: 100%/2   | Total: 43m 51s | Avg: 21m 55s | Max: 22m 31s
    🟨 cxx
      🟨 Clang14            Pass:  50%/4   | Total:  2h 57m | Avg: 44m 15s | Max: 46m 36s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 24m | Avg: 42m 16s | Max: 42m 54s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 24m | Avg: 42m 22s | Max: 43m 42s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 29m | Avg: 44m 35s | Max: 46m 14s
      🟩 Clang18            Pass: 100%/7   | Total:  4h 47m | Avg: 41m 06s | Max: 54m 19s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 23m | Avg: 41m 45s | Max: 42m 00s
      🟩 GCC8               Pass: 100%/1   | Total: 41m 19s | Avg: 41m 19s | Max: 41m 19s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 31m | Avg: 45m 51s | Max: 47m 22s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 35m | Avg: 47m 38s | Max: 47m 41s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 30m | Avg: 45m 19s | Max: 46m 43s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 09m | Avg: 32m 22s | Max: 43m 34s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 57m | Avg: 29m 42s | Max: 47m 33s
      🟨 MSVC14.29          Pass:  50%/2   | Total:  1h 01m | Avg: 30m 54s | Max: 33m 49s | Hits: 538%/878   
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 05m | Avg: 32m 40s | Max: 33m 14s | Hits: 538%/1756  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 38m | Avg: 49m 04s | Max: 49m 16s
    🟨 cxx_family
      🟨 Clang              Pass:  88%/17  | Total: 12h 03m | Avg: 42m 32s | Max: 54m 19s
      🟩 GCC                Pass: 100%/21  | Total: 12h 49m | Avg: 36m 38s | Max: 47m 41s
      🟨 MSVC               Pass:  75%/4   | Total:  2h 07m | Avg: 31m 47s | Max: 33m 49s | Hits: 538%/2634  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 38m | Avg: 49m 04s | Max: 49m 16s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 44m 25s | Avg: 22m 12s | Max: 24m 03s
      🟩 90a                Pass: 100%/1   | Total: 17m 45s | Avg: 17m 45s | Max: 17m 45s
    🟨 std
      🟨 17                 Pass:  90%/20  | Total: 14h 00m | Avg: 42m 02s | Max: 51m 25s | Hits: 538%/1756  
      🟨 20                 Pass:  95%/24  | Total: 14h 37m | Avg: 36m 33s | Max: 54m 19s | Hits: 538%/878   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 6h 33m | Avg: 9m 22s | Max: 37m 40s | Hits: 365%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 45s | Avg:  8m 22s | Max: 10m 52s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total:  6h 23m | Avg:  9m 35s | Max: 37m 40s | Hits: 365%/7384  
      🟩 arm64              Pass: 100%/2   | Total:  9m 50s | Avg:  4m 55s | Max:  5m 07s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 45m 24s | Avg:  9m 04s | Max: 24m 28s | Hits: 365%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 30m 02s | Avg: 15m 01s | Max: 15m 41s
      🟩 12.6               Pass: 100%/35  | Total:  5h 18m | Avg:  9m 05s | Max: 37m 40s | Hits: 365%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  5m 14s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 45m 24s | Avg:  9m 04s | Max: 24m 28s | Hits: 365%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 30m 02s | Avg: 15m 01s | Max: 15m 41s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  5h 07m | Avg:  9m 19s | Max: 37m 40s | Hits: 365%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  5m 14s
      🟩 nvcc               Pass: 100%/40  | Total:  6h 23m | Avg:  9m 34s | Max: 37m 40s | Hits: 365%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 51s | Avg:  5m 12s | Max:  5m 41s
      🟩 Clang15            Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 24s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  5m 53s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  5m 52s
      🟩 Clang18            Pass: 100%/7   | Total: 43m 50s | Avg:  6m 15s | Max: 10m 12s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 39s
      🟩 GCC8               Pass: 100%/1   | Total: 37m 40s | Avg: 37m 40s | Max: 37m 40s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 44s | Avg:  5m 52s | Max:  5m 53s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 28s | Avg:  5m 44s | Max:  5m 45s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 47s | Avg:  5m 53s | Max:  6m 18s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 29s | Avg:  6m 14s | Max:  6m 41s
      🟩 GCC13              Pass: 100%/8   | Total: 57m 58s | Avg:  7m 14s | Max: 11m 15s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 52m 20s | Avg: 26m 10s | Max: 27m 52s | Hits: 365%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 58m 52s | Avg: 29m 26s | Max: 30m 45s | Hits: 365%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 30m 02s | Avg: 15m 01s | Max: 15m 41s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 38m | Avg:  5m 47s | Max: 10m 12s
      🟩 GCC                Pass: 100%/19  | Total:  2h 33m | Avg:  8m 06s | Max: 37m 40s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 51m | Avg: 27m 48s | Max: 30m 45s | Hits: 365%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 02s | Avg: 15m 01s | Max: 15m 41s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  1h 05m | Avg:  8m 09s | Max: 11m 15s
      🟩 v100               Pass: 100%/34  | Total:  5h 28m | Avg:  9m 39s | Max: 37m 40s | Hits: 365%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 46m | Avg:  9m 21s | Max: 37m 40s | Hits: 365%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 14s | Avg:  7m 37s | Max:  7m 38s
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 19s | Avg: 10m 46s | Max: 11m 15s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 54s | Avg:  4m 54s | Max:  4m 54s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 39m | Avg: 10m 58s | Max: 37m 40s | Hits: 365%/5538  
      🟩 20                 Pass: 100%/20  | Total:  2h 37m | Avg:  7m 52s | Max: 30m 45s | Hits: 365%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 6m 58s | Avg: 3m 29s | Max: 4m 42s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  4m 42s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  4m 42s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  4m 42s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  4m 42s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  4m 42s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  4m 42s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  4m 42s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
      🟩 Test               Pass: 100%/1   | Total:  4m 42s | Avg:  4m 42s | Max:  4m 42s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
+/- Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1


🟩 CI finished in 1h 11m: Pass: 100%/89 | Total: 15h 24m | Avg: 10m 23s | Max: 37m 25s | Hits: 421%/10896
  • 🟩 cub: Pass: 100%/44 | Total: 8h 44m | Avg: 11m 55s | Max: 37m 25s | Hits: 539%/3512

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  8h 34m | Avg: 12m 15s | Max: 37m 25s | Hits: 539%/3512  
      🟩 arm64              Pass: 100%/2   | Total:  9m 55s | Avg:  4m 57s | Max:  5m 01s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 51m | Avg: 22m 23s | Max: 37m 25s | Hits: 538%/878   
      🟩 12.5               Pass: 100%/2   | Total: 18m 50s | Avg:  9m 25s | Max:  9m 41s
      🟩 12.6               Pass: 100%/37  | Total:  6h 34m | Avg: 10m 38s | Max: 33m 16s | Hits: 539%/2634  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  4m 31s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 51m | Avg: 22m 23s | Max: 37m 25s | Hits: 538%/878   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 50s | Avg:  9m 25s | Max:  9m 41s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  6h 25m | Avg: 11m 00s | Max: 33m 16s | Hits: 539%/2634  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  4m 31s
      🟩 nvcc               Pass: 100%/42  | Total:  8h 35m | Avg: 12m 17s | Max: 37m 25s | Hits: 539%/3512  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 24m | Avg: 21m 05s | Max: 37m 25s
      🟩 Clang15            Pass: 100%/2   | Total: 10m 58s | Avg:  5m 29s | Max:  5m 36s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 39s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 31s | Avg:  5m 45s | Max:  5m 57s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 08m | Avg:  9m 48s | Max: 23m 20s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 38s | Avg:  5m 49s | Max:  5m 51s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  5m 44s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 38s | Avg:  5m 49s | Max:  5m 54s
      🟩 GCC11              Pass: 100%/2   | Total: 12m 03s | Avg:  6m 01s | Max:  6m 02s
      🟩 GCC12              Pass: 100%/4   | Total: 40m 49s | Avg: 10m 12s | Max: 24m 55s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 45m | Avg: 13m 13s | Max: 25m 00s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 56m 55s | Avg: 28m 27s | Max: 29m 44s | Hits: 538%/1756  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 03m | Avg: 31m 38s | Max: 33m 16s | Hits: 539%/1756  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 50s | Avg:  9m 25s | Max:  9m 41s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  3h 06m | Avg: 10m 58s | Max: 37m 25s
      🟩 GCC                Pass: 100%/21  | Total:  3h 19m | Avg:  9m 29s | Max: 25m 00s
      🟩 MSVC               Pass: 100%/4   | Total:  2h 00m | Avg: 30m 02s | Max: 33m 16s | Hits: 539%/3512  
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 50s | Avg:  9m 25s | Max:  9m 41s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 29m 23s | Avg: 14m 41s | Max: 24m 55s
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 19m | Avg: 17m 26s | Max: 25m 00s
      🟩 v100               Pass: 100%/34  | Total:  5h 55m | Avg: 10m 28s | Max: 37m 25s | Hits: 539%/3512  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  6h 12m | Avg: 10m 03s | Max: 37m 25s | Hits: 539%/3512  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 54s | Avg: 19m 54s | Max: 19m 54s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 40s | Avg: 16m 40s | Max: 16m 40s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 13m | Avg: 24m 25s | Max: 25m 00s
      🟩 TestGPU            Pass: 100%/2   | Total: 42m 49s | Avg: 21m 24s | Max: 22m 34s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 29m 23s | Avg: 14m 41s | Max: 24m 55s
      🟩 90a                Pass: 100%/1   | Total:  4m 26s | Avg:  4m 26s | Max:  4m 26s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 37m | Avg: 10m 52s | Max: 35m 51s | Hits: 538%/2634  
      🟩 20                 Pass: 100%/24  | Total:  5h 07m | Avg: 12m 48s | Max: 37m 25s | Hits: 539%/878   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 5h 59m | Avg: 8m 34s | Max: 30m 14s | Hits: 365%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 17s | Avg:  8m 08s | Max: 10m 33s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total:  5h 50m | Avg:  8m 45s | Max: 30m 14s | Hits: 365%/7384  
      🟩 arm64              Pass: 100%/2   | Total:  9m 48s | Avg:  4m 54s | Max:  5m 09s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 46m 15s | Avg:  9m 15s | Max: 24m 58s | Hits: 365%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 29m 05s | Avg: 14m 32s | Max: 14m 37s
      🟩 12.6               Pass: 100%/35  | Total:  4h 44m | Avg:  8m 07s | Max: 30m 14s | Hits: 365%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 34s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 46m 15s | Avg:  9m 15s | Max: 24m 58s | Hits: 365%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 05s | Avg: 14m 32s | Max: 14m 37s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  4h 33m | Avg:  8m 17s | Max: 30m 14s | Hits: 365%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 34s
      🟩 nvcc               Pass: 100%/40  | Total:  5h 48m | Avg:  8m 43s | Max: 30m 14s | Hits: 365%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 36s | Avg:  5m 24s | Max:  5m 47s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 15s | Avg:  5m 37s | Max:  5m 38s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  5m 43s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  5m 57s
      🟩 Clang18            Pass: 100%/7   | Total: 44m 23s | Avg:  6m 20s | Max: 10m 10s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 32s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  5m 50s
      🟩 GCC10              Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 34s
      🟩 GCC11              Pass: 100%/2   | Total: 10m 59s | Avg:  5m 29s | Max:  5m 35s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 24s | Avg:  6m 12s | Max:  6m 29s
      🟩 GCC13              Pass: 100%/8   | Total: 57m 55s | Avg:  7m 14s | Max: 10m 59s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 51m 45s | Avg: 25m 52s | Max: 26m 47s | Hits: 365%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 58m 31s | Avg: 29m 15s | Max: 30m 14s | Hits: 365%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 05s | Avg: 14m 32s | Max: 14m 37s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 40m | Avg:  5m 53s | Max: 10m 10s
      🟩 GCC                Pass: 100%/19  | Total:  2h 00m | Avg:  6m 20s | Max: 10m 59s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 50m | Avg: 27m 34s | Max: 30m 14s | Hits: 365%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total: 29m 05s | Avg: 14m 32s | Max: 14m 37s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  1h 05m | Avg:  8m 08s | Max: 10m 59s
      🟩 v100               Pass: 100%/34  | Total:  4h 54m | Avg:  8m 40s | Max: 30m 14s | Hits: 365%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 12m | Avg:  8m 26s | Max: 30m 14s | Hits: 365%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 42s | Avg:  7m 51s | Max:  7m 53s
      🟩 TestGPU            Pass: 100%/3   | Total: 31m 42s | Avg: 10m 34s | Max: 10m 59s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 44s | Avg:  4m 44s | Max:  4m 44s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 04m | Avg:  9m 14s | Max: 28m 17s | Hits: 365%/5538  
      🟩 20                 Pass: 100%/20  | Total:  2h 38m | Avg:  7m 56s | Max: 30m 14s | Hits: 365%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 6m 57s | Avg: 3m 28s | Max: 4m 51s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  4m 51s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  4m 51s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  4m 51s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  4m 51s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  4m 51s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  4m 51s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  4m 51s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 06s | Avg:  2m 06s | Max:  2m 06s
      🟩 Test               Pass: 100%/1   | Total:  4m 51s | Avg:  4m 51s | Max:  4m 51s
    
  • 🟩 python: Pass: 100%/1 | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
+/- Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

@bernhardmgruber bernhardmgruber marked this pull request as draft January 31, 2025 16:31

copy-pr-bot bot commented Jan 31, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

bernhardmgruber commented Jan 31, 2025

Temporarily switching back to testing FP16 types when CTK headers are available, since #3535 may change the game.

@bernhardmgruber

/ok to test

@bernhardmgruber bernhardmgruber changed the title from "Only test half/bfloat16 with C2H when libcu++ supports them" to "Turn TEST_[HALF|BF]_T into function-style macros and fix some tests" on Jan 31, 2025
@bernhardmgruber bernhardmgruber marked this pull request as ready for review January 31, 2025 18:37

🟩 CI finished in 1h 12m: Pass: 100%/89 | Total: 1d 00h | Avg: 16m 16s | Max: 1h 04m | Hits: 402%/10896
  • 🟩 cub: Pass: 100%/44 | Total: 17h 29m | Avg: 23m 51s | Max: 1h 04m | Hits: 478%/3512

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total: 16h 53m | Avg: 24m 07s | Max:  1h 04m | Hits: 478%/3512  
      🟩 arm64              Pass: 100%/2   | Total: 36m 12s | Avg: 18m 06s | Max: 18m 13s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 46m | Avg: 33m 17s | Max:  1h 00m | Hits: 477%/878   
      🟩 12.5               Pass: 100%/2   | Total:  1h 31m | Avg: 45m 46s | Max: 45m 58s
      🟩 12.6               Pass: 100%/37  | Total: 13h 11m | Avg: 21m 24s | Max:  1h 04m | Hits: 478%/2634  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 23m 20s | Avg: 11m 40s | Max: 12m 04s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 46m | Avg: 33m 17s | Max:  1h 00m | Hits: 477%/878   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 31m | Avg: 45m 46s | Max: 45m 58s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 12h 48m | Avg: 21m 57s | Max:  1h 04m | Hits: 478%/2634  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 23m 20s | Avg: 11m 40s | Max: 12m 04s
      🟩 nvcc               Pass: 100%/42  | Total: 17h 06m | Avg: 24m 26s | Max:  1h 04m | Hits: 478%/3512  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 09m | Avg: 17m 23s | Max: 18m 05s
      🟩 Clang15            Pass: 100%/2   | Total: 35m 42s | Avg: 17m 51s | Max: 18m 00s
      🟩 Clang16            Pass: 100%/2   | Total: 32m 23s | Avg: 16m 11s | Max: 16m 23s
      🟩 Clang17            Pass: 100%/2   | Total: 33m 45s | Avg: 16m 52s | Max: 17m 54s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 57m | Avg: 16m 44s | Max: 23m 52s
      🟩 GCC7               Pass: 100%/2   | Total: 34m 11s | Avg: 17m 05s | Max: 17m 42s
      🟩 GCC8               Pass: 100%/1   | Total: 15m 52s | Avg: 15m 52s | Max: 15m 52s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 19m | Avg: 39m 40s | Max:  1h 00m
      🟩 GCC10              Pass: 100%/2   | Total: 35m 35s | Avg: 17m 47s | Max: 18m 52s
      🟩 GCC11              Pass: 100%/2   | Total: 36m 24s | Avg: 18m 12s | Max: 18m 34s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 17m | Avg: 19m 22s | Max: 24m 17s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 29m | Avg: 18m 42s | Max: 22m 02s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 00s | Max:  1h 04m | Hits: 478%/1756  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits: 478%/1756  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 31m | Avg: 45m 46s | Max: 45m 58s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  4h 48m | Avg: 16m 58s | Max: 23m 52s
      🟩 GCC                Pass: 100%/21  | Total:  7h 08m | Avg: 20m 24s | Max:  1h 00m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 01m | Avg:  1h 00m | Max:  1h 04m | Hits: 478%/3512  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 31m | Avg: 45m 46s | Max: 45m 58s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 42m 19s | Avg: 21m 09s | Max: 24m 17s
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 37m | Avg: 19m 41s | Max: 23m 52s
      🟩 v100               Pass: 100%/34  | Total: 14h 09m | Avg: 24m 59s | Max:  1h 04m | Hits: 478%/3512  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 15h 03m | Avg: 24m 24s | Max:  1h 04m | Hits: 478%/3512  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 48s | Avg: 19m 48s | Max: 19m 48s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 04s | Avg: 15m 04s | Max: 15m 04s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 20s | Max: 24m 17s
      🟩 TestGPU            Pass: 100%/2   | Total: 41m 33s | Avg: 20m 46s | Max: 22m 02s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 42m 19s | Avg: 21m 09s | Max: 24m 17s
      🟩 90a                Pass: 100%/1   | Total: 16m 43s | Avg: 16m 43s | Max: 16m 43s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  9h 01m | Avg: 27m 04s | Max:  1h 04m | Hits: 478%/2634  
      🟩 20                 Pass: 100%/24  | Total:  8h 28m | Avg: 21m 10s | Max:  1h 02m | Hits: 478%/878   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 6h 06m | Avg: 8m 42s | Max: 33m 59s | Hits: 365%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 15s | Avg:  8m 37s | Max: 10m 49s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total:  5h 55m | Avg:  8m 53s | Max: 33m 59s | Hits: 365%/7384  
      🟩 arm64              Pass: 100%/2   | Total: 10m 09s | Avg:  5m 04s | Max:  5m 18s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 44m 25s | Avg:  8m 53s | Max: 23m 39s | Hits: 365%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 30m 12s | Avg: 15m 06s | Max: 15m 21s
      🟩 12.6               Pass: 100%/35  | Total:  4h 51m | Avg:  8m 19s | Max: 33m 59s | Hits: 365%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 32s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 25s | Avg:  8m 53s | Max: 23m 39s | Hits: 365%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 30m 12s | Avg: 15m 06s | Max: 15m 21s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  4h 40m | Avg:  8m 29s | Max: 33m 59s | Hits: 365%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 32s
      🟩 nvcc               Pass: 100%/40  | Total:  5h 54m | Avg:  8m 52s | Max: 33m 59s | Hits: 365%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 16s | Avg:  5m 19s | Max:  5m 52s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 36s | Avg:  5m 48s | Max:  6m 03s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  5m 52s
      🟩 Clang17            Pass: 100%/2   | Total: 10m 59s | Avg:  5m 29s | Max:  5m 42s
      🟩 Clang18            Pass: 100%/7   | Total: 44m 34s | Avg:  6m 22s | Max: 10m 17s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  5m 54s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 57s | Avg:  5m 57s | Max:  5m 57s
      🟩 GCC9               Pass: 100%/2   | Total: 10m 53s | Avg:  5m 26s | Max:  5m 32s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 37s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 07s | Avg:  5m 33s | Max:  5m 34s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 45s | Avg:  6m 22s | Max:  6m 31s
      🟩 GCC13              Pass: 100%/8   | Total: 58m 57s | Avg:  7m 22s | Max: 11m 05s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 51m 33s | Avg: 25m 46s | Max: 27m 54s | Hits: 365%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 02m | Avg: 31m 09s | Max: 33m 59s | Hits: 365%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 30m 12s | Avg: 15m 06s | Max: 15m 21s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 39m | Avg:  5m 52s | Max: 10m 17s
      🟩 GCC                Pass: 100%/19  | Total:  2h 02m | Avg:  6m 25s | Max: 11m 05s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 53m | Avg: 28m 27s | Max: 33m 59s | Hits: 365%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 12s | Avg: 15m 06s | Max: 15m 21s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  1h 06m | Avg:  8m 15s | Max: 11m 05s
      🟩 v100               Pass: 100%/34  | Total:  4h 59m | Avg:  8m 49s | Max: 33m 59s | Hits: 365%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 18m | Avg:  8m 35s | Max: 33m 59s | Hits: 365%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 43s | Avg:  7m 51s | Max:  8m 30s
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 11s | Avg: 10m 43s | Max: 11m 05s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 52s | Avg:  4m 52s | Max:  4m 52s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 05m | Avg:  9m 17s | Max: 28m 19s | Hits: 365%/5538  
      🟩 20                 Pass: 100%/20  | Total:  2h 42m | Avg:  8m 08s | Max: 33m 59s | Hits: 365%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 6m 43s | Avg: 3m 21s | Max: 4m 42s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  4m 42s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  4m 42s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  4m 42s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  4m 42s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  4m 42s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  4m 42s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  4m 42s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 01s | Avg:  2m 01s | Max:  2m 01s
      🟩 Test               Pass: 100%/1   | Total:  4m 42s | Avg:  4m 42s | Max:  4m 42s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 47s | Avg: 25m 47s | Max: 25m 47s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
+/- Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

Projects
Status: In Review

3 participants