Add Merge Sort implementation for c.parallel #3636

Merged
merged 38 commits into from
Feb 7, 2025

Conversation

Contributor

@NaderAlAwar NaderAlAwar commented Jan 31, 2025

Description

Closes #2547

This PR adds the c.parallel merge_sort API. It also includes some changes to merge_sort.cuh and other included headers to enable NVRTC compilation.

Notes to reviewers

  1. Some of the tuning policy parameters in merge_sort.cu are hardcoded to avoid an assertion failure; the assertion checks for the presence of certain memory ops (presumably for performance reasons). As a result, performance will be suboptimal for now. I think this can be addressed once [FEA]: Redesign default tuning #3570 is resolved.
  2. I am still running into some issues with output iterators. These are explained in more detail in [BUG]: output iterators do not currently work with c.parallel merge_sort #3722.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@NaderAlAwar NaderAlAwar requested review from a team as code owners January 31, 2025 18:54
@NaderAlAwar NaderAlAwar marked this pull request as draft January 31, 2025 18:54

copy-pr-bot bot commented Jan 31, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cub/cub/detail/launcher/cuda_driver.cuh (review thread: outdated, resolved)
thrust/thrust/iterator/iterator_traits.h (review thread: outdated, resolved)
Contributor

github-actions bot commented Feb 6, 2025

🟩 CI finished in 1h 34m: Pass: 100%/90 | Total: 2d 16h | Avg: 42m 51s | Max: 1h 19m | Hits: 73%/132233
  • 🟩 cub: Pass: 100%/44 | Total: 1d 15h | Avg: 54m 14s | Max: 1h 19m | Hits: 68%/52320

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 13h | Avg: 54m 00s | Max:  1h 19m | Hits:  68%/49888 
      🟩 arm64              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 07s | Max: 59m 18s | Hits:  67%/2432  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 00m | Avg:  1h 00m | Max:  1h 02m | Hits:  57%/5914  
      🟩 12.5               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  66%/2250  
      🟩 12.8               Pass: 100%/37  | Total:  1d 08h | Avg: 52m 43s | Max:  1h 19m | Hits:  70%/44156 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 06m | Hits:  73%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 00m | Avg:  1h 00m | Max:  1h 02m | Hits:  57%/5914  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  66%/2250  
      🟩 nvcc12.8           Pass: 100%/35  | Total:  1d 06h | Avg: 52m 08s | Max:  1h 19m | Hits:  69%/42052 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 06m | Hits:  73%/2104  
      🟩 nvcc               Pass: 100%/42  | Total:  1d 13h | Avg: 53m 49s | Max:  1h 19m | Hits:  68%/50216 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 53m | Avg: 58m 25s | Max:  1h 02m | Hits:  67%/4872  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 29s | Max:  1h 01m | Hits:  67%/2432  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 52s | Max: 58m 50s | Hits:  67%/2432  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  67%/2432  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 40m | Avg: 48m 36s | Max:  1h 06m | Hits:  78%/8184  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 41s | Max:  1h 00m | Hits:  67%/2436  
      🟩 GCC8               Pass: 100%/1   | Total: 56m 25s | Avg: 56m 25s | Max: 56m 25s | Hits:  67%/1218  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 51m | Avg: 55m 52s | Max: 56m 32s | Hits:  67%/2436  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  67%/2436  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  67%/2432  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 49s | Max:  1h 01m | Hits:  67%/2432  
      🟩 GCC13              Pass: 100%/10  | Total:  6h 19m | Avg: 37m 58s | Max:  1h 04m | Hits:  83%/12160 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 13m | Hits:  14%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 38m | Avg:  1h 19m | Max:  1h 19m | Hits:  14%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  66%/2250  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 29m | Avg: 54m 40s | Max:  1h 06m | Hits:  71%/20352 
      🟩 GCC                Pass: 100%/21  | Total: 17h 08m | Avg: 48m 58s | Max:  1h 04m | Hits:  74%/25550 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 53m | Avg:  1h 13m | Max:  1h 19m | Hits:  14%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  66%/2250  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 53m 31s | Avg: 26m 45s | Max: 27m 29s | Hits:  83%/2432  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 10h | Avg:  1h 01m | Max:  1h 19m | Hits:  62%/40160 
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 59m | Avg: 29m 57s | Max: 56m 48s | Hits:  91%/9728  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 13h | Avg:  1h 00m | Max:  1h 19m | Hits:  62%/43808 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 20s | Avg: 21m 20s | Max: 21m 20s | Hits:  99%/1216  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 22s | Avg: 16m 22s | Max: 16m 22s | Hits:  99%/1216  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 13m | Avg: 24m 33s | Max: 26m 02s | Hits:  99%/3648  
      🟩 TestGPU            Pass: 100%/2   | Total: 41m 16s | Avg: 20m 38s | Max: 22m 06s | Hits:  99%/2432  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 53m 31s | Avg: 26m 45s | Max: 27m 29s | Hits:  83%/2432  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 04m | Avg:  1h 04m | Max:  1h 04m | Hits:  67%/1216  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 20h 29m | Avg:  1h 01m | Max:  1h 18m | Hits:  60%/23559 
      🟩 20                 Pass: 100%/24  | Total: 19h 16m | Avg: 48m 12s | Max:  1h 19m | Hits:  75%/28761 
    
  • 🟩 thrust: Pass: 100%/43 | Total: 23h 45m | Avg: 33m 09s | Max: 1h 00m | Hits: 77%/79625

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 38m 36s | Avg: 19m 18s | Max: 27m 10s | Hits:  88%/3706  
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 22h 47m | Avg: 33m 21s | Max:  1h 00m | Hits:  77%/75920 
      🟩 arm64              Pass: 100%/2   | Total: 58m 09s | Avg: 29m 04s | Max: 30m 53s | Hits:  78%/3705  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 10m | Avg: 38m 01s | Max:  1h 00m | Hits:  73%/9256  
      🟩 12.5               Pass: 100%/2   | Total:  1h 56m | Avg: 58m 02s | Max: 58m 47s | Hits:  63%/3704  
      🟩 12.8               Pass: 100%/36  | Total: 18h 39m | Avg: 31m 05s | Max: 59m 16s | Hits:  78%/66665 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 55m 11s | Avg: 27m 35s | Max: 28m 03s | Hits:  78%/3704  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 10m | Avg: 38m 01s | Max:  1h 00m | Hits:  73%/9256  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 56m | Avg: 58m 02s | Max: 58m 47s | Hits:  63%/3704  
      🟩 nvcc12.8           Pass: 100%/34  | Total: 17h 44m | Avg: 31m 18s | Max: 59m 16s | Hits:  78%/62961 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 55m 11s | Avg: 27m 35s | Max: 28m 03s | Hits:  78%/3704  
      🟩 nvcc               Pass: 100%/41  | Total: 22h 50m | Avg: 33m 25s | Max:  1h 00m | Hits:  77%/75921 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 08m | Avg: 32m 08s | Max: 33m 05s | Hits:  78%/7408  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 29s | Max: 31m 59s | Hits:  78%/3704  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 25s | Max: 33m 41s | Hits:  78%/3704  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 33s | Max: 33m 15s | Hits:  78%/3704  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 42m | Avg: 23m 14s | Max: 31m 28s | Hits:  84%/12964 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 44s | Max: 33m 19s | Hits:  78%/3706  
      🟩 GCC8               Pass: 100%/1   | Total: 33m 25s | Avg: 33m 25s | Max: 33m 25s | Hits:  78%/1853  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 19s | Max: 33m 44s | Hits:  78%/3706  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 07s | Hits:  78%/3706  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 46s | Max: 35m 15s | Hits:  78%/3706  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 09m | Avg: 34m 57s | Max: 35m 10s | Hits:  78%/3706  
      🟩 GCC13              Pass: 100%/8   | Total:  3h 12m | Avg: 24m 06s | Max: 37m 21s | Hits:  86%/14824 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 48s | Max:  1h 00m | Hits:  53%/3692  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 25m | Avg: 48m 39s | Max: 59m 12s | Hits:  58%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 56m | Avg: 58m 02s | Max: 58m 47s | Hits:  63%/3704  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 06m | Avg: 28m 35s | Max: 33m 41s | Hits:  80%/31484 
      🟩 GCC                Pass: 100%/19  | Total:  9h 17m | Avg: 29m 21s | Max: 37m 21s | Hits:  81%/35207 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 25m | Avg: 53m 06s | Max:  1h 00m | Hits:  56%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 56m | Avg: 58m 02s | Max: 58m 47s | Hits:  63%/3704  
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total: 19h 54m | Avg: 36m 11s | Max:  1h 00m | Hits:  75%/61112 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 51m | Avg: 23m 09s | Max: 59m 12s | Hits:  85%/18513 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 22h 25m | Avg: 36m 22s | Max:  1h 00m | Hits:  74%/68516 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 11s | Avg: 15m 23s | Max: 30m 29s | Hits:  89%/5551  
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 41s | Avg: 11m 13s | Max: 11m 43s | Hits:  99%/5558  
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total: 37m 21s | Avg: 37m 21s | Max: 37m 21s | Hits:  78%/1853  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 31m | Avg: 37m 34s | Max:  1h 00m | Hits:  73%/37031 
      🟩 20                 Pass: 100%/21  | Total: 10h 35m | Avg: 30m 16s | Max: 59m 12s | Hits:  80%/38888 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 32s | Avg: 5m 16s | Max: 8m 03s | Hits: 94%/288

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 03s | Hits:  94%/288   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 03s | Hits:  94%/288   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 03s | Hits:  94%/288   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 03s | Hits:  94%/288   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 03s | Hits:  94%/288   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 03s | Hits:  94%/288   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 03s | Hits:  94%/288   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 29s | Avg:  2m 29s | Max:  2m 29s | Hits:  90%/144   
      🟩 Test               Pass: 100%/1   | Total:  8m 03s | Avg:  8m 03s | Max:  8m 03s | Hits:  98%/144   
    
  • 🟩 python: Pass: 100%/1 | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

Contributor

@shwina shwina left a comment

Looks great to me! Approving with just a couple of nits

@@ -119,6 +120,17 @@ std::vector<T> generate(std::size_t num_items)
 return vec;
 }

+template <class T>
+std::vector<T> make_shuffled_key_ranks_vector(std::size_t num_items)

Nit: Given that we're defining this in the general-purpose test_util.h, we could consider a more general name

Suggested change
-std::vector<T> make_shuffled_key_ranks_vector(std::size_t num_items)
+std::vector<T> make_shuffled_sequence(std::size_t num_items)
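
For illustration, a general-purpose helper under the suggested name could look roughly like the sketch below. This is not the PR's implementation: the body, the `seed` parameter, and the choice of `std::mt19937` are assumptions made here so that the example is reproducible.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch: returns the values 0, 1, ..., num_items - 1 in a
// pseudo-random order. The `seed` parameter is an assumption added for
// reproducibility; it does not appear in the PR.
template <class T>
std::vector<T> make_shuffled_sequence(std::size_t num_items, unsigned seed = 42)
{
  std::vector<T> sequence(num_items);
  // Fill with consecutive values starting at zero.
  std::iota(sequence.begin(), sequence.end(), T{0});
  // Shuffle with a fixed-seed engine so repeated runs see the same order.
  std::mt19937 rng(seed);
  std::shuffle(sequence.begin(), sequence.end(), rng);
  return sequence;
}
```

Because every value appears exactly once, sorting the result must yield the plain sequence again, which makes the helper convenient for checking a sort.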

@@ -14,4 +14,5 @@

#include <cccl/c/types.h>

-std::string make_kernel_user_binary_operator(std::string_view input_value_t, cccl_op_t operation);
+std::string
+make_kernel_user_binary_operator(std::string_view input_value_t, cccl_op_t operation, bool comparison_op = false);

Nit: perhaps we should just have a distinct make_kernel_user_comparison_operator helper.
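
A rough sketch of that split, under stated assumptions: `user_op` is a hypothetical stand-in for `cccl_op_t`, `detail::make_user_operator` is an invented shared helper, and the emitted declaration text is illustrative only, not what the library actually generates. The idea is simply that two named entry points replace the `bool` flag.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for cccl_op_t, reduced to the one field used here.
struct user_op
{
  std::string name; // symbol name of the user-provided device function
};

namespace detail
{
// Invented shared helper: the caller chooses the return type, so neither
// public function needs a boolean flag. The emitted text is illustrative.
inline std::string
make_user_operator(std::string_view return_t, std::string_view input_t, const user_op& op)
{
  std::string src;
  src.append("extern \"C\" __device__ ").append(return_t).append(" ").append(op.name);
  src.append("(").append(input_t).append(" lhs, ").append(input_t).append(" rhs);\n");
  return src;
}
} // namespace detail

// Binary operator: the result has the same type as the inputs.
inline std::string make_kernel_user_binary_operator(std::string_view input_t, const user_op& op)
{
  return detail::make_user_operator(input_t, input_t, op);
}

// Comparison operator: the result is always bool.
inline std::string make_kernel_user_comparison_operator(std::string_view input_t, const user_op& op)
{
  return detail::make_user_operator("bool", input_t, op);
}
```

With this shape, call sites read as `make_kernel_user_comparison_operator("int", op)` rather than passing a trailing `true`.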

@@ -161,7 +170,7 @@ struct __align__({5}) output_iterator_state_t{{
 struct output_iterator_t {{
 using iterator_category = cuda::std::random_access_iterator_tag;
 using difference_type = {0};
-using value_type = void;
+using value_type = VALUE_T;

Is this change required independent of #3722?

If not, I'd recommend we make all the output_iterator changes in that PR.
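
As background for the question, and not as a statement of why this PR made the change: a generic algorithm that reads `value_type` through iterator traits cannot declare a temporary of that type when it is `void`. The sketch below uses standard C++ only; both iterator structs and the `has_usable_value_type` trait are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical iterator whose value_type is void, as is conventional for
// pure output iterators.
struct void_valued_output_iterator
{
  using iterator_category = std::output_iterator_tag;
  using difference_type   = std::ptrdiff_t;
  using value_type        = void;
  using pointer           = void;
  using reference         = void;
};

// Hypothetical iterator that advertises a concrete value_type.
struct typed_output_iterator
{
  using iterator_category = std::random_access_iterator_tag;
  using difference_type   = std::ptrdiff_t;
  using value_type        = int;
  using pointer           = int*;
  using reference         = int&;
};

// True when an algorithm could declare a temporary of the iterator's
// value_type, i.e. when that type is not void.
template <class It>
constexpr bool has_usable_value_type =
  !std::is_void_v<typename std::iterator_traits<It>::value_type>;

static_assert(!has_usable_value_type<void_valued_output_iterator>, "void value_type is not usable");
static_assert(has_usable_value_type<typed_output_iterator>, "int value_type is usable");
```

Whether merge_sort actually depends on the output iterator's `value_type` in this way is the open question raised in the comment above.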

@NaderAlAwar NaderAlAwar force-pushed the merge-sort-c-parallel branch from 1d4dc30 to 5124fe2 Compare February 7, 2025 14:17
Contributor

github-actions bot commented Feb 7, 2025

🟨 CI finished in 1h 02m: Pass: 98%/90 | Total: 15h 54m | Avg: 10m 36s | Max: 56m 09s | Hits: 94%/132089
  • 🟨 cccl_c_parallel: Pass: 50%/2 | Total: 5m 16s | Avg: 2m 38s | Max: 3m 04s | Hits: 98%/144

    🚨 jobs: Test 🚨
      🟩 Build              Pass: 100%/1   | Total:  2m 12s | Avg:  2m 12s | Max:  2m 12s | Hits:  98%/144   
      🔥 Test               Pass:   0%/1   | Total:  3m 04s | Avg:  3m 04s | Max:  3m 04s
    🟨 cpu
      🟨 amd64              Pass:  50%/2   | Total:  5m 16s | Avg:  2m 38s | Max:  3m 04s | Hits:  98%/144   
    🟨 ctk
      🟨 12.8               Pass:  50%/2   | Total:  5m 16s | Avg:  2m 38s | Max:  3m 04s | Hits:  98%/144   
    🟨 cudacxx
      🟨 nvcc12.8           Pass:  50%/2   | Total:  5m 16s | Avg:  2m 38s | Max:  3m 04s | Hits:  98%/144   
    🟨 cudacxx_family
      🟨 nvcc               Pass:  50%/2   | Total:  5m 16s | Avg:  2m 38s | Max:  3m 04s | Hits:  98%/144   
    🟨 cxx
      🟨 GCC13              Pass:  50%/2   | Total:  5m 16s | Avg:  2m 38s | Max:  3m 04s | Hits:  98%/144   
    🟨 cxx_family
      🟨 GCC                Pass:  50%/2   | Total:  5m 16s | Avg:  2m 38s | Max:  3m 04s | Hits:  98%/144   
    🟨 gpu
      🟨 rtx2080            Pass:  50%/2   | Total:  5m 16s | Avg:  2m 38s | Max:  3m 04s | Hits:  98%/144   
    
  • 🟩 cub: Pass: 100%/44 | Total: 8h 49m | Avg: 12m 02s | Max: 56m 09s | Hits: 92%/52320

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  8h 39m | Avg: 12m 21s | Max: 56m 09s | Hits:  92%/49888 
      🟩 arm64              Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 38s | Hits:  99%/2432  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 49m 57s | Avg:  9m 59s | Max: 27m 20s | Hits:  85%/5914  
      🟩 12.5               Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 10m 56s | Hits:  98%/2250  
      🟩 12.8               Pass: 100%/37  | Total:  7h 39m | Avg: 12m 24s | Max: 56m 09s | Hits:  93%/44156 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 42s | Avg:  4m 51s | Max:  4m 59s | Hits: 100%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 49m 57s | Avg:  9m 59s | Max: 27m 20s | Hits:  85%/5914  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 10m 56s | Hits:  98%/2250  
      🟩 nvcc12.8           Pass: 100%/35  | Total:  7h 29m | Avg: 12m 50s | Max: 56m 09s | Hits:  92%/42052 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 42s | Avg:  4m 51s | Max:  4m 59s | Hits: 100%/2104  
      🟩 nvcc               Pass: 100%/42  | Total:  8h 40m | Avg: 12m 22s | Max: 56m 09s | Hits:  92%/50216 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 13s | Avg:  5m 48s | Max:  6m 04s | Hits: 100%/4872  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s | Hits: 100%/2432  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 29s | Avg:  6m 14s | Max:  6m 24s | Hits: 100%/2432  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 29s | Avg:  6m 14s | Max:  6m 26s | Hits: 100%/2432  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 11m | Avg: 10m 12s | Max: 24m 16s | Hits: 100%/8184  
      🟩 GCC7               Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  6m 01s | Hits:  99%/2436  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s | Hits:  99%/1218  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 01m | Avg: 30m 53s | Max: 56m 09s | Hits:  84%/2436  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 31s | Avg:  6m 15s | Max:  6m 18s | Hits:  99%/2436  
      🟩 GCC11              Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max:  6m 40s | Hits:  99%/2432  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 24s | Avg:  6m 42s | Max:  6m 57s | Hits:  99%/2432  
      🟩 GCC13              Pass: 100%/10  | Total:  2h 25m | Avg: 14m 32s | Max: 25m 48s | Hits:  99%/12160 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 54m 59s | Avg: 27m 29s | Max: 27m 39s | Hits:  16%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 57m 55s | Avg: 28m 57s | Max: 30m 37s | Hits:  16%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 10m 56s | Hits:  98%/2250  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 12m | Avg:  7m 46s | Max: 24m 16s | Hits: 100%/20352 
      🟩 GCC                Pass: 100%/21  | Total:  4h 24m | Avg: 12m 34s | Max: 56m 09s | Hits:  98%/25550 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 52m | Avg: 28m 13s | Max: 30m 37s | Hits:  16%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 10m 56s | Hits:  98%/2250  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 25m 48s | Hits:  99%/2432  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 53m | Avg: 10m 24s | Max: 56m 09s | Hits:  90%/40160 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 25m | Avg: 18m 08s | Max: 25m 07s | Hits:  99%/9728  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  6h 12m | Avg: 10m 04s | Max: 56m 09s | Hits:  90%/43808 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 13s | Avg: 22m 13s | Max: 22m 13s | Hits:  99%/1216  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 48s | Avg: 16m 48s | Max: 16m 48s | Hits:  99%/1216  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 15m | Avg: 25m 03s | Max: 25m 48s | Hits:  99%/3648  
      🟩 TestGPU            Pass: 100%/2   | Total: 42m 42s | Avg: 21m 21s | Max: 23m 34s | Hits:  99%/2432  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 25m 48s | Hits:  99%/2432  
      🟩 90;90a;100         Pass: 100%/1   | Total:  7m 19s | Avg:  7m 19s | Max:  7m 19s | Hits:  99%/1216  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  4h 01m | Avg: 12m 03s | Max: 56m 09s | Hits:  87%/23559 
      🟩 20                 Pass: 100%/24  | Total:  4h 48m | Avg: 12m 01s | Max: 30m 37s | Hits:  96%/28761 
    
  • 🟩 thrust: Pass: 100%/43 | Total: 6h 31m | Avg: 9m 06s | Max: 30m 27s | Hits: 96%/79625

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 50s | Avg:  8m 55s | Max: 11m 18s | Hits:  99%/3706  
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  6h 21m | Avg:  9m 18s | Max: 30m 27s | Hits:  96%/75920 
      🟩 arm64              Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  5m 19s | Hits:  99%/3705  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 44m 21s | Avg:  8m 52s | Max: 23m 14s | Hits:  93%/9256  
      🟩 12.5               Pass: 100%/2   | Total: 28m 58s | Avg: 14m 29s | Max: 14m 42s | Hits:  99%/3704  
      🟩 12.8               Pass: 100%/36  | Total:  5h 18m | Avg:  8m 50s | Max: 30m 27s | Hits:  96%/66665 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 38s | Hits: 100%/3704  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 21s | Avg:  8m 52s | Max: 23m 14s | Hits:  93%/9256  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 58s | Avg: 14m 29s | Max: 14m 42s | Hits:  99%/3704  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  5h 07m | Avg:  9m 01s | Max: 30m 27s | Hits:  96%/62961 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 38s | Hits: 100%/3704  
      🟩 nvcc               Pass: 100%/41  | Total:  6h 20m | Avg:  9m 16s | Max: 30m 27s | Hits:  96%/75921 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 12s | Avg:  5m 18s | Max:  5m 30s | Hits: 100%/7408  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 58s | Hits: 100%/3704  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 13s | Avg:  6m 06s | Max:  6m 09s | Hits: 100%/3704  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 53s | Avg:  5m 56s | Max:  5m 57s | Hits: 100%/3704  
      🟩 Clang18            Pass: 100%/7   | Total: 46m 03s | Avg:  6m 34s | Max: 10m 28s | Hits: 100%/12964 
      🟩 GCC7               Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  5m 42s | Hits:  99%/3706  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s | Hits:  99%/1853  
      🟩 GCC9               Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 48s | Hits:  99%/3706  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 36s | Avg:  5m 48s | Max:  6m 08s | Hits:  99%/3706  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 30s | Avg:  6m 15s | Max:  6m 21s | Hits:  99%/3706  
      🟩 GCC12              Pass: 100%/2   | Total: 12m 50s | Avg:  6m 25s | Max:  6m 37s | Hits:  99%/3706  
      🟩 GCC13              Pass: 100%/8   | Total:  1h 02m | Avg:  7m 47s | Max: 11m 40s | Hits:  99%/14824 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 50m 09s | Avg: 25m 04s | Max: 26m 55s | Hits:  69%/3692  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 22m | Avg: 27m 29s | Max: 30m 27s | Hits:  69%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 58s | Avg: 14m 29s | Max: 14m 42s | Hits:  99%/3704  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 42m | Avg:  6m 03s | Max: 10m 28s | Hits: 100%/31484 
      🟩 GCC                Pass: 100%/19  | Total:  2h 06m | Avg:  6m 40s | Max: 11m 40s | Hits:  99%/35207 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 12m | Avg: 26m 31s | Max: 30m 27s | Hits:  69%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 58s | Avg: 14m 29s | Max: 14m 42s | Hits:  99%/3704  
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 25m | Avg:  8m 02s | Max: 26m 55s | Hits:  97%/61112 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 05m | Avg: 12m 34s | Max: 30m 27s | Hits:  93%/18513 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 10m | Avg:  8m 24s | Max: 26m 55s | Hits:  96%/68516 
      🟩 TestCPU            Pass: 100%/3   | Total: 47m 04s | Avg: 15m 41s | Max: 30m 27s | Hits:  89%/5551  
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 26s | Avg: 11m 08s | Max: 11m 40s | Hits:  99%/5558  
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 29s | Avg:  6m 29s | Max:  6m 29s | Hits:  99%/1853  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 02m | Avg:  9m 08s | Max: 26m 55s | Hits:  95%/37031 
      🟩 20                 Pass: 100%/21  | Total:  3h 10m | Avg:  9m 04s | Max: 30m 27s | Hits:  97%/38888 
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

Contributor

github-actions bot commented Feb 7, 2025

🟩 CI finished in 1h 19m: Pass: 100%/90 | Total: 16h 00m | Avg: 10m 40s | Max: 56m 09s | Hits: 94%/132233
  • 🟩 cub: Pass: 100%/44 | Total: 8h 49m | Avg: 12m 02s | Max: 56m 09s | Hits: 92%/52320

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  8h 39m | Avg: 12m 21s | Max: 56m 09s | Hits:  92%/49888 
      🟩 arm64              Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 38s | Hits:  99%/2432  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 49m 57s | Avg:  9m 59s | Max: 27m 20s | Hits:  85%/5914  
      🟩 12.5               Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 10m 56s | Hits:  98%/2250  
      🟩 12.8               Pass: 100%/37  | Total:  7h 39m | Avg: 12m 24s | Max: 56m 09s | Hits:  93%/44156 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 42s | Avg:  4m 51s | Max:  4m 59s | Hits: 100%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 49m 57s | Avg:  9m 59s | Max: 27m 20s | Hits:  85%/5914  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 10m 56s | Hits:  98%/2250  
      🟩 nvcc12.8           Pass: 100%/35  | Total:  7h 29m | Avg: 12m 50s | Max: 56m 09s | Hits:  92%/42052 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 42s | Avg:  4m 51s | Max:  4m 59s | Hits: 100%/2104  
      🟩 nvcc               Pass: 100%/42  | Total:  8h 40m | Avg: 12m 22s | Max: 56m 09s | Hits:  92%/50216 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 13s | Avg:  5m 48s | Max:  6m 04s | Hits: 100%/4872  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s | Hits: 100%/2432  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 29s | Avg:  6m 14s | Max:  6m 24s | Hits: 100%/2432  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 29s | Avg:  6m 14s | Max:  6m 26s | Hits: 100%/2432  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 11m | Avg: 10m 12s | Max: 24m 16s | Hits: 100%/8184  
      🟩 GCC7               Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  6m 01s | Hits:  99%/2436  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s | Hits:  99%/1218  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 01m | Avg: 30m 53s | Max: 56m 09s | Hits:  84%/2436  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 31s | Avg:  6m 15s | Max:  6m 18s | Hits:  99%/2436  
      🟩 GCC11              Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max:  6m 40s | Hits:  99%/2432  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 24s | Avg:  6m 42s | Max:  6m 57s | Hits:  99%/2432  
      🟩 GCC13              Pass: 100%/10  | Total:  2h 25m | Avg: 14m 32s | Max: 25m 48s | Hits:  99%/12160 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 54m 59s | Avg: 27m 29s | Max: 27m 39s | Hits:  16%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 57m 55s | Avg: 28m 57s | Max: 30m 37s | Hits:  16%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 10m 56s | Hits:  98%/2250  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 12m | Avg:  7m 46s | Max: 24m 16s | Hits: 100%/20352 
      🟩 GCC                Pass: 100%/21  | Total:  4h 24m | Avg: 12m 34s | Max: 56m 09s | Hits:  98%/25550 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 52m | Avg: 28m 13s | Max: 30m 37s | Hits:  16%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 10m 56s | Hits:  98%/2250  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 25m 48s | Hits:  99%/2432  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 53m | Avg: 10m 24s | Max: 56m 09s | Hits:  90%/40160 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 25m | Avg: 18m 08s | Max: 25m 07s | Hits:  99%/9728  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  6h 12m | Avg: 10m 04s | Max: 56m 09s | Hits:  90%/43808 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 13s | Avg: 22m 13s | Max: 22m 13s | Hits:  99%/1216  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 48s | Avg: 16m 48s | Max: 16m 48s | Hits:  99%/1216  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 15m | Avg: 25m 03s | Max: 25m 48s | Hits:  99%/3648  
      🟩 TestGPU            Pass: 100%/2   | Total: 42m 42s | Avg: 21m 21s | Max: 23m 34s | Hits:  99%/2432  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 25m 48s | Hits:  99%/2432  
      🟩 90;90a;100         Pass: 100%/1   | Total:  7m 19s | Avg:  7m 19s | Max:  7m 19s | Hits:  99%/1216  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  4h 01m | Avg: 12m 03s | Max: 56m 09s | Hits:  87%/23559 
      🟩 20                 Pass: 100%/24  | Total:  4h 48m | Avg: 12m 01s | Max: 30m 37s | Hits:  96%/28761 
    
  • 🟩 thrust: Pass: 100%/43 | Total: 6h 31m | Avg: 9m 06s | Max: 30m 27s | Hits: 96%/79625

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 50s | Avg:  8m 55s | Max: 11m 18s | Hits:  99%/3706  
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  6h 21m | Avg:  9m 18s | Max: 30m 27s | Hits:  96%/75920 
      🟩 arm64              Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  5m 19s | Hits:  99%/3705  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 44m 21s | Avg:  8m 52s | Max: 23m 14s | Hits:  93%/9256  
      🟩 12.5               Pass: 100%/2   | Total: 28m 58s | Avg: 14m 29s | Max: 14m 42s | Hits:  99%/3704  
      🟩 12.8               Pass: 100%/36  | Total:  5h 18m | Avg:  8m 50s | Max: 30m 27s | Hits:  96%/66665 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 38s | Hits: 100%/3704  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 21s | Avg:  8m 52s | Max: 23m 14s | Hits:  93%/9256  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 58s | Avg: 14m 29s | Max: 14m 42s | Hits:  99%/3704  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  5h 07m | Avg:  9m 01s | Max: 30m 27s | Hits:  96%/62961 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 38s | Hits: 100%/3704  
      🟩 nvcc               Pass: 100%/41  | Total:  6h 20m | Avg:  9m 16s | Max: 30m 27s | Hits:  96%/75921 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 12s | Avg:  5m 18s | Max:  5m 30s | Hits: 100%/7408  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 58s | Hits: 100%/3704  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 13s | Avg:  6m 06s | Max:  6m 09s | Hits: 100%/3704  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 53s | Avg:  5m 56s | Max:  5m 57s | Hits: 100%/3704  
      🟩 Clang18            Pass: 100%/7   | Total: 46m 03s | Avg:  6m 34s | Max: 10m 28s | Hits: 100%/12964 
      🟩 GCC7               Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  5m 42s | Hits:  99%/3706  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s | Hits:  99%/1853  
      🟩 GCC9               Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 48s | Hits:  99%/3706  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 36s | Avg:  5m 48s | Max:  6m 08s | Hits:  99%/3706  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 30s | Avg:  6m 15s | Max:  6m 21s | Hits:  99%/3706  
      🟩 GCC12              Pass: 100%/2   | Total: 12m 50s | Avg:  6m 25s | Max:  6m 37s | Hits:  99%/3706  
      🟩 GCC13              Pass: 100%/8   | Total:  1h 02m | Avg:  7m 47s | Max: 11m 40s | Hits:  99%/14824 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 50m 09s | Avg: 25m 04s | Max: 26m 55s | Hits:  69%/3692  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 22m | Avg: 27m 29s | Max: 30m 27s | Hits:  69%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 58s | Avg: 14m 29s | Max: 14m 42s | Hits:  99%/3704  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 42m | Avg:  6m 03s | Max: 10m 28s | Hits: 100%/31484 
      🟩 GCC                Pass: 100%/19  | Total:  2h 06m | Avg:  6m 40s | Max: 11m 40s | Hits:  99%/35207 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 12m | Avg: 26m 31s | Max: 30m 27s | Hits:  69%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 58s | Avg: 14m 29s | Max: 14m 42s | Hits:  99%/3704  
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 25m | Avg:  8m 02s | Max: 26m 55s | Hits:  97%/61112 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 05m | Avg: 12m 34s | Max: 30m 27s | Hits:  93%/18513 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 10m | Avg:  8m 24s | Max: 26m 55s | Hits:  96%/68516 
      🟩 TestCPU            Pass: 100%/3   | Total: 47m 04s | Avg: 15m 41s | Max: 30m 27s | Hits:  89%/5551  
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 26s | Avg: 11m 08s | Max: 11m 40s | Hits:  99%/5558  
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 29s | Avg:  6m 29s | Max:  6m 29s | Hits:  99%/1853  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 02m | Avg:  9m 08s | Max: 26m 55s | Hits:  95%/37031 
      🟩 20                 Pass: 100%/21  | Total:  3h 10m | Avg:  9m 04s | Max: 30m 27s | Hits:  97%/38888 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 01s | Avg: 5m 30s | Max: 8m 49s | Hits: 98%/288

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  8m 49s | Hits:  98%/288   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  8m 49s | Hits:  98%/288   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  8m 49s | Hits:  98%/288   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  8m 49s | Hits:  98%/288   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  8m 49s | Hits:  98%/288   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  8m 49s | Hits:  98%/288   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  8m 49s | Hits:  98%/288   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 12s | Avg:  2m 12s | Max:  2m 12s | Hits:  98%/144   
      🟩 Test               Pass: 100%/1   | Total:  8m 49s | Avg:  8m 49s | Max:  8m 49s | Hits:  98%/144   
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 33s | Avg: 28m 33s | Max: 28m 33s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

@NaderAlAwar NaderAlAwar merged commit fcace5c into NVIDIA:main Feb 7, 2025
105 of 107 checks passed
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA]: Implement cccl.c.parallel version of merge sort
3 participants