-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Merge Sort implementation for c.parallel #3636
Conversation
…te parameters to allow calling merge sort from c.parallel
…e to missing definition when using NVRTC, and make changes to included thrust headers to make them NVRTC compilable
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
… per thread for the tuning policy
…ng set incorrectly
This reverts commit ee290a4.
…as the copy versions
🟩 CI finished in 1h 34m: Pass: 100%/90 | Total: 2d 16h | Avg: 42m 51s | Max: 1h 19m | Hits: 73%/132233
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
+/- | CCCL C Parallel Library |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 90)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
9 | windows-amd64-cpu16 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
4 | linux-arm64-cpu16 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
2 | linux-amd64-gpu-rtx2080-latest-1 |
1 | linux-amd64-gpu-h100-latest-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! Approving with just a couple of nits
c/parallel/test/test_util.h
Outdated
@@ -119,6 +120,17 @@ std::vector<T> generate(std::size_t num_items) | |||
return vec; | |||
} | |||
|
|||
template <class T> | |||
std::vector<T> make_shuffled_key_ranks_vector(std::size_t num_items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Given that we're defining this in the general-purpose test_util.h
, we could consider a more general name
std::vector<T> make_shuffled_key_ranks_vector(std::size_t num_items) | |
std::vector<T> make_shuffled_sequence(std::size_t num_items) |
c/parallel/src/kernels/operators.h
Outdated
@@ -14,4 +14,5 @@ | |||
|
|||
#include <cccl/c/types.h> | |||
|
|||
std::string make_kernel_user_binary_operator(std::string_view input_value_t, cccl_op_t operation); | |||
std::string | |||
make_kernel_user_binary_operator(std::string_view input_value_t, cccl_op_t operation, bool comparison_op = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: perhaps we should just have a distinct make_kernel_user_comparison_operator
helper.
@@ -161,7 +170,7 @@ struct __align__({5}) output_iterator_state_t{{ | |||
struct output_iterator_t {{ | |||
using iterator_category = cuda::std::random_access_iterator_tag; | |||
using difference_type = {0}; | |||
using value_type = void; | |||
using value_type = VALUE_T; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change required independent of #3722?
If not, I'd recommend we make all the output_iterator changes in that PR.
1d4dc30
to
5124fe2
Compare
🟨 CI finished in 1h 02m: Pass: 98%/90 | Total: 15h 54m | Avg: 10m 36s | Max: 56m 09s | Hits: 94%/132089
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
+/- | CCCL C Parallel Library |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 90)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
9 | windows-amd64-cpu16 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
4 | linux-arm64-cpu16 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
2 | linux-amd64-gpu-rtx2080-latest-1 |
1 | linux-amd64-gpu-h100-latest-1 |
🟩 CI finished in 1h 19m: Pass: 100%/90 | Total: 16h 00m | Avg: 10m 40s | Max: 56m 09s | Hits: 94%/132233
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
+/- | CCCL C Parallel Library |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 90)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
9 | windows-amd64-cpu16 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
4 | linux-arm64-cpu16 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
2 | linux-amd64-gpu-rtx2080-latest-1 |
1 | linux-amd64-gpu-h100-latest-1 |
Description
Closes #2547
This PR adds the
c.parallel
merge_sort API. It also includes some changes tomerge_sort.cuh
and other included headers to enable NVRTC compilation.Notes to reviewers
merge_sort.cu
were hardcoded in order to avoid the assertion failure that checks for the presence of certain memory ops (presumably for performance reasons). This means that we will get suboptimal performance. I think this can be addressed once [FEA]: Redesign default tuning #3570 is solved.Checklist