Split out level 3 gemm tests #2610

kshyatt commented Jan 8, 2025

In local testing, the remaining level 3 tests and the split-out level 3 GEMM tests seem to take about the same amount of time, so the split should help with parallelization. Also removed an extraneous comment.

kshyatt requested a review from maleadt January 8, 2025 16:45
github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: ddac187 | Previous: a0c2f4b | Ratio |
|---|---|---|---|
| latency/precompile | 45404021171 ns | 45297385295 ns | 1.00 |
| latency/ttfp | 6388606471 ns | 6375596178 ns | 1.00 |
| latency/import | 3040221123 ns | 3036561495 ns | 1.00 |
| integration/volumerhs | 9565588 ns | 9567419 ns | 1.00 |
| integration/byval/slices=1 | 146893 ns | 146746 ns | 1.00 |
| integration/byval/slices=3 | 425698 ns | 425517.5 ns | 1.00 |
| integration/byval/reference | 144994 ns | 145010 ns | 1.00 |
| integration/byval/slices=2 | 286459 ns | 286216 ns | 1.00 |
| integration/cudadevrt | 103520 ns | 103513 ns | 1.00 |
| kernel/indexing | 14218 ns | 14419 ns | 0.99 |
| kernel/indexing_checked | 15449 ns | 15499 ns | 1.00 |
| kernel/occupancy | 728.203125 ns | 748.2734375 ns | 0.97 |
| kernel/launch | 2109.4 ns | 2194.6666666666665 ns | 0.96 |
| kernel/rand | 14991 ns | 17335 ns | 0.86 |
| array/reverse/1d | 19743 ns | 19412 ns | 1.02 |
| array/reverse/2d | 25140 ns | 24576 ns | 1.02 |
| array/reverse/1d_inplace | 11280 ns | 11029 ns | 1.02 |
| array/reverse/2d_inplace | 13218 ns | 13223 ns | 1.00 |
| array/copy | 20708 ns | 20740 ns | 1.00 |
| array/iteration/findall/int | 158377 ns | 158179 ns | 1.00 |
| array/iteration/findall/bool | 138671 ns | 138583 ns | 1.00 |
| array/iteration/findfirst/int | 153822.5 ns | 153423 ns | 1.00 |
| array/iteration/findfirst/bool | 154577.5 ns | 154821 ns | 1.00 |
| array/iteration/scalar | 75766.5 ns | 77451 ns | 0.98 |
| array/iteration/logical | 213464 ns | 216735 ns | 0.98 |
| array/iteration/findmin/1d | 41245 ns | 41556.5 ns | 0.99 |
| array/iteration/findmin/2d | 94061 ns | 94128 ns | 1.00 |
| array/reductions/reduce/1d | 35422 ns | 42013 ns | 0.84 |
| array/reductions/reduce/2d | 41072.5 ns | 51911 ns | 0.79 |
| array/reductions/mapreduce/1d | 33410 ns | 39275 ns | 0.85 |
| array/reductions/mapreduce/2d | 41182.5 ns | 49505.5 ns | 0.83 |
| array/broadcast | 21668 ns | 21668 ns | 1 |
| array/copyto!/gpu_to_gpu | 13525 ns | 11569 ns | 1.17 |
| array/copyto!/cpu_to_gpu | 211822 ns | 211873 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 245148.5 ns | 245423 ns | 1.00 |
| array/accumulate/1d | 108822 ns | 108388.5 ns | 1.00 |
| array/accumulate/2d | 79771 ns | 79823 ns | 1.00 |
| array/construct | 1117.05 ns | 1208.35 ns | 0.92 |
| array/random/randn/Float32 | 43263 ns | 43873.5 ns | 0.99 |
| array/random/randn!/Float32 | 26118 ns | 25937 ns | 1.01 |
| array/random/rand!/Int64 | 27149 ns | 27271 ns | 1.00 |
| array/random/rand!/Float32 | 8683.333333333334 ns | 8766.666666666666 ns | 0.99 |
| array/random/rand/Int64 | 29975 ns | 29637 ns | 1.01 |
| array/random/rand/Float32 | 12857 ns | 12723 ns | 1.01 |
| array/permutedims/4d | 66803 ns | 66923 ns | 1.00 |
| array/permutedims/2d | 56890 ns | 56439 ns | 1.01 |
| array/permutedims/3d | 59106 ns | 58867 ns | 1.00 |
| array/sorting/1d | 2919614 ns | 2933352 ns | 1.00 |
| array/sorting/by | 3499898 ns | 3500830 ns | 1.00 |
| array/sorting/2d | 1084142 ns | 1085059 ns | 1.00 |
| cuda/synchronization/stream/auto | 1028.3 ns | 1038.4 ns | 0.99 |
| cuda/synchronization/stream/nonblocking | 6556.8 ns | 6432 ns | 1.02 |
| cuda/synchronization/stream/blocking | 794 ns | 807.5918367346939 ns | 0.98 |
| cuda/synchronization/context/auto | 1212.5 ns | 1194.1 ns | 1.02 |
| cuda/synchronization/context/nonblocking | 6758.6 ns | 6649.8 ns | 1.02 |
| cuda/synchronization/context/blocking | 901.3823529411765 ns | 886.6415094339623 ns | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.
