Performance gap of same kernel in different cuda version? #850

LeiWang1999 · 2023-03-06T08:20:16Z

Hi all, I'm benchmarking CUTLASS with cutlass_profiler on my 24 GB RTX 3090. However, I found that the best-performing CUTLASS kernel is not the same under CUDA 11.1 and CUDA 11.8.

In CUDA 11.1 the best kernel is cutlass_tensorop_h1688gemm_256x128_32x2_tt_align8; in CUDA 11.8 the best is cutlass_tensorop_h16816gemm_256x128_32x3_tt_align8. By the way, cutlass_tensorop_h1688gemm_256x128_32x2_tt_align8 performs the same under CUDA 11.8 as under CUDA 11.1.
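The kernel names encode the tuning parameters that separate these two kernels. The sketch below decodes them under our reading of the naming used by the CUTLASS library generator: an accumulator letter plus the mma instruction shape, the threadblock tile MxN, the K-tile and number of pipeline stages, the A/B layouts ('n' column-major, 't' row-major), and the alignment in elements. NAME_RE, decode, and GemmKernelConfig are hypothetical helpers, not part of CUTLASS, and the shared-memory figure is a nominal estimate assuming 2-byte half-precision A and B elements.

```python
import re
from dataclasses import dataclass

# Assumed mapping from the digits in a kernel name to the mma instruction
# shape (M, N, K); only the two shapes from this thread are listed.
INSTRUCTION_SHAPES = {"1688": (16, 8, 8), "16816": (16, 8, 16)}

# Assumed layout letters: 'n' = column-major, 't' = row-major.
LAYOUTS = {"n": "column-major", "t": "row-major"}

# Hypothetical pattern covering names such as
# cutlass_tensorop_h16816gemm_256x128_32x3_tt_align8
NAME_RE = re.compile(
    r"cutlass_tensorop_(?P<acc>[a-z])(?P<inst>\d+)gemm_"
    r"(?P<m>\d+)x(?P<n>\d+)_(?P<k>\d+)x(?P<stages>\d+)_"
    r"(?P<la>[nt])(?P<lb>[nt])_align(?P<align>\d+)"
)


@dataclass
class GemmKernelConfig:
    accumulator: str          # 'h' is assumed to mean half-precision accumulation
    instruction_shape: tuple  # (M, N, K) of one mma instruction
    tile_mnk: tuple           # threadblock tile (M, N, K)
    stages: int               # number of software pipeline stages
    layout_a: str
    layout_b: str
    alignment: int            # alignment of A/B accesses, in elements

    def smem_bytes(self, element_bytes=2):
        """Nominal shared memory for the A and B tiles of every stage.

        Ignores padding and any shared memory used by the epilogue.
        """
        m, n, k = self.tile_mnk
        return self.stages * (m * k + n * k) * element_bytes


def decode(name):
    """Split a kernel name into its tuning parameters (hypothetical helper)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized kernel name: {name}")
    g = match.groupdict()
    return GemmKernelConfig(
        accumulator=g["acc"],
        instruction_shape=INSTRUCTION_SHAPES[g["inst"]],
        tile_mnk=(int(g["m"]), int(g["n"]), int(g["k"])),
        stages=int(g["stages"]),
        layout_a=LAYOUTS[g["la"]],
        layout_b=LAYOUTS[g["lb"]],
        alignment=int(g["align"]),
    )


if __name__ == "__main__":
    for name in (
        "cutlass_tensorop_h1688gemm_256x128_32x2_tt_align8",
        "cutlass_tensorop_h16816gemm_256x128_32x3_tt_align8",
    ):
        cfg = decode(name)
        print(cfg.instruction_shape, cfg.stages, cfg.smem_bytes())
```

Decoded this way, the two winners are different configurations rather than one kernel built by two compilers: a 16x8x8 instruction with 2 stages (48 KiB nominal A/B shared memory) versus a 16x8x16 instruction with 3 stages (72 KiB nominal).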

But cutlass_tensorop_h16816gemm_256x128_32x3_tt_align8 performs very poorly under CUDA 11.1. Profiling shows that this kernel does a lot of local memory reads and writes, which may be caused by register spilling. Could CUDA 11.8 perform better because nvcc fixed some of these spill cases?
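One way to test the spill hypothesis without a profiler is to build the same kernel with each toolkit using `nvcc -Xptxas -v` and compare what ptxas reports. The sketch below is a hypothetical parser for that verbose output; the line wording it matches is the format ptxas commonly prints, treated here as an assumption, and the sample log is made up for illustration rather than taken from these kernels.

```python
import re

# Line shapes printed by ptxas when compiling with `nvcc -Xptxas -v`
# (format as commonly seen; treat the exact wording as an assumption).
PROPS_RE = re.compile(r"Function properties for (\S+)")
SPILL_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)
REGS_RE = re.compile(r"Used (\d+) registers")


def spill_report(ptxas_log):
    """Collect registers and spill bytes for each function in a ptxas -v log."""
    reports = []
    current = {}
    for line in ptxas_log.splitlines():
        props = PROPS_RE.search(line)
        if props:
            current = {"name": props.group(1)}
        spill = SPILL_RE.search(line)
        if spill:
            current.update(
                stack_frame=int(spill.group(1)),
                spill_stores=int(spill.group(2)),
                spill_loads=int(spill.group(3)),
            )
        regs = REGS_RE.search(line)
        if regs:
            # The "Used N registers" line closes one function's entry.
            current["registers"] = int(regs.group(1))
            reports.append(current)
            current = {}
    return reports


# Made-up log in the ptxas -v format; these numbers are illustrative only.
SAMPLE_LOG = """\
ptxas info    : Compiling entry function 'demo_kernel' for 'sm_86'
ptxas info    : Function properties for demo_kernel
    48 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads
ptxas info    : Used 255 registers, 384 bytes cmem[0]
"""

if __name__ == "__main__":
    for report in spill_report(SAMPLE_LOG):
        spilled = report.get("spill_stores", 0) + report.get("spill_loads", 0)
        print(report["name"], report["registers"], "spills" if spilled else "no spills")
```

Nonzero spill stores or loads for the CUDA 11.1 build, and zero for the CUDA 11.8 build, would be consistent with the local memory traffic seen in the profiler, though local memory can also come from local arrays rather than spills.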

I also noticed that CUTLASS enables some newer features depending on the CUDA version, such as L2 cache prefetch or `__grid_constant__`. Apart from these features, the generated code should be no different. Is my understanding correct?

hwu36 · 2023-03-06T18:20:34Z
Maintainer

We work closely with the nvcc team to improve CUTLASS performance. Each nvcc release usually improves some types of CUTLASS kernels. For GEMM, CUDA 11.3 is the minimum. We recommend using the latest nvcc, since it has the latest optimizations.

NVCC also enables newer hardware features in different versions, and CUTLASS uses those features in its kernels once nvcc supports them.
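In CUDA C++ this kind of selection happens at compile time, typically through preprocessor checks on `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__`. The Python sketch below only mimics that logic to show how one source tree yields different kernels per toolkit. The thresholds are assumptions: we believe `__grid_constant__` arrived in CUDA 11.7, and the L2 prefetch version is a guess that should be checked against the CUTLASS sources; the 11.3 GEMM minimum comes from the reply above. All names are hypothetical.

```python
# Assumed minimum CUDA versions for two features mentioned in this thread.
# __grid_constant__ is believed to need CUDA 11.7; the L2 prefetch threshold
# is an assumption and should be checked against the CUTLASS sources.
FEATURE_MIN_CUDA = {
    "grid_constant": (11, 7),
    "l2_prefetch": (11, 4),
}

# Minimum CUDA version for CUTLASS GEMM, per the maintainer's reply.
GEMM_MIN_CUDA = (11, 3)


def parse_version(text):
    """Turn '11.8' or '11.8.89' into a comparable (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def enabled_features(cuda_version):
    """Features a compile-time version check would switch on for this nvcc."""
    version = parse_version(cuda_version)
    return sorted(
        name for name, minimum in FEATURE_MIN_CUDA.items() if version >= minimum
    )


def meets_gemm_minimum(cuda_version):
    """True if this toolkit is at or above the stated GEMM minimum."""
    return parse_version(cuda_version) >= GEMM_MIN_CUDA


if __name__ == "__main__":
    for version in ("11.1", "11.8"):
        print(version, meets_gemm_minimum(version), enabled_features(version))
```

A table like this only covers feature switches. It says nothing about register allocation or instruction scheduling, which also change between nvcc releases, as the maintainer notes.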

4 replies

LeiWang1999 Mar 7, 2023
Author

Thanks. Could you kindly provide some examples of how CUTLASS is carefully designed to leverage nvcc?

hwu36 Mar 7, 2023
Maintainer

That is the other direction: nvcc optimizes CUTLASS.

hwu36 Mar 7, 2023
Maintainer

If you want good performance, either use CUTLASS or generate the same PTX as CUTLASS does.
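To check whether your own kernel produces the same PTX as CUTLASS, you can dump the PTX of both (for example with `nvcc -ptx` from source, or `cuobjdump -ptx` on a binary that embeds PTX) and compare the instruction mix. The sketch below is a hypothetical helper that counts a few opcodes typical of a tensor-core mainloop (mma, ldmatrix, cp.async); the opcode selection is our assumption and the PTX fragments are hand-written for illustration, not real CUTLASS output.

```python
import re
from collections import Counter

# Opcodes that characterize a tensor-core mainloop (assumed selection):
# tensor-core MMAs, ldmatrix shared-memory loads, and cp.async copies.
# An optional guard predicate such as "@p" or "@%p1" may precede the opcode.
PTX_OPCODE_RE = re.compile(
    r"^[ \t]*(?:@!?%?\w+[ \t]+)?((?:mma|ldmatrix|cp\.async)[\w.:]*)",
    re.MULTILINE,
)


def opcode_histogram(ptx):
    """Count each full opcode (with its type suffixes) in PTX text."""
    return Counter(PTX_OPCODE_RE.findall(ptx))


def opcode_diff(ptx_a, ptx_b):
    """Opcodes whose counts differ between two PTX listings: {op: (a, b)}."""
    a, b = opcode_histogram(ptx_a), opcode_histogram(ptx_b)
    return {op: (a[op], b[op]) for op in sorted(set(a) | set(b)) if a[op] != b[op]}


# Hand-written PTX fragments for illustration, not real CUTLASS output.
PTX_A = """
    mma.sync.aligned.m16n8k8.row.col.f16.f16.f16.f16 {%r1, %r2}, {%r3, %r4}, {%r5}, {%r6, %r7};
    ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8, %r9, %r10, %r11}, [%r12];
"""
PTX_B = """
    @p cp.async.cg.shared.global [%r1], [%rd1], 16;
    mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 {%r1, %r2}, {%r3, %r4, %r5, %r6}, {%r7, %r8}, {%r9, %r10};
    ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11, %r12, %r13, %r14}, [%r15];
"""

if __name__ == "__main__":
    for op, (count_a, count_b) in opcode_diff(PTX_A, PTX_B).items():
        print(f"{op}: {count_a} -> {count_b}")
```

Matching PTX is not the whole story: register allocation happens later in ptxas, so spills show up in SASS (for example LDL/STL in `cuobjdump -sass` output), not in PTX. Identical PTX can therefore still spill differently under different CUDA versions.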

LeiWang1999 Mar 7, 2023
Author

I see, thanks for your reply. That's impressive.