HFMA2 and LDS in cutlass efficient fp16 tensorcore kernel? #716
Answered by hwu36. LeiWang1999 asked this question in Q&A.
Answered by hwu36 on Dec 1, 2022
The epilogue needs to use LDS/STS (shared-memory loads and stores) and HFMA2 to do the alpha and beta scaling. If you want to use Nsight to check bank conflicts, you need to change your problem size so that the kernel launches just one threadblock.
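To pick a problem size that launches a single threadblock, it can help to compute the grid size by hand. The following is a minimal sketch, assuming a GEMM whose output is tiled with one threadblock per tile and no split-K. The helper names and the 128x128 tile used in the note below are illustrative assumptions, not CUTLASS APIs; substitute the threadblock shape your kernel was instantiated with.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; none of these are CUTLASS APIs.

// Ceiling division for positive integers.
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Number of threadblocks for an M x N output, assuming one threadblock
// per (tile_m x tile_n) output tile and no split-K. Under that assumption
// the K dimension does not change the threadblock count.
constexpr int threadblock_count(int m, int n, int tile_m, int tile_n) {
  return ceil_div(m, tile_m) * ceil_div(n, tile_n);
}

// True when the whole problem fits in a single threadblock tile.
constexpr bool launches_single_threadblock(int m, int n, int tile_m, int tile_n) {
  return threadblock_count(m, n, tile_m, tile_n) == 1;
}
```

For example, with an assumed 128x128 threadblock tile, `threadblock_count(128, 128, 128, 128)` is 1, while `threadblock_count(256, 128, 128, 128)` is 2, so setting M and N no larger than the tile shape should give a single-threadblock launch to profile.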
Answer selected by LeiWang1999