forked from NVIDIA/TransformerEngine
[pull] main from NVIDIA:main #21
Merged
Conversation
Remove total time measurement Signed-off-by: Phuong Nguyen <[email protected]>
* add default path for ffi include
* add an option to get XLA_HOME from env

---------

Signed-off-by: Phuong Nguyen <[email protected]>
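A minimal sketch of the lookup this commit describes: prefer an `XLA_HOME` environment variable and fall back to a default include path. The helper name and default path are illustrative, not the project's actual build code.

```python
import os
from pathlib import Path

def xla_include_dir(default: str = "/opt/xla") -> Path:
    """Resolve the XLA FFI include directory, honoring $XLA_HOME if set."""
    xla_home = os.getenv("XLA_HOME", default)  # env override wins
    return Path(xla_home) / "include"

print(xla_include_dir())  # /opt/xla/include unless XLA_HOME points elsewhere
```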
* Bump minimum CUDA version to 12.0
* Debug CUDA version check
* Debug CMake build
* Review suggestions from @ksivaman and @ptrendx: remove logic for CUDA <12.0 in PyTorch and Paddle builds; update version in docs and README
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
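An illustrative version guard for the CUDA 12.0 minimum above; the real check lives in the project's CMake/setup logic, so this is only a sketch of the idea using the CUDA version PyTorch was built against.

```python
import torch

MIN_CUDA = (12, 0)

def check_cuda_version() -> None:
    """Fail fast if the CUDA toolkit behind this PyTorch build is too old."""
    if torch.version.cuda is None:
        raise RuntimeError("PyTorch was built without CUDA support")
    major, minor = (int(x) for x in torch.version.cuda.split(".")[:2])
    if (major, minor) < MIN_CUDA:
        raise RuntimeError(
            f"CUDA {major}.{minor} found, but >= {MIN_CUDA[0]}.{MIN_CUDA[1]} is required"
        )

check_cuda_version()
```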
Signed-off-by: Markus Schnoes <[email protected]> Co-authored-by: Charlene Yang <[email protected]>
fix typos regarding t in thd Signed-off-by: Charlene Yang <[email protected]>
* add window_size to AttnFuncWithCP
* add seq_offsets_qkvo for cudnn thd
* add seq_offsets_qkvo to AttnFuncWithCP
* fix seq_offsets calculation of cudnn thd
* remove a thd assert
* fix bias for thd test
* add thd test for cudnn FA with CP
* skip GQA/MQA test for cuDNN THD
* make sure seq_offsets are computed with qkv_group of hd_hd_hd while CP>1
* fix seq_offsets inputs
* remove two comments
* fix attn mask type for cudnn thd with cp
* fix attn_mask_type check
* fix attn_mask_type for cudnn fa with thd
* fix a typo
* fix out dout in bwd
* assert cudnn+thd does not support attn bias
* check if attn_mask_type has padding
* minor change
* change cp test batch size to 2
* fix code format
* fix two assert info
* fix assert comment
* fix assert comments
* minor fix
* fix assert comments
* assert swa+CP cannot work with thd format
* add a new CP function for swa
* add a missing dgrads
* minor change
* add draft fwd function for swa+cp
* minor change
* enable flash attention for swa+cp
* remove an assert of swa+cp
* call SWAFuncWithCP for swa+cp
* typo fix
* use 2hd layout
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* change qkv_format check
* add a code comment
* tensor shape bug fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* tensor shape fix
* add function to compute cu_seqlens of a cp rank
* add cu_seqlens and cu_seqlens_padded to context parallelism
* typo fix
* minor change
* fix FlashAttention output sequence length
* fix cu_seqlens_kv_per_step calculation
* zero dQKV for ending padded tokens
* zero dQKV tensors of FlashAttention
* fix softmax_lse correction
* remove padded tokens of KV to save communication
* do not need to zero dkv for FlashAttention any more
* zero out tensors
* remove redundant code
* fix CP unit test
* fix kv shape of cp test with thd format
* update cp unit test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add simple code framework
* try not to have a separate CP function for SWA
* backup some code change
* back up code
* clean up fwd implementation of SWAFuncWithCP
* remove redundant code
* code cleaning
* fix assert info
* reduce kv chunk concat overheads
* minor change
* make AttnFuncWithCP and SWAFuncWithCP have same API
* add a docstring
* preliminary implementation of SWAFuncWithCP forward seems working
* fix output shape of SWAFuncWithCP
* code refactoring for FlashAttention and add a code placeholder for bwd
* use gather_along_first_dim
* finish the preliminary implementation of bwd
* remove redundant code
* fix assert condition
* add draft implementation of SWA+CP with FusedAttention
* fix attention mask type of swa+cp
* code cleaning
* add qkv_layout
* add missing window_size argument
* typo fix
* fix kv shape of swa+cp
* bug and typo fix
* fix dout shape
* add multi stream in fwd of swa+cp
* save chunk_ids_to_kv_ag in fwd
* add multi stream in bwd of swa+cp
* minor fix to cp stream sync
* rename AttnFuncWithCP
* check if window size is None
* fix docstring of AttnFuncWithCP
* minor fix
* add env var for users to choose KV ag or KV p2p
* update cp tests
* fix window size in cp unit test
* fix pytest skip messages
* add cp_comm_type into API
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* code cleaning
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* assert sequence length divisible requirements
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add support table of context parallelism
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* typo and code format fix
* do not print multiple disabling messages
* bug fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix device in torch.arange and adjust code for the PR of MLA
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix typos and clean asserts

---------

Signed-off-by: Xiaowei Ren <[email protected]>
Co-authored-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <[email protected]>
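The "add function to compute cu_seqlens of a cp rank" item above is the kind of bookkeeping context parallelism needs. A minimal sketch, assuming every sequence length is divisible by 2*cp_size and each rank keeps an even share of every sequence (the function name and layout assumptions are mine, not TE's exact code):

```python
import torch

def cu_seqlens_on_cp_rank(cu_seqlens: torch.Tensor, cp_size: int) -> torch.Tensor:
    """Per-rank cumulative seqlens when each sequence is sharded evenly
    over cp_size ranks (each rank keeps seqlen / cp_size tokens)."""
    seqlens = cu_seqlens[1:] - cu_seqlens[:-1]      # per-sequence lengths
    assert torch.all(seqlens % (2 * cp_size) == 0)  # divisibility requirement
    local = seqlens // cp_size                      # tokens kept per rank
    return torch.nn.functional.pad(local.cumsum(0), (1, 0))  # re-prepend the 0

cu = torch.tensor([0, 8, 24])        # two packed sequences of length 8 and 16
print(cu_seqlens_on_cp_rank(cu, 2))  # tensor([ 0,  4, 12])
```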
* support dtype casting fusion in FusedAdam
* minor changes
* fix lint
* changes based on review comments
* remove unused code
* code refactor
* fix typo
* refactor
* remove unused code
* Fix linter warnings
* Copy CUDA headers for framework sdists

---------

Signed-off-by: Shijie Wang <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
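A sketch of the behavior "dtype casting fusion" refers to: the optimizer updates FP32 master weights and writes the low-precision model copy in the same pass, rather than running a separate cast kernel afterwards. This is a plain-PyTorch illustration of the idea, not TE's fused CUDA kernel.

```python
import torch

@torch.no_grad()
def adam_step_with_cast(master_w, model_w, grad, exp_avg, exp_avg_sq,
                        step, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    b1, b2 = betas
    exp_avg.mul_(b1).add_(grad, alpha=1 - b1)
    exp_avg_sq.mul_(b2).addcmul_(grad, grad, value=1 - b2)
    bias1, bias2 = 1 - b1**step, 1 - b2**step
    update = (exp_avg / bias1) / ((exp_avg_sq / bias2).sqrt() + eps)
    master_w.add_(update, alpha=-lr)  # FP32 master-weight update
    model_w.copy_(master_w)           # "fused" cast: refresh the BF16 copy in place

master = torch.randn(4, dtype=torch.float32)
model = master.to(torch.bfloat16)
grad = torch.randn_like(master)
m, v = torch.zeros_like(master), torch.zeros_like(master)
adam_step_with_cast(master, model, grad, m, v, step=1)
```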
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
Signed-off-by: Frederic Bastien <[email protected]> Co-authored-by: Phuong Nguyen <[email protected]>
…t sequence length parameters (#1066)

* Added ability for seqlen for transformer and mha layer
* Documentation for new parameters
* Add tests for THD layout, assert for THD layout with KV-Cache
* Fixed tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Move THD logic in shape calculation, add missing optional in params
* Skip the THD test on GPUs older than Ampere

---------

Signed-off-by: Lukasz Pierscieniewski <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Przemek Tredak <[email protected]>
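A hedged sketch of the THD ("packed") call pattern these sequence-length parameters serve: variable-length sequences are concatenated along one token dimension and described by cumulative offsets plus an explicit maximum length. The layer call is schematic; consult TE's attention docs for the exact argument names.

```python
import torch

seqlens = [5, 3, 8]                                    # three packed sequences
cu_seqlens = torch.tensor([0, 5, 8, 16], dtype=torch.int32)
max_seqlen = max(seqlens)                              # 8
hidden = 256
x = torch.randn(sum(seqlens), hidden)                  # [total_tokens, hidden]

# Schematic layer call (argument names illustrative):
# out = mha(x, cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens,
#           max_seqlen_q=max_seqlen, max_seqlen_kv=max_seqlen)
```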
* Perform scale-inv update in cast-transpose kernels
* Perform scale-inv update in cast and activation kernels
* Perform scale-inv update in LayerNorm and RMSNorm kernels
* Perform scale-inv update after FP8 GEMMs
* Fuse casts and scale-inv updates in linear module
* Fuse casts and scale-inv updates in layernorm-linear module
* Simplify kernel to update FP8 scale-inv
* Fix typos
* Debug amax update in layernorm kernels
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Debug test failures
* Debug ONNX export: use quantization scaling factor in ONNX quantize op
* Review suggestion from @ptrendx
* Debug mismatched dtypes

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
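What "fusing the scale-inv update" means in FP8 terms, as a plain-PyTorch sketch: the same kernel that quantizes x with `scale` also records `1/scale`, so no separate launch is needed to keep `scale_inv` consistent with the data. Function and buffer names are illustrative.

```python
import torch

def cast_to_fp8_with_scale_inv(x, scale, amax, scale_inv):
    """Quantize x and update amax/scale_inv in one logical pass."""
    amax.copy_(x.abs().amax())               # track amax for the scaling recipe
    y = (x * scale).to(torch.float8_e4m3fn)  # quantize
    scale_inv.copy_(scale.reciprocal())      # fused: keep 1/scale in sync
    return y

x = torch.randn(16)
scale = torch.tensor(64.0)
amax, scale_inv = torch.zeros(()), torch.zeros(())
y = cast_to_fp8_with_scale_inv(x, scale, amax, scale_inv)
```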
* update FE to 1.6
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update to 1.6.1-rc for testing
* update to fe 1.6.1

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add deterministic knob in cuDNN fused attn backend
* pass fp8 and fp8_meta to attn_func_with_cp
* assert only Fused Attn can support FP8+CP
* remove redundant assert
* add a fwd draft implementation of FP8 + CP
* save fp8 and fp8_meta
* assert sequence length divisible requirements
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* remove a redundant qkv_layout compute
* if condition change
* some typo fix
* add support table of context parallelism
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* typo and code format fix
* do not print multiple disabling messages
* bug fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix aux_ctx_tensors of FP8
* bug fix
* fix device in torch.arange and adjust code for the PR of MLA
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* commit code change for FP8+CP
* commit more code change for FP8+CP
* commit more fp8 code for FP8+CP
* bug fixes
* bug fix
* cast merged CP results from FP32 to BF16
* typo fix
* minor change
* fix softmax_lse
* fix some bugs of FP8 dkv exchange
* typo fix
* add FP8 unit test
* fix typos and clean asserts
* fix get_p2p_comm_info
* fix dkv p2p exchange
* minor fix
* change FP8 dkv P2P to A2A
* add FP8+CP unit test
* typo fix
* assert amax reduction is needed for FP8+CP
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* remove duplicated code
* destroy process group in CP unit test
* remove interval from fp8_recipe because it has been deprecated
* try to fix the failed CP test with the latest CI pipeline
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* remove redundant f before string
* change META_O_CP

---------

Signed-off-by: Xiaowei Ren <[email protected]>
Co-authored-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <[email protected]>
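Two items above touch the FP8 recipe: `interval` was deprecated, and amax reduction across ranks is asserted for FP8+CP. A minimal sketch of a recipe built accordingly; the argument values are illustrative, so check the installed TE version's `DelayedScaling` signature.

```python
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(
    margin=0,
    fp8_format=Format.HYBRID,  # E4M3 forward, E5M2 backward
    amax_history_len=16,
    amax_compute_algo="max",
    reduce_amax=True,          # FP8 + CP needs amax reduced across ranks
)
# Note: no `interval` argument; it has been deprecated as the commit states.
```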
…(#1073)

* add support for padding in UnfusedDPA
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add support for padding_causal/_bottom_right
* fix padding_causal/_bottom_right
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* need to test max512 backend
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* revert last commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix mask logic in unfused
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* use actual_seqlen for alibi/causal_bottom_right padding
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix lint
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* minor fixes and convert causal to causal_bottom_right for inference
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* use causal in kv cache inference test
* simplify get_alibi logic
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* simplify the non-padding path for get_alibi
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* avoid batch_size loop in generating padding_causal/_bottom_right masks
* [pre-commit.ci] auto fixes from pre-commit.com hooks

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
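A minimal sketch of the last item above: building a `padding_causal` mask for a whole batch with broadcasting instead of a per-sample Python loop. `True` marks positions attention should ignore; the function name is mine.

```python
import torch

def padding_causal_mask(seqlens: torch.Tensor, max_seqlen: int) -> torch.Tensor:
    """[batch, seq, seq] mask combining causal and per-sequence padding."""
    pos = torch.arange(max_seqlen)
    pad = pos[None, :] >= seqlens[:, None]   # [b, s]: padded positions
    causal = pos[None, :] > pos[:, None]     # [s, s]: future keys
    # Broadcast: causal for all samples, plus padded keys and padded queries.
    return causal[None] | pad[:, None, :] | pad[:, :, None]

mask = padding_causal_mask(torch.tensor([3, 5]), max_seqlen=5)  # [2, 5, 5]
```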
* Re-add framework specific required dependencies for source build
* fix build
* Fix

---------

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Add permutation functions
* Add permutation ops
* Remove the dependency on cutlass
* Move permutation.py out of module dir
* Rewrite the unit test and enable skipping if FP8 is unavailable
* Rename exposed C++ API and reorder its parameters + take NVTETensor as inputs
* Use Float8Tensor for FP8 input
* Move dtype to ctx

---------

Signed-off-by: Jiang Shao <[email protected]>
Co-authored-by: Qi Zhang <[email protected]>
Co-authored-by: Phuong Nguyen <[email protected]>
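A hedged sketch of what permutation ops do in an MoE pipeline: sort tokens by their assigned expert so each expert sees a contiguous slice, then invert the permutation to restore the original order. Simplified to one expert assignment per token; this is the general technique, not TE's kernel.

```python
import torch

tokens = torch.randn(6, 4)                      # [num_tokens, hidden]
expert_ids = torch.tensor([2, 0, 1, 0, 2, 1])   # router output per token

perm = torch.argsort(expert_ids, stable=True)   # group tokens by expert
permuted = tokens[perm]                         # contiguous per-expert slices

inv = torch.empty_like(perm)
inv[perm] = torch.arange(perm.numel())          # inverse permutation
restored = permuted[inv]                        # un-permute after expert compute
assert torch.equal(restored, tokens)
```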
* Use jit instead of pjit

---------

Signed-off-by: Frederic Bastien <[email protected]>
Co-authored-by: Phuong Nguyen <[email protected]>
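A sketch of the jit-for-pjit swap: modern JAX folds `pjit`'s functionality into `jax.jit`, which propagates input shardings on its own. The mesh axis name and toy function are illustrative.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

mesh = Mesh(np.array(jax.devices()), axis_names=("dp",))
sharding = NamedSharding(mesh, PartitionSpec("dp"))

@jax.jit  # previously: pjit(fn, in_shardings=..., out_shardings=...)
def scale(x):
    return 2.0 * x

x = jax.device_put(jnp.arange(8.0), sharding)
y = scale(x)  # jit compiles with, and preserves, the input sharding
```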
Signed-off-by: Alexandros Koumparoulis <[email protected]>
* WIP: add fa3
* WIP: clean up
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: add benchmarks
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* differentiate func/varlen_func
* fix parsing keyword for FA3 and remove bshd->thd conversion for flash_attn_func
* WIP: add FP8 fwd support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add FA3 FP8 fwd code and test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix assert for FA3
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix FA3 FP8 logic and add tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update FA2 to <=2.6.3
* tweak unit tests for base/mask
* fix lint
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix lint
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix lint
* set constraints for FA3 for sm90 and causal_bottom_right
* revert debug changes in benchmark script
* [pre-commit.ci] auto fixes from pre-commit.com hooks

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
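An illustrative dispatch between the two entry points the "differentiate func/varlen_func" item refers to: the fixed-length function for regular bshd batches and the varlen function for packed thd input. The import path shown is flash-attn 2's public API; FA3 ships a similar interface under a different module.

```python
from flash_attn import flash_attn_func, flash_attn_varlen_func

def run_flash_attn(q, k, v, cu_seqlens_q=None, cu_seqlens_k=None,
                   max_seqlen_q=None, max_seqlen_k=None, causal=True):
    if cu_seqlens_q is None:
        # Fixed-length path: q, k, v are [batch, seqlen, heads, head_dim].
        return flash_attn_func(q, k, v, causal=causal)
    # Varlen path: q, k, v are [total_tokens, heads, head_dim] (thd).
    return flash_attn_varlen_func(
        q, k, v, cu_seqlens_q, cu_seqlens_k, max_seqlen_q, max_seqlen_k,
        causal=causal,
    )
```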
* Limit number of architectures build
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

---------

Signed-off-by: Lukasz Pierscieniewski <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
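A sketch of this kind of build-time knob: read an environment variable to restrict which SM architectures get compiled, with a small default set. The variable name and defaults are hypothetical, not the project's actual flags.

```python
import os

def cuda_archs(default: str = "70;80;89;90") -> list[str]:
    """Return the SM architectures to compile for, honoring an env override."""
    archs = os.getenv("NVTE_CUDA_ARCHS", default)  # hypothetical variable name
    return archs.split(";")

# e.g. passed through to CMake as -DCMAKE_CUDA_ARCHITECTURES=70;80;89;90
print(cuda_archs())
```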
bump cudnn-frontend version Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
phu0ngng had a problem deploying to github-pages with GitHub Actions (August 26, 2024 14:45): Failure
See Commits and Changes for more details.
Created by pull[bot]