[pull] main from NVIDIA:main #21

Merged · 22 commits · Aug 26, 2024

Conversation

@pull pull bot commented Aug 21, 2024

See Commits and Changes for more details.


Created by pull[bot]

phu0ngng and others added 15 commits August 14, 2024 14:50
Remove total time measurement

Signed-off-by: Phuong Nguyen <[email protected]>
* add default path for ffi include

* add an option to get XLA_HOME from env

---------

Signed-off-by: Phuong Nguyen <[email protected]>
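
A minimal sketch of the build-time lookup described above, assuming a Python build helper; the XLA_HOME variable comes from the commit, while the function name and default path are illustrative:

```python
import os
from pathlib import Path

def resolve_xla_home(default: str = "/opt/xla") -> Path:
    """Prefer the XLA_HOME environment variable over the bundled default
    FFI include location (the default path here is hypothetical)."""
    return Path(os.environ.get("XLA_HOME", default))
```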
* Bump minimum CUDA version to 12.0

Signed-off-by: Tim Moon <[email protected]>

* Debug CUDA version check

Signed-off-by: Tim Moon <[email protected]>

* Debug CMake build

Signed-off-by: Tim Moon <[email protected]>

* Review suggestions from @ksivaman and @ptrendx

Remove logic for CUDA <12.0 in PyTorch and Paddle builds. Update version in docs and README.

Signed-off-by: Tim Moon <[email protected]>
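
A hedged sketch of the version gate this bump implies; the actual check in the build scripts may differ:

```python
import torch

# Transformer Engine now requires CUDA 12.0+, so reject older toolkits early.
major, minor = (int(x) for x in torch.version.cuda.split(".")[:2])
if (major, minor) < (12, 0):
    raise RuntimeError(f"CUDA 12.0+ required, found {torch.version.cuda}")
```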

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Markus Schnoes <[email protected]>
Co-authored-by: Charlene Yang <[email protected]>
fix typos regarding t in thd

Signed-off-by: Charlene Yang <[email protected]>
* add window_size to AttnFuncWithCP

Signed-off-by: Xiaowei Ren <[email protected]>

* add seq_offsets_qkvo for cudnn thd

Signed-off-by: Xiaowei Ren <[email protected]>

* add seq_offsets_qkvo to AttnFuncWithCP

Signed-off-by: Xiaowei Ren <[email protected]>

* fix seq_offsets calculation of cudnn thd

Signed-off-by: Xiaowei Ren <[email protected]>

* remove a thd assert

Signed-off-by: Xiaowei Ren <[email protected]>

* fix bias for thd test

Signed-off-by: Xiaowei Ren <[email protected]>

* add thd test for cudnn FA with CP

Signed-off-by: Xiaowei Ren <[email protected]>

* skip GQA/MQA test for cuDNN THD

Signed-off-by: Xiaowei Ren <[email protected]>

* make sure seq_offsets are computed with qkv_group of hd_hd_hd when CP > 1

Signed-off-by: Xiaowei Ren <[email protected]>

* fix seq_offsets inputs

Signed-off-by: Xiaowei Ren <[email protected]>

* remove two comments

Signed-off-by: Xiaowei Ren <[email protected]>

* fix attn mask type for cudnn thd with cp

Signed-off-by: Xiaowei Ren <[email protected]>

* fix attn_mask_type check

Signed-off-by: Xiaowei Ren <[email protected]>

* fix attn_mask_type for cudnn fa with thd

Signed-off-by: Xiaowei Ren <[email protected]>

* fix a typo

Signed-off-by: Xiaowei Ren <[email protected]>

* fix out dout in bwd

Signed-off-by: Xiaowei Ren <[email protected]>

* assert cudnn+thd does not support attn bias

Signed-off-by: Xiaowei Ren <[email protected]>

* check if attn_mask_type has padding

Signed-off-by: Xiaowei Ren <[email protected]>

* minor change

Signed-off-by: Xiaowei Ren <[email protected]>

* change cp test batch size to 2

Signed-off-by: Xiaowei Ren <[email protected]>

* fix code format

Signed-off-by: Xiaowei Ren <[email protected]>

* fix two assert messages

Signed-off-by: Xiaowei Ren <[email protected]>

* fix assert comment

Signed-off-by: Xiaowei Ren <[email protected]>

* fix assert comments

Signed-off-by: Xiaowei Ren <[email protected]>

* minor fix

Signed-off-by: Xiaowei Ren <[email protected]>

* fix assert comments

Signed-off-by: Xiaowei Ren <[email protected]>

* assert swa+CP cannot work with thd format

Signed-off-by: Xiaowei Ren <[email protected]>

* add a new CP function for swa

Signed-off-by: Xiaowei Ren <[email protected]>

* add a missing dgrad

Signed-off-by: Xiaowei Ren <[email protected]>

* minor change

Signed-off-by: Xiaowei Ren <[email protected]>

* add draft fwd function for swa+cp

Signed-off-by: Xiaowei Ren <[email protected]>

* minor change

Signed-off-by: Xiaowei Ren <[email protected]>

* enable flash attention for swa+cp

Signed-off-by: Xiaowei Ren <[email protected]>

* remove an assert of swa+cp

Signed-off-by: Xiaowei Ren <[email protected]>

* call SWAFuncWithCP for swa+cp

Signed-off-by: Xiaowei Ren <[email protected]>

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* use 2hd layout

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change qkv_format check

Signed-off-by: Xiaowei Ren <[email protected]>

* add a code comment

Signed-off-by: Xiaowei Ren <[email protected]>

* tensor shape bug fix

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tensor shape fix

Signed-off-by: Xiaowei Ren <[email protected]>

* add function to compute cu_seqlens of a cp rank

Signed-off-by: Xiaowei Ren <[email protected]>

* add cu_seqlens and cu_seqlens_padded to context parallelism

Signed-off-by: Xiaowei Ren <[email protected]>
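
A rough sketch of the per-rank offset computation these commits add, assuming the load-balanced CP schedule in which every rank keeps 1/cp_size of each sequence's tokens (two mirrored chunks); the function name and slicing details are assumptions:

```python
import torch

def cu_seqlens_on_cp_rank(cu_seqlens: torch.Tensor, cp_size: int) -> torch.Tensor:
    # Each rank holds seqlen // cp_size tokens per sequence, so per-rank
    # cumulative offsets are a scaled cumulative sum of the sequence lengths.
    seqlens = cu_seqlens[1:] - cu_seqlens[:-1]
    per_rank = seqlens // cp_size
    zero = torch.zeros(1, dtype=cu_seqlens.dtype, device=cu_seqlens.device)
    return torch.cat([zero, per_rank.cumsum(0)])
```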

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* minor change

Signed-off-by: Xiaowei Ren <[email protected]>

* fix FlashAttention output sequence length

Signed-off-by: Xiaowei Ren <[email protected]>

* fix cu_seqlens_kv_per_step calculation

Signed-off-by: Xiaowei Ren <[email protected]>

* zero dQKV for ending padded tokens

Signed-off-by: Xiaowei Ren <[email protected]>

* zero dQKV tensors of FlashAttention

Signed-off-by: Xiaowei Ren <[email protected]>
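
The padded-token zeroing from the two commits above, sketched under the assumption of a THD layout where cu_seqlens counts valid tokens and cu_seqlens_padded marks slot boundaries (the helper is hypothetical):

```python
import torch

def zero_padded_grads(dq: torch.Tensor, cu_seqlens, cu_seqlens_padded) -> None:
    # Attention never writes gradients for the trailing padded tokens of each
    # sequence slot, so clear them explicitly to avoid garbage values.
    for i in range(len(cu_seqlens) - 1):
        start = int(cu_seqlens_padded[i] + (cu_seqlens[i + 1] - cu_seqlens[i]))
        end = int(cu_seqlens_padded[i + 1])
        dq[start:end].zero_()
```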

* fix softmax_lse correction

Signed-off-by: Xiaowei Ren <[email protected]>
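
For reference, the correction being fixed merges per-step partial attention outputs through their log-sum-exp statistics; a standard formulation (not necessarily the exact code path) is:

```math
\mathrm{lse} = \mathrm{lse}_1 + \log\bigl(1 + e^{\mathrm{lse}_2 - \mathrm{lse}_1}\bigr),
\qquad
O = e^{\mathrm{lse}_1 - \mathrm{lse}}\,O_1 + e^{\mathrm{lse}_2 - \mathrm{lse}}\,O_2
```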

* remove padded tokens of KV to save communication

Signed-off-by: Xiaowei Ren <[email protected]>

* no need to zero dkv for FlashAttention anymore

Signed-off-by: Xiaowei Ren <[email protected]>

* zero out tensors

Signed-off-by: Xiaowei Ren <[email protected]>

* remove redundant code

Signed-off-by: Xiaowei Ren <[email protected]>

* fix CP unit test

Signed-off-by: Xiaowei Ren <[email protected]>

* fix kv shape of cp test with thd format

Signed-off-by: Xiaowei Ren <[email protected]>

* update cp unit test

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add simple code framework

Signed-off-by: Xiaowei Ren <[email protected]>

* try not to have a separate CP function for SWA

Signed-off-by: Xiaowei Ren <[email protected]>

* backup some code change

Signed-off-by: Xiaowei Ren <[email protected]>

* back up code

Signed-off-by: Xiaowei Ren <[email protected]>

* clean up fwd implementation of SWAFuncWithCP

Signed-off-by: Xiaowei Ren <[email protected]>

* remove redundant code

Signed-off-by: Xiaowei Ren <[email protected]>

* code cleaning

Signed-off-by: Xiaowei Ren <[email protected]>

* fix assert message

Signed-off-by: Xiaowei Ren <[email protected]>

* reduce kv chunk concat overheads

Signed-off-by: Xiaowei Ren <[email protected]>

* minor change

Signed-off-by: Xiaowei Ren <[email protected]>

* make AttnFuncWithCP and SWAFuncWithCP have same API

Signed-off-by: Xiaowei Ren <[email protected]>

* add a docstring

Signed-off-by: Xiaowei Ren <[email protected]>

* preliminary implementation of SWAFuncWithCP forward seems working

Signed-off-by: Xiaowei Ren <[email protected]>

* fix output shape of SWAFuncWithCP

Signed-off-by: Xiaowei Ren <[email protected]>

* code refactoring for FlashAttention and add a code placeholder for bwd

Signed-off-by: Xiaowei Ren <[email protected]>

* use gather_along_first_dim

Signed-off-by: Xiaowei Ren <[email protected]>

* finish the preliminary implementation of bwd

Signed-off-by: Xiaowei Ren <[email protected]>

* remove redundant code

Signed-off-by: Xiaowei Ren <[email protected]>

* fix assert condition

Signed-off-by: Xiaowei Ren <[email protected]>

* add draft implementation of SWA+CP with FusedAttention

Signed-off-by: Xiaowei Ren <[email protected]>

* fix attention mask type of swa+cp

Signed-off-by: Xiaowei Ren <[email protected]>

* code cleaning

Signed-off-by: Xiaowei Ren <[email protected]>

* add qkv_layout

Signed-off-by: Xiaowei Ren <[email protected]>

* add missing window_size argument

Signed-off-by: Xiaowei Ren <[email protected]>

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* fix kv shape of swa+cp

Signed-off-by: Xiaowei Ren <[email protected]>

* bug and typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* fix dout shape

Signed-off-by: Xiaowei Ren <[email protected]>

* add multi stream in fwd of swa+cp

Signed-off-by: Xiaowei Ren <[email protected]>

* save chunk_ids_to_kv_ag in fwd

Signed-off-by: Xiaowei Ren <[email protected]>

* add multi stream in bwd of swa+cp

Signed-off-by: Xiaowei Ren <[email protected]>

* minor fix to cp stream sync

Signed-off-by: Xiaowei Ren <[email protected]>

* rename AttnFuncWithCP

Signed-off-by: Xiaowei Ren <[email protected]>

* check if window size is None

Signed-off-by: Xiaowei Ren <[email protected]>

* fix docstring of AttnFuncWithCP

Signed-off-by: Xiaowei Ren <[email protected]>

* minor fix

Signed-off-by: Xiaowei Ren <[email protected]>

* add env var for users to choose KV ag or KV p2p

Signed-off-by: Xiaowei Ren <[email protected]>

* update cp tests

Signed-off-by: Xiaowei Ren <[email protected]>

* fix window size in cp unit test

Signed-off-by: Xiaowei Ren <[email protected]>

* fix pytest skip messages

Signed-off-by: Xiaowei Ren <[email protected]>

* add cp_comm_type into API

Signed-off-by: Xiaowei Ren <[email protected]>
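
A hedged usage sketch of the resulting knob in the PyTorch API; the method follows Transformer Engine's attention modules, but treat the exact signature and accepted values as assumptions:

```python
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te

# Assumes torch.distributed is already initialized on CUDA devices.
cp_group = dist.new_group(ranks=list(range(dist.get_world_size())))
cp_stream = torch.cuda.Stream()

attn = te.DotProductAttention(num_attention_heads=16, kv_channels=64)
# "p2p" = ring-style KV exchange; "all_gather" = the KV allgather path added here.
attn.set_context_parallel_group(cp_group, dist.get_process_group_ranks(cp_group),
                                cp_stream, cp_comm_type="all_gather")
```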

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code cleaning

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* assert sequence length divisibility requirements

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support table for context parallelism

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* typo and code format fix

Signed-off-by: Xiaowei Ren <[email protected]>

* do not print multiple disabling messages

Signed-off-by: Xiaowei Ren <[email protected]>

* bug fix

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix device in torch.arange and adjust code for the PR of MLA

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typos and clean asserts

Signed-off-by: Xiaowei Ren <[email protected]>

---------

Signed-off-by: Xiaowei Ren <[email protected]>
Co-authored-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <[email protected]>
* support dtype casting fusion in FusedAdam

Signed-off-by: Shijie Wang <[email protected]>
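
A hedged sketch of what the fusion enables: the optimizer updates FP32 master weights and produces the cast low-precision model weights in the same kernel (the master_weights argument is an assumption about this PR's interface):

```python
import torch
from transformer_engine.pytorch.optimizers import FusedAdam  # import path assumed

model_weights = [torch.nn.Parameter(
    torch.randn(1024, 1024, dtype=torch.bfloat16, device="cuda"))]
master_weights = [p.detach().float() for p in model_weights]

# The Adam update runs on the FP32 masters; the BF16 model copy comes from the
# fused cast rather than a separate per-parameter copy_ kernel.
opt = FusedAdam(model_weights, lr=1e-4, master_weights=master_weights)
```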

* minor changes

Signed-off-by: Shijie Wang <[email protected]>

* fix lint

Signed-off-by: Shijie Wang <[email protected]>

* changes based on review comments

Signed-off-by: Shijie Wang <[email protected]>

* remove unused code

Signed-off-by: Shijie Wang <[email protected]>

* code refactor

Signed-off-by: Shijie Wang <[email protected]>

* fix typo

Signed-off-by: Shijie Wang <[email protected]>

* refactor

Signed-off-by: Shijie Wang <[email protected]>

* remove unused code

Signed-off-by: Shijie Wang <[email protected]>

* Fix linter warnings

Signed-off-by: Tim Moon <[email protected]>

* Copy CUDA headers for framework sdists

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Shijie Wang <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Frederic Bastien <[email protected]>
Co-authored-by: Phuong Nguyen <[email protected]>
…t sequence length parameters (#1066)

* Added ability for seqlen for transformer and mha layer

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* Documentation for new parameters

Signed-off-by: Lukasz Pierscieniewski <[email protected]>
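
A hedged usage sketch of the new parameters on the layer forward; the names follow the commit wording, but the exact signature is an assumption:

```python
import torch
import transformer_engine.pytorch as te

layer = te.TransformerLayer(hidden_size=1024, ffn_hidden_size=4096,
                            num_attention_heads=16).cuda()
x = torch.randn(512, 2, 1024, device="cuda")  # [seq, batch, hidden]
# Explicit maximum sequence lengths let callers size attention for THD inputs
# instead of inferring the lengths from the tensor shape.
y = layer(x, max_seqlen_q=512, max_seqlen_kv=512)
```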

* Add tests for THD layout, assert for THD layout with KV-Cache

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* Fixed tests

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move THD logic in shape calculation, add missing optional in params

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* Skip the THD test on GPUs older than Ampere

Signed-off-by: Przemek Tredak <[email protected]>

---------

Signed-off-by: Lukasz Pierscieniewski <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Przemek Tredak <[email protected]>
* Perform scale-inv update in cast-transpose kernels

Signed-off-by: Tim Moon <[email protected]>
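
The invariant these fusions maintain, as a sketch: quantization writes both the scaled FP8 values and the reciprocal scale later used for dequantization, so updating scale-inv inside the cast kernel removes a separate launch per tensor:

```python
import torch

scale = torch.tensor([448.0], device="cuda")  # example FP8 quantization scale
# Conceptually, inside the fused cast/transpose kernel:
#   x_fp8 ~= x * scale (cast to FP8), and consumers recover x ~= x_fp8 * scale_inv
scale_inv = scale.reciprocal()
```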

* Perform scale-inv update in cast and activation kernels

Signed-off-by: Tim Moon <[email protected]>

* Perform scale-inv update in LayerNorm and RMSNorm kernels

Signed-off-by: Tim Moon <[email protected]>

* Perform scale-inv update after FP8 GEMMs

Signed-off-by: Tim Moon <[email protected]>

* Fuse casts and scale-inv updates in linear module

Signed-off-by: Tim Moon <[email protected]>

* Fuse casts and scale-inv updates in layernorm-linear module

Signed-off-by: Tim Moon <[email protected]>

* Simplify kernel to update FP8 scale-inv

Signed-off-by: Tim Moon <[email protected]>

* Fix typos

Signed-off-by: Tim Moon <[email protected]>

* Debug amax update in layernorm kernels

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Debug test failures

Signed-off-by: Tim Moon <[email protected]>

* Debug ONNX export

Use quantization scaling factor in ONNX quantize op.

Signed-off-by: Tim Moon <[email protected]>

* Review suggestion from @ptrendx

Signed-off-by: Tim Moon <[email protected]>

* Debug mismatched dtypes

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* update FE to 1.6

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update to 1.6.1-rc for testing

Signed-off-by: Charlene Yang <[email protected]>

* update to fe 1.6.1

Signed-off-by: Charlene Yang <[email protected]>

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add deterministic knob in cuDNN fused attn backend

Signed-off-by: Xiaowei Ren <[email protected]>
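
A hedged sketch of using such a knob; NVTE_ALLOW_NONDETERMINISTIC_ALGO is the existing Transformer Engine variable this most plausibly maps to, but treat the name as an assumption:

```python
import os

# Force the cuDNN fused-attention backend to choose deterministic algorithms
# (at some performance cost); set before initializing Transformer Engine.
os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"
```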

* pass fp8 and fp8_meta to attn_func_with_cp

Signed-off-by: Xiaowei Ren <[email protected]>

* assert only Fused Attn can support FP8+CP

Signed-off-by: Xiaowei Ren <[email protected]>

* remove redundant assert

Signed-off-by: Xiaowei Ren <[email protected]>

* add a fwd draft implementation of FP8 + CP

Signed-off-by: Xiaowei Ren <[email protected]>

* save fp8 and fp8_meta

Signed-off-by: Xiaowei Ren <[email protected]>

* assert sequence length divisibility requirements

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove a redundant qkv_layout compute

Signed-off-by: Xiaowei Ren <[email protected]>

* if condition change

Signed-off-by: Xiaowei Ren <[email protected]>

* some typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* add support table for context parallelism

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* typo and code format fix

Signed-off-by: Xiaowei Ren <[email protected]>

* do not print multiple disabling messages

Signed-off-by: Xiaowei Ren <[email protected]>

* bug fix

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix aux_ctx_tensors of FP8

Signed-off-by: Xiaowei Ren <[email protected]>

* bug fix

Signed-off-by: Xiaowei Ren <[email protected]>

* fix device in torch.arange and adjust code for the PR of MLA

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* commit code change for FP8+CP

Signed-off-by: Xiaowei Ren <[email protected]>

* commit more code change for FP8+CP

Signed-off-by: Xiaowei Ren <[email protected]>

* commit more fp8 code for FP8+CP

Signed-off-by: Xiaowei Ren <[email protected]>

* bug fixes

Signed-off-by: Xiaowei Ren <[email protected]>

* bug fix

Signed-off-by: Xiaowei Ren <[email protected]>

* cast merged CP results from FP32 to BF16

Signed-off-by: Xiaowei Ren <[email protected]>

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* minor change

Signed-off-by: Xiaowei Ren <[email protected]>

* fix softmax_lse

Signed-off-by: Xiaowei Ren <[email protected]>

* fix some bugs of FP8 dkv exchange

Signed-off-by: Xiaowei Ren <[email protected]>

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* add FP8 unit test

Signed-off-by: Xiaowei Ren <[email protected]>

* fix typos and clean asserts

Signed-off-by: Xiaowei Ren <[email protected]>

* fix get_p2p_comm_info

Signed-off-by: Xiaowei Ren <[email protected]>

* fix dkv p2p exchange

Signed-off-by: Xiaowei Ren <[email protected]>

* minor fix

Signed-off-by: Xiaowei Ren <[email protected]>

* change FP8 dkv P2P to A2A

Signed-off-by: Xiaowei Ren <[email protected]>

* add FP8+CP unit test

Signed-off-by: Xiaowei Ren <[email protected]>

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* assert amax reduction is needed for FP8+CP

Signed-off-by: Xiaowei Ren <[email protected]>
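
The rationale behind the assertion, sketched with Transformer Engine's DelayedScaling recipe (the attribute is real, its use here is illustrative): every CP rank quantizes a shard of the same tensors, so amax statistics must be reduced across the group to keep one shared scale.

```python
from transformer_engine.common.recipe import DelayedScaling

recipe = DelayedScaling(reduce_amax=True)
# Without amax reduction the CP ranks would drift to different FP8 scales
# for shards of the same tensor, corrupting the merged attention results.
assert recipe.reduce_amax, "FP8 + context parallelism requires amax reduction"
```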

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicated code

Signed-off-by: Xiaowei Ren <[email protected]>

* destroy process group in CP unit test

Signed-off-by: Xiaowei Ren <[email protected]>

* remove interval from fp8_recipe because it has been deprecated

Signed-off-by: Xiaowei Ren <[email protected]>

* try to fix the failed CP test with the latest CI pipeline

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove redundant f before string

Signed-off-by: Xiaowei Ren <[email protected]>

* change META_O_CP

Signed-off-by: Xiaowei Ren <[email protected]>

---------

Signed-off-by: Xiaowei Ren <[email protected]>
Co-authored-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <[email protected]>
#1073)

* add support for padding in UnfusedDPA

Signed-off-by: Charlene Yang <[email protected]>
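
A minimal sketch of padding support in the unfused path: build a boolean mask from per-sequence lengths and apply it to the attention scores before softmax (the helper is illustrative, not the PR's exact code):

```python
import torch

def padding_mask(seqlens: torch.Tensor, max_seqlen: int) -> torch.Tensor:
    # True marks padded positions that must not receive attention.
    pos = torch.arange(max_seqlen, device=seqlens.device)
    return pos.unsqueeze(0) >= seqlens.unsqueeze(1)  # [batch, max_seqlen]

scores = torch.randn(2, 8, 8)                        # [batch, sq, skv]
mask = padding_mask(torch.tensor([5, 8]), max_seqlen=8)
scores = scores.masked_fill(mask.unsqueeze(1), float("-inf"))
```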

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for padding_causal/_bottom_right

Signed-off-by: Charlene Yang <[email protected]>

* fix padding_causal/_bottom_right

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* need to test max512 backend

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert last commit

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix mask logic in unfused

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use actual_seqlen for alibi/causal_bottom_right padding

Signed-off-by: Charlene Yang <[email protected]>
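
The bottom-right-aligned causal rule referenced above, sketched: with actual (unpadded) lengths, query i may attend key j only when j <= i + (seqlen_kv - seqlen_q), so the causal diagonal hugs the end of the sequence:

```python
import torch

sq, skv = 4, 8                          # actual query/key sequence lengths
i = torch.arange(sq).unsqueeze(1)       # query positions, column vector
j = torch.arange(skv).unsqueeze(0)      # key positions, row vector
bottom_right_mask = j > i + (skv - sq)  # True = masked out, shape [sq, skv]
```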

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lint

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes and convert causal to causal_bottom_right for inference

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use causal in kv cache inference test

Signed-off-by: Charlene Yang <[email protected]>

* simplify get_alibi logic

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* simplify the non-padding path for get_alibi

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* avoid batch_size loop in generating padding_causal/_bottom_right masks

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@pull pull bot added the ⤵️ pull label Aug 21, 2024
ksivaman and others added 7 commits August 21, 2024 22:33
* Re-add framework specific required dependencies for source build

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* fix build

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Fix

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

---------

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Add permutation functions

* Add permutation ops

* Remove the dependency on cutlass

* Move permutation.py out of module dir

* Rewrite the unit test and enable skipping if FP8 is unavailable

* Rename exposed C++ API and reorder its parameters + take NVTETensor as inputs

* Use Float8Tensor for FP8 input

* Move dtype to ctx

---------

Signed-off-by: Jiang Shao <[email protected]>
Co-authored-by: Qi Zhang <[email protected]>
Co-authored-by: Phuong Nguyen <[email protected]>
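
A hedged usage sketch of the new ops; the function names and signatures follow the MoE permutation utilities these commits appear to expose, but treat them as assumptions:

```python
import torch
from transformer_engine.pytorch import moe_permute, moe_unpermute  # names assumed

tokens = torch.randn(8, 16, dtype=torch.bfloat16, device="cuda")
indices = torch.randint(0, 4, (8, 1), dtype=torch.int32, device="cuda")  # top-1 routing

# Group rows routed to the same expert contiguously, then invert the permutation.
permuted, row_id_map = moe_permute(tokens, indices)
restored = moe_unpermute(permuted, row_id_map)
```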
* Use jit instead of pjit

---------

Signed-off-by: Frederic Bastien <[email protected]>
Co-authored-by: Phuong Nguyen <[email protected]>
* WIP: add fa3

Signed-off-by: Charlene Yang <[email protected]>

* WIP: clean up

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* WIP: add benchmarks

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* differentiate func/varlen_func

Signed-off-by: Charlene Yang <[email protected]>

* fix parsing keyword for FA3 and remove bshd->thd conversion for flash_attn_func

Signed-off-by: Charlene Yang <[email protected]>
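
The dispatch distinction being handled, as a sketch; an FA2-style signature is shown, and the FA3 import path and return convention are assumptions:

```python
import torch
from flash_attn import flash_attn_func  # FA3 ships flash_attn_interface instead

b, s, h, d = 2, 128, 16, 64
q = torch.randn(b, s, h, d, dtype=torch.bfloat16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Fixed-length bshd batches call the plain kernel directly (no bshd->thd
# conversion); packed thd inputs instead go through flash_attn_varlen_func
# with cu_seqlens offsets and max sequence lengths.
out = flash_attn_func(q, k, v, causal=True)
```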

* WIP: add FP8 fwd support

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add FA3 FP8 fwd code and test

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix assert for FA3

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix FA3 FP8 logic and add tests

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update FA2 to <=2.6.3

Signed-off-by: Charlene Yang <[email protected]>

* tweak unit tests for base/mask

Signed-off-by: Charlene Yang <[email protected]>

* fix lint

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lint

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lint

Signed-off-by: Charlene Yang <[email protected]>

* set constraints for FA3 for sm90 and causal_bottom_right

Signed-off-by: Charlene Yang <[email protected]>

* revert debug changes in benchmark script

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Limit number of architectures build

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Lukasz Pierscieniewski <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
bump cudnn-frontend version

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
@phu0ngng phu0ngng merged commit 7fc50f4 into phu0ngng:main Aug 26, 2024