[pull] main from NVIDIA:main #27

Merged
merged 8 commits into from
Oct 29, 2024

Conversation

pull[bot] commented Oct 27, 2024

See Commits and Changes for more details.


Created by pull[bot]


phu0ngng and others added 4 commits October 25, 2024 10:48
…aph (#1285)

Update jax version for ffi

Signed-off-by: Phuong Nguyen <[email protected]>
* Update dependencies for building documentation

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Patch RTD theme 3.0.0 to include version selector in sidebar

Signed-off-by: Tim Moon <[email protected]>

* Debug printing version in sidebar

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
* add THD MQA/GQA

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nvte_get_fused_attn_backend

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
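The THD MQA/GQA commit above adds grouped-query support for the packed THD layout. As a minimal, framework-free sketch (not Transformer Engine code), the helper below shows the head grouping that MQA/GQA implies under the common convention: with `num_q_heads` query heads and `num_kv_heads` key/value heads, each KV head is shared by `num_q_heads // num_kv_heads` consecutive query heads. The function name `kv_head_for_q_head` is hypothetical.

```python
def kv_head_for_q_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the KV head index shared by a given query head (MHA, GQA, or MQA).

    Assumes the common convention that consecutive query heads share one KV head.
    """
    if num_kv_heads <= 0 or num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a positive multiple of num_kv_heads")
    if not 0 <= q_head < num_q_heads:
        raise IndexError("q_head out of range")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


# GQA: 8 query heads share 2 KV heads, so query heads form groups of 4
print([kv_head_for_q_head(h, 8, 2) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
# MQA: every query head shares the single KV head
print([kv_head_for_q_head(h, 4, 1) for h in range(4)])  # [0, 0, 0, 0]
```

MHA is the special case `num_kv_heads == num_q_heads`, where the group size is 1 and every query head maps to its own KV head.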
* WIP: add max_t support for THD

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* WIP: save tensors for debug and point to new FE

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix stats in bwd

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix stats in fwd

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add docstring for DPA

Signed-off-by: Charlene Yang <[email protected]>

* add docstring

Signed-off-by: Charlene Yang <[email protected]>

* WIP: first try on adding max_b and max_t

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit c3d522e9f5aef3c8ddfec5bf6ff24c3db97bb059.

Signed-off-by: Charlene Yang <[email protected]>

* Revert "WIP: first try on adding max_b and max_t"

This reverts commit 3bc01eb.

Signed-off-by: Charlene Yang <[email protected]>

* update docstring and fix max_seqlen logic for thd

Signed-off-by: Charlene Yang <[email protected]>

* revert two lines of change in docstring

Signed-off-by: Charlene Yang <[email protected]>

* WIP: add get_max_b/t

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix max_seqlen code and docstring

Signed-off-by: Charlene Yang <[email protected]>

* success: add max_b/max_t

Signed-off-by: Charlene Yang <[email protected]>

* remove debug code

Signed-off-by: Charlene Yang <[email protected]>

* change max_b/max_t buckets

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix b vs orig_b

Signed-off-by: Charlene Yang <[email protected]>

* fix b vs orig_b with 0 fill

Signed-off-by: Charlene Yang <[email protected]>

* update FE for T3HD/TH3D

Signed-off-by: Charlene Yang <[email protected]>

* add max_b to conversion kernels

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lint

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix changes after last merge

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Jax support for max_t

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update FE to 1.8.0-rc

Signed-off-by: Charlene Yang <[email protected]>

* update FE to 1.8.0

Signed-off-by: Charlene Yang <[email protected]>

* code review/formatting fixes

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix Stats shape for <9.6

Signed-off-by: Charlene Yang <[email protected]>

* return nullptr for offset_stats when cudnn < 9.6

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more version control

Signed-off-by: Charlene Yang <[email protected]>

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
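Several commits above deal with `max_seqlen` and `max_t` for the THD layout, where a batch of variable-length sequences is packed along a single token dimension `T` and located by cumulative sequence lengths. The plain-Python sketch below, built around the hypothetical helper `thd_metadata`, shows how the cumulative offsets (`cu_seqlens`), the total token count, and the maximum sequence length relate for a pack with no padding between sequences. How the PR buckets `max_b`/`max_t` is not described in the commit messages and is not modeled here.

```python
from itertools import accumulate
from typing import List, Tuple


def thd_metadata(seqlens: List[int]) -> Tuple[List[int], int, int]:
    """Return (cu_seqlens, total_tokens, max_seqlen) for sequences packed back to back.

    Sequence i occupies tokens [cu_seqlens[i], cu_seqlens[i + 1]) of the packed T dimension.
    """
    if any(s < 0 for s in seqlens):
        raise ValueError("sequence lengths must be non-negative")
    cu_seqlens = [0] + list(accumulate(seqlens))  # running token offsets, length b + 1
    total_tokens = cu_seqlens[-1]                 # size of T when nothing is padded
    max_seqlen = max(seqlens, default=0)          # longest single sequence
    return cu_seqlens, total_tokens, max_seqlen


cu, t, max_s = thd_metadata([3, 5, 2])
print(cu, t, max_s)  # [0, 3, 8, 10] 10 5
```

Keeping `max_seqlen` separate from the total token count matters because THD kernels iterate over individual sequences, whose longest length can be much smaller than `T`.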
pull[bot] added the ⤵️ pull label Oct 27, 2024
timmoon10 and others added 4 commits October 28, 2024 16:47
* Add fallback for fast param getter

Signed-off-by: Tim Moon <[email protected]>

* Remove fast param getter

Signed-off-by: Tim Moon <[email protected]>

* Fix linter warning

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
…nd moved to TE/common (#1067)

* moved userbuffers code to TE/common

Signed-off-by: Alp Dener <[email protected]>

* moved comm+GEMM overlap code to TE/common

Signed-off-by: Alp Dener <[email protected]>

* removed PyTorch dependency from comm+GEMM overlap in TE/common

Signed-off-by: Alp Dener <[email protected]>

* added TE/PyTorch wrappers for refactored comm+GEMM overlap code in TE/common

Signed-off-by: Alp Dener <[email protected]>

* updated TE/PyTorch Python API to match the refactored comm+GEMM overlap code

Signed-off-by: Alp Dener <[email protected]>

* updated unit tests to work with refactored comm+GEMM overlap code

Signed-off-by: Alp Dener <[email protected]>

* added a pylint exception to comm+GEMM overlap test runner

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing linting errors

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added documentation for te.initialize_ub

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed compile errors when building with NVTE_UB_WITH_MPI=1

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed default bootstrap backend

Signed-off-by: Alp Dener <[email protected]>

* switched default bootstrap backend priority to MPI > Gloo > NCCL

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated bootstrap backend documentation

Signed-off-by: Alp Dener <[email protected]>

* close UB bootstrap socket to avoid interfering with CUDA Multicast shareable file handle send/recv

Signed-off-by: Alp Dener <[email protected]>

* added torch::Tensor wrappers for communication buffer and atomic counters so PyTorch can factor externally allocated memory into its garbage collection threshold

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* automated handling of world, local and node ranks/sizes within C++ CommOverlapHelper to simplify Python function signatures

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed incorrect read of environment variables

Signed-off-by: Alp Dener <[email protected]>

* corrected priority for _SOCKET_IFNAME environment variables in UB bootstrapping

Signed-off-by: Alp Dener <[email protected]>

* moved multicast support check to cuda_runtime.h and replaced cudaDeviceGetProp call with cached sm_count()

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed commented out old code and replaced external collective function type defines with aliases

Signed-off-by: Alp Dener <[email protected]>

* compile-time CUDA version guard for CUDA Driver Multicast attribute

Signed-off-by: Alp Dener <[email protected]>

* added compile-time CUDA version guards to Multicast code in Userbuffers

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* condensed UB docs, corrected const violations

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed autodoc rst for UB calls, added CUDA version guard on Multicast UB kernels

Signed-off-by: Alp Dener <[email protected]>

* fixed incorrect UB type reporting for P2P overlaps, comment reformatting

Signed-off-by: Alp Dener <[email protected]>

* add docstring to tex.ubuf_built_with_mpi()

Signed-off-by: Alp Dener <[email protected]>

---------

Signed-off-by: Alp Dener <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
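One commit above switches the default Userbuffers bootstrap backend priority to MPI > Gloo > NCCL. The following is a minimal sketch of that selection rule. It assumes the caller already knows which backends are available (for example, from `torch.distributed.is_mpi_available()` and its Gloo/NCCL counterparts). The function `pick_bootstrap_backend` and its optional `requested` override are hypothetical and not part of the Transformer Engine API.

```python
from typing import Iterable, Optional

# Default priority described in the commit message: MPI first, then Gloo, then NCCL.
DEFAULT_PRIORITY = ("mpi", "gloo", "nccl")


def pick_bootstrap_backend(available: Iterable[str], requested: Optional[str] = None) -> str:
    """Return the bootstrap backend to use.

    `requested` (hypothetical) lets a caller force a specific backend; otherwise the
    first available backend in DEFAULT_PRIORITY wins.
    """
    available_set = {name.lower() for name in available}
    if requested is not None:
        requested = requested.lower()
        if requested not in available_set:
            raise RuntimeError(f"requested backend {requested!r} is not available")
        return requested
    for backend in DEFAULT_PRIORITY:
        if backend in available_set:
            return backend
    raise RuntimeError("no supported bootstrap backend (MPI, Gloo, NCCL) is available")


print(pick_bootstrap_backend({"nccl", "gloo"}))         # gloo
print(pick_bootstrap_backend({"mpi", "gloo", "nccl"}))  # mpi
```

A fixed priority tuple keeps the default deterministic across ranks, which matters because every rank must pick the same bootstrap backend.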
skip some t3hd/th3d tests for MQA/GQA

Signed-off-by: Charlene Yang <[email protected]>
* check if GPU is available

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
phu0ngng merged commit 8bdb54f into phu0ngng:main Oct 29, 2024
4 participants