forked from NVIDIA/TransformerEngine
[pull] main from NVIDIA:main #27
Merged
…aph (#1285)
Update jax version for ffi
Signed-off-by: Phuong Nguyen <[email protected]>

* Update dependencies for building documentation
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Patch RTD theme 3.0.0 to include version selector in sidebar
* Debug printing version in sidebar
---------
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>

* add THD MQA/GQA
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix nvte_get_fused_attn_backend
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* WIP: add max_t support for THD
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* WIP: save tensors for debug and point to new FE
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix stats in bwd
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix stats in fwd
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add docstring for DPA
* add docstring
* WIP: first try on adding max_b and max_t
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks". This reverts commit c3d522e9f5aef3c8ddfec5bf6ff24c3db97bb059.
* Revert "WIP: first try on adding max_b and max_t". This reverts commit 3bc01eb.
* update docstring and fix max_seqlen logic for thd
* revert two lines of change in docstring
* WIP: add get_max_b/t
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix max_seqlen code and docstring
* success: add max_b/max_t
* remove debug code
* change max_b/max_t buckets
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix b vs orig_b
* fix b vs orig_b with 0 fill
* update FE for T3HD/TH3D
* add max_b to conversion kernels
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix lint
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix changes after last merge
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add Jax support for max_t
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update FE to 1.8.0-rc
* update FE to 1.8.0
* code review/formatting fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix Stats shape for <9.6
* return nullptr for offset_stats when cudnn < 9.6
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add more version control
---------
Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add fallback for fast param getter
* Remove fast param getter
* Fix linter warning
---------
Signed-off-by: Tim Moon <[email protected]>

…nd moved to TE/common (#1067)
* moved userbuffers code to TE/common
* moved comm+GEMM overlap code to TE/common
* removed PyTorch dependency from comm+GEMM overlap in TE/common
* added TE/PyTorch wrappers for refactored comm+GEMM overlap code in TE/common
* updated TE/PyTorch Python API to match the refactored comm+GEMM overlap code
* updated unit tests to work with refactored comm+GEMM overlap code
* added a pylint exception to comm+GEMM overlap test runner
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixing linting errors
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* added documentation for te.initialize_ub
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed compile errors when building with NVTE_UB_WITH_MPI=1
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed default bootstrap backend
* switched default bootstrap backend priority to MPI > Gloo > NCCL
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* updated bootstrap backend documentation
* close UB bootstrap socket to avoid interfering with CUDA Multicast shareable file handle send/recv
* added torch::Tensor wrappers for communication buffer and atomic counters so PyTorch can factor externally allocated memory into its garbage collection threshold
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* automated handling of world, local and node ranks/sizes within C++ CommOverlapHelper to simplify Python function signatures
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed incorrect read of environment variables
* corrected priority for _SOCKET_IFNAME environment variables in UB bootstrapping
* moved multicast support check to cuda_runtime.h and replaced cudaDeviceGetProp call with cached sm_count()
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* removed commented out old code and replaced external collective function type defines with aliases
* compile-time CUDA version guard for CUDA Driver Multicast attribute
* added compile-time CUDA version guards to Multicast code in Userbuffers
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* condensed UB docs, corrected const violations
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed autodoc rst for UB calls, added CUDA version guard on Multicast UB kernels
* fixed incorrect UB type reporting for P2P overlaps, comment reformatting
* add docstring to tex.ubuf_built_with_mpi()
---------
Signed-off-by: Alp Dener <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

skip some t3hd/th3d tests for MQA/GQA
Signed-off-by: Charlene Yang <[email protected]>

* check if GPU is available
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

See Commits and Changes for more details.
Created by
pull[bot]