Forked from NVIDIA/TransformerEngine
[pull] main from NVIDIA:main #26 (merged)

Commits
…sequences (#1179) Modify unit tests to work around cuDNN 9.4 regression. Signed-off-by: Michael Goldfarb <[email protected]>
add dtensor support for te optimizers Signed-off-by: jasonwan <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Update list of CI users Signed-off-by: Tim Moon <[email protected]>
Implementation of context parallel fused attention using all-gather. Signed-off-by: Michael Goldfarb <[email protected]>
…g cuDNN and NVRTC (#1183) Defaulted CUDA_HOME/CUDA_PATH to /usr/local/cuda when attempting to dynamically load cuDNN and NVRTC Signed-off-by: Alp Dener <[email protected]>
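For orientation, a minimal sketch of the fallback order this commit describes; the helper name `find_cuda_home` is hypothetical, not TE's actual code:

```python
import os

def find_cuda_home() -> str:
    """Hypothetical sketch of the lookup order described above:
    CUDA_HOME first, then CUDA_PATH, then /usr/local/cuda as the
    default used when dynamically loading cuDNN and NVRTC."""
    return (
        os.environ.get("CUDA_HOME")
        or os.environ.get("CUDA_PATH")
        or "/usr/local/cuda"
    )

# e.g. where the loader would look for NVRTC's shared library
nvrtc_dir = os.path.join(find_cuda_home(), "lib64")
```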
Signed-off-by: Przemyslaw Tredak <[email protected]>
Allow specifying cmake directory Signed-off-by: Ryan Li <[email protected]> Co-authored-by: Ryan Li <[email protected]>
Add PyPI install instructions, incorporating review from @timmoon10. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Port optimizer tests to pytest Signed-off-by: Tim Moon <[email protected]>
…1175) Check whether the network interface name is valid and show a useful warning message when initializing Userbuffers; fix a formatting issue in the warning message. Signed-off-by: Alp Dener <[email protected]> Co-authored-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Make rotary_base a configurable argument; it can be a float. Signed-off-by: Sudhakar Singh <[email protected]> Co-authored-by: Tim Moon <[email protected]>
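A hedged usage sketch of the new argument, assuming the `RotaryPositionEmbedding` module in `transformer_engine.pytorch.attention`; the dimensions are illustrative:

```python
from transformer_engine.pytorch.attention import RotaryPositionEmbedding

# rotary_base can now be passed explicitly and may be a float
# (10000 is the conventional default).
rope = RotaryPositionEmbedding(dim=64, rotary_base=500000.0)

# Frequencies for sequences up to 2048 tokens; shape is assumed to be
# [max_seq_len, 1, 1, dim].
freqs = rope(max_seq_len=2048)
```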
Relax the contiguous check for flash attention; force contiguous tensors for CP. Signed-off-by: Xin Yao <[email protected]>
Allow the tutorial to download the model weights automatically, and allow users to provide a weight cache directory. Signed-off-by: Sudhakar Singh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Restore compatibility with Python 3.8. Signed-off-by: Przemyslaw Tredak <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Add @pggPL to list of CI users Signed-off-by: Tim Moon <[email protected]>
Allow passing architectures like 90a without them being overridden; includes a review suggestion from @timmoon10. Signed-off-by: aurianer <[email protected]> Signed-off-by: Tim Moon <[email protected]> Co-authored-by: Tim Moon <[email protected]>
Add new users to CI Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Fix the NVTE_UB_WITH_MPI read and add a default value. Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Docs fixes. Signed-off-by: Pawel Gadzinski <[email protected]> Co-authored-by: Pawel Gadzinski <[email protected]>
Fix detection of 3 in 3hd/h3d layouts; error out when an invalid layout group is provided. Signed-off-by: Charlene Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
skip FP8 CP tests if hardware does not support FP8 Signed-off-by: Xiaowei Ren <[email protected]>
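The skip guard presumably looks something like the sketch below, assuming `check_fp8_support` from `transformer_engine.pytorch.fp8`, which returns an (available, reason) pair:

```python
import pytest
from transformer_engine.pytorch.fp8 import check_fp8_support

# Whether this GPU supports FP8 (Hopper/Ada or newer) and, if not, why.
fp8_available, reason_for_no_fp8 = check_fp8_support()

@pytest.mark.skipif(not fp8_available, reason=reason_for_no_fp8)
def test_cp_with_fp8():
    ...  # FP8 context-parallel test body
```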
Add pool argument to make_graphed_callables Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
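A hedged sketch of the new argument, assuming `te.make_graphed_callables` mirrors `torch.cuda.make_graphed_callables`, whose pool argument lets several captures share one CUDA graph memory pool:

```python
import torch
import transformer_engine.pytorch as te

model = te.Linear(1024, 1024)
sample_args = (torch.randn(32, 1024, device="cuda"),)

# Reuse one memory pool across graph captures instead of allocating a new one.
mempool = torch.cuda.graph_pool_handle()
graphed_model = te.make_graphed_callables(model, sample_args, pool=mempool)
```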
Fix Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
…with offsets (#1220) Remove the unused options from the GroupedLinear docs and fix the bug with offsets; rename offsets -> fp8_meta_offsets. Signed-off-by: Przemyslaw Tredak <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
move block_table arg to varlen_func section Signed-off-by: Charlene Yang <[email protected]>
CPU perf optimizations in TE module forwards:
* In the linear autograd function, avoid the enable_grad context when possible in the cast function, cache distributed group properties, and reuse tensor dims
* In prepare_forward, avoid the torch.nn.Module impl of __setattr__ (the _fast_setattr logic later moved into __setattr__ itself)
* Avoid module imports in TE module forwards and use a fast getter for params
* Apply the same optimizations to grouped linear; debug test failures, fix linter warnings, and avoid deepcopy in tests
Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Emmanuel Ferdman <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Tests for distributed:
* Add the distributed tests and register them in the QA script
* Fix the test_numerics file and apply PR review fixes
* Update tests/pytorch/distributed/run_numerics.py per review from @timmoon10
Signed-off-by: Pawel Gadzinski <[email protected]> Signed-off-by: Paweł Gadziński <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Tim Moon <[email protected]>
Change the API for hierarchical CP:
* Insert A2A for hierarchical CP and make both fwd and bwd work; remove a redundant sync
* Fix the dout A2A in bwd, q_f16 with FP8, and the dout shape; save cp_size_a2a and rank_a2a in fwd
* Assert that the hierarchical CP implementation supports neither the THD format nor attention bias
* Add and refine a unit test for hierarchical CP; fix cp_comm_type in the unit test
* Move function definitions before their first call; fix typos, tensor-view comments, and an assert message
* Add more explanation of cp_group in the docstring
Signed-off-by: Xiaowei Ren <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
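For orientation, a hedged sketch of how the reworked API might be used; the two-level group layout, the list-of-groups form of cp_group, its ordering, and the "a2a+p2p" string are assumptions based on this commit's description:

```python
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te

# Assumed layout: 8 ranks, a 2-rank all-to-all level nested inside a
# 4-rank point-to-point level.
dist.init_process_group("nccl")
rank, world, a2a_size = dist.get_rank(), dist.get_world_size(), 2

# Every rank must create every subgroup; each keeps the ones it belongs to.
a2a_group = p2p_group = None
for start in range(0, world, a2a_size):            # contiguous A2A subgroups
    g = dist.new_group(list(range(start, start + a2a_size)))
    if start <= rank < start + a2a_size:
        a2a_group = g
for offset in range(a2a_size):                     # strided P2P subgroups
    g = dist.new_group(list(range(offset, world, a2a_size)))
    if rank % a2a_size == offset:
        p2p_group = g

attn = te.DotProductAttention(num_attention_heads=16, kv_channels=64)
attn.set_context_parallel_group(
    cp_group=[a2a_group, p2p_group],   # hierarchical: two-level group list
    cp_global_ranks=list(range(world)),
    cp_stream=torch.cuda.Stream(),
    cp_comm_type="a2a+p2p",
)
```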
Expose the JAX sliding window attention (SWA) API:
* Disable SWA under context parallelism; fix the RNG seed in tests
* Handle the SWA API discrepancy between cuDNN and Python
* Add the SWA API for flax; update test_praxis_layers.py for SWA
* Use a tuple window_size; update for PR #1212; add and adjust some pytest.skip calls
* Fix no-SWA test case regressions introduced by an intermediate commit
* Add padding-mask-with-sliding-window sanity tests; use float32 for the reference softmax calculation
Signed-off-by: Hua Huang <[email protected]> Signed-off-by: Reese Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Reese Wang <[email protected]>
Fixes to Float8Tensor. Signed-off-by: Przemyslaw Tredak <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Fix a bug in torch compile when seqdim is an integer; change the jit_fuser to torch.compile on flash_attn_fwd_out_correction; annotate fused functions. Signed-off-by: 李金梁 <[email protected]> Signed-off-by: Kirthi Shankar Sivamani <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Enable FA3 in context parallel attention:
* Rename FA2 function imports; refine fa_fwd_kwargs and fa_bwd_kwargs
* Import FA3 functions for CP; fix the output of FA3 fwd and the rng_state in the A2A implementation
* Make CP THD out-correction work with the packed softmax_lse format; fix the softmax_lse shape
* Change lse_packed to constexpr
Signed-off-by: Xiaowei Ren <[email protected]> Signed-off-by: Charlene Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Charlene Yang <[email protected]>
Let fused RoPE support the THD format with CP. Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Xiaowei Ren <[email protected]>
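A hedged sketch of fused RoPE on packed THD input, assuming `apply_rotary_pos_emb` in `transformer_engine.pytorch.attention` with fused and cu_seqlens arguments; under CP, per this commit, the call additionally accounts for how sequences are sharded across CP ranks:

```python
import torch
from transformer_engine.pytorch.attention import (
    RotaryPositionEmbedding,
    apply_rotary_pos_emb,
)

# Three sequences of lengths 5, 7, and 4 packed along the token dimension.
cu_seqlens = torch.tensor([0, 5, 12, 16], dtype=torch.int32, device="cuda")
t = torch.randn(16, 16, 64, device="cuda")   # [total_tokens, heads, head_dim]
freqs = RotaryPositionEmbedding(64)(max_seq_len=4096).cuda()

out = apply_rotary_pos_emb(
    t, freqs,
    tensor_format="thd",
    fused=True,           # the fused kernel is what this commit extends
    cu_seqlens=cu_seqlens,
)
```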
Signed-off-by: Tim Moon <[email protected]>
#1227) Update test to check support for context parallel attention. Signed-off-by: Michael Goldfarb <[email protected]>
Create examples README.md:
* Add all PyTorch examples
* Add JAX, PaddlePaddle, and third-party examples; fix DL framework links; remove the issue request for new PRs
Signed-off-by: Santosh Bhavani <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
ONNX test infrastructure:
* Build custom ORT ops before running ONNX tests, and add a build script for them
* Remove ONNX from context parallelism tests
* Export ONNX ops that do compute in FP32, matching the internal impl of TE kernels
Signed-off-by: Tim Moon <[email protected]>
fixed assertion bug for SWA Signed-off-by: Md Fahim Faysal Khan <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]> Co-authored-by: Phuong Nguyen <[email protected]>
Make FA2 optional:
* Fix logic and lint; minor fixes and tweaks
* Add an L1 test covering all supported FA versions; update the minimum version to 2.1.1 and trim the L1 tests
* Update the onnxruntime version and remove onnxruntime from the L1 FA-version tests
Signed-off-by: Charlene Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Upgrade pylint and reformat:
* Several rounds of formatting across the codebase, including Paddle lint
* Apply review feedback, run the formatter, and fix remaining warnings
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Fix FP8 activation recompute Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Przemyslaw Tredak <[email protected]>
#1258) Fix wgrad for GroupedLinear when weights don't require grad. Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
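A hedged sketch of the case the fix covers, assuming GroupedLinear's (inp, m_splits) forward signature; with every weight frozen, only dgrad should be computed and the wgrad path must be skipped cleanly:

```python
import torch
import transformer_engine.pytorch as te

grouped = te.GroupedLinear(num_gemms=4, in_features=256, out_features=256)
for p in grouped.parameters():
    p.requires_grad_(False)   # weights don't require grad

inp = torch.randn(64, 256, device="cuda", requires_grad=True)
out = grouped(inp, m_splits=[16, 16, 16, 16])  # rows per GEMM, summing to 64
out.sum().backward()          # dgrad still flows to the input; no wgrad
```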
Fix bias handling for 0-dim tensors: add a check and use numel() instead of testing for nullptr. Signed-off-by: Xin Yao <[email protected]>
Register CmdBufferCompatible traits via the C++ API; rename FFI_Traits; use register_ffi_target(). Signed-off-by: Phuong Nguyen <[email protected]>
fix seq_dim in CP implementation Signed-off-by: Xiaowei Ren <[email protected]>
Reorganize PyTorch L1 tests:
* Move ONNX tests to L1 and the FA version test to L3
* Limit parallel build jobs in the FA version test
Signed-off-by: Tim Moon <[email protected]>
Debug the wheel test for PaddlePaddle and fix a typo. Signed-off-by: Tim Moon <[email protected]>
Remove PyTorch L0 distributed test Forgot to remove in #1255. Signed-off-by: Tim Moon <[email protected]>
remove one FA version in the L3 test Signed-off-by: Charlene Yang <[email protected]>
…1230) Use 64-bit offsets for cuDNN 9.5+:
* Align workspace tensors to 16B
* Fix a bug where std::accumulate overflowed on large tensor shapes
* Only support 64-bit offsets on the arbitrary-sequence-length FP16 backend
Signed-off-by: Michael Goldfarb <[email protected]>
JAX distributed test fixes:
* Skip encoder tests on V100
* Fix multiprocessing jax.distributed.init
* Remove the XLA flag xla_gpu_deterministic_ops, which causes a segfault
Signed-off-by: Reese Wang <[email protected]>
Add THD + GQA support for cuDNN >= 9.6 Signed-off-by: Reese Wang <[email protected]>
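For illustration, a hedged THD + GQA configuration through TE's PyTorch DotProductAttention, which sits on the same cuDNN fused-attention backend; argument names and shapes are assumptions:

```python
import torch
import transformer_engine.pytorch as te

attn = te.DotProductAttention(
    num_attention_heads=16,
    kv_channels=64,
    num_gqa_groups=4,        # GQA: 4 KV heads shared by 16 query heads
    qkv_format="thd",        # packed variable-length sequences
    attn_mask_type="padding",
)

# Three sequences of lengths 5, 7, and 4 packed into 16 tokens.
cu_seqlens = torch.tensor([0, 5, 12, 16], dtype=torch.int32, device="cuda")
q = torch.randn(16, 16, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn(16, 4, 64, dtype=torch.bfloat16, device="cuda")
v = torch.randn(16, 4, 64, dtype=torch.bfloat16, device="cuda")
out = attn(q, k, v,
           cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens,
           max_seqlen_q=7, max_seqlen_kv=7)
```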
…rics check in unit tests (#1282) Fix correctness of JAX fused attention with CP. Signed-off-by: Michael Goldfarb <[email protected]>
…, ActLuFP8, LayerNormForwardFP8FFI, and LayerNormBackwardFFI (#1263)
* Add TransposeFFI, then ActLuFP8FFI (with a TransposeFFI fix) and QuantizeFFI
* Add FusedAttnForwardFFI with some unit tests; later revise FusedAttnForwardFFI()
* Add LayerNormForwardFP8FFI & LayerNormBackwardFFI
* Add FFI_CudaGraph_Traits (all tests passed)
* Fix an FFI data type mismatch and add a safeguard at the entrance to the FFI function
Signed-off-by: Hua Huang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Update class names for Paddle 3.0 Signed-off-by: Tim Moon <[email protected]>
Update test numerics:
* Several iterations of updates to tests/pytorch/test_numerics.py, incorporating review from @timmoon10
* Fix tests that were not passing CI; fix a key per review
Signed-off-by: Pawel Gadzinski <[email protected]> Signed-off-by: Paweł Gadziński <[email protected]> Signed-off-by: Kirthi Shankar Sivamani <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Tim Moon <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Created by pull[bot]. See Commits and Changes for more details.