Add support for unpadded shapes in Matmul1D w/ gather_in0 #16627
Conversation
Force-pushed from 414e20d to a7aa7f3.
Force-pushed from bc32be0 to 1d004e5.
overall looks good!
Force-pushed from 72e3729 to 52628da.
```cpp
/* Inner dim padding */
const uint32_t Kt_pad = in0_buffer->shard_spec().shape()[1] / in0_tile.get_tile_shape()[1] * num_cores;
in0_block_w = Kt_pad / num_cores;

uint32_t num_blocks = Kt_pad / in0_block_w;
```
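For concreteness, here is a minimal standalone sketch of the arithmetic above; the shard width, tile width, and core count are made-up example values, not taken from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

int main() {
    // Hypothetical example values; in the program factory these come from
    // in0_buffer->shard_spec().shape()[1], in0_tile.get_tile_shape()[1],
    // and the ring size.
    const uint32_t shard_width = 96;  // shard width in elements (3 tiles of 32)
    const uint32_t tile_width = 32;   // tile width in elements
    const uint32_t num_cores = 8;

    // Same arithmetic as the snippet above:
    const uint32_t Kt_pad = shard_width / tile_width * num_cores;  // 3 * 8 = 24 tiles
    const uint32_t in0_block_w = Kt_pad / num_cores;               // 24 / 8 = 3 tiles per core
    const uint32_t num_blocks = Kt_pad / in0_block_w;              // 24 / 3 = 8 blocks

    // With this derivation num_blocks always equals num_cores, since
    // in0_block_w is defined as Kt_pad / num_cores; this is relevant to
    // the review discussion below.
    assert(num_blocks == num_cores);
    std::cout << Kt_pad << " " << in0_block_w << " " << num_blocks << "\n";
    return 0;
}
```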
Should just use the passed-in in0_block_w and derive num_blocks from the shard spec.
in0_block_w also needs to change so that it is derived from the padded value.
This code is just setting in0_block_w to be the shard width. Is this not what you are passing in?

```cpp
const uint32_t Kt_pad = in0_buffer->shard_spec().shape()[1] / in0_tile.get_tile_shape()[1] * num_cores;
in0_block_w = Kt_pad / num_cores;
```

num_blocks is basically unused in this function.
in0_block_w has to match the unpadded K value, or else the other matmul validation fails; that's why it needs to be updated here. num_blocks is passed into the compute kernel to determine the number of block iterations.
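A minimal sketch of the alternative the reviewer proposes, assuming in0_block_w is the value passed in by the caller and the padded width is recoverable from the shard spec; the function and variable names here are illustrative, not from the codebase:

```cpp
#include <cstdint>

// Hypothetical sketch of the review suggestion: keep the caller's
// in0_block_w and derive only num_blocks from the shard spec.
// shard_width_tiles stands in for
// in0_buffer->shard_spec().shape()[1] / in0_tile.get_tile_shape()[1].
uint32_t derive_num_blocks(
    uint32_t shard_width_tiles, uint32_t num_cores, uint32_t in0_block_w) {
    const uint32_t Kt_pad = shard_width_tiles * num_cores;  // padded Kt in tiles
    return Kt_pad / in0_block_w;  // block iterations for the compute kernel
}
```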
Force-pushed from f453ac9 to 3f13274.
Add support for unpadded shapes in Matmul1D w/ gather_in0 (#16627)

### Ticket
- #16626

### Problem description
In the current use case of Matmul1D with gather_in0 in the Llama models, the activations and weights need to be padded, which results in significant overhead.

### What's changed
- Added support to skip the part of in0_block_w that is padding
- Pad Kt and Nt in the host code for gather_in0

### Checklist
- [x] Post commit CI passes (https://github.com/tenstorrent/tt-metal/actions/runs/12893880800)
- [x] New/Existing tests provide coverage for changes (https://github.com/tenstorrent/tt-metal/actions/runs/12893883783)
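As a rough illustration of the host-side padding described above, here is a sketch under assumed semantics; `round_up`, `pad_for_gather_in0`, and the rule of padding to a multiple of the core count are illustrative and not taken from the tt-metal sources:

```cpp
#include <cstdint>

// Round x up to the nearest multiple of m.
static uint32_t round_up(uint32_t x, uint32_t m) { return ((x + m - 1) / m) * m; }

// Hypothetical host-side padding for gather_in0: pad the inner dim Kt so it
// divides evenly across the ring of cores, and pad Nt the same way, so each
// core works on an equal number of whole tiles.
struct PaddedDims {
    uint32_t Kt_pad;
    uint32_t Nt_pad;
};

PaddedDims pad_for_gather_in0(uint32_t Kt, uint32_t Nt, uint32_t num_cores) {
    return PaddedDims{
        round_up(Kt, num_cores),  // each core gets Kt_pad / num_cores in0 tiles
        round_up(Nt, num_cores),  // each core gets Nt_pad / num_cores output tiles
    };
}
```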