Add support for unpadded shapes in Matmul1D w/ gather_in0 #16627
Conversation
Force-pushed from 414e20d to a7aa7f3.
Force-pushed from bc32be0 to 1d004e5.
overall looks good!
Force-pushed from 72e3729 to 52628da.
```cpp
/* Inner dim padding */
const uint32_t Kt_pad = in0_buffer->shard_spec().shape()[1] / in0_tile.get_tile_shape()[1] * num_cores;
in0_block_w = Kt_pad / num_cores;

uint32_t num_blocks = Kt_pad / in0_block_w;
```
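For concreteness, here is a minimal standalone sketch of the arithmetic above; the shard width, tile width, and core count are made-up example values, not taken from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

int main() {
    // Hypothetical example values; in the program factory these come from
    // in0_buffer->shard_spec().shape()[1], in0_tile.get_tile_shape()[1],
    // and the ring size.
    const uint32_t shard_width = 96;  // shard width in elements (3 tiles of 32)
    const uint32_t tile_width = 32;   // tile width in elements
    const uint32_t num_cores = 8;

    // Same arithmetic as the snippet above:
    const uint32_t Kt_pad = shard_width / tile_width * num_cores;  // 3 * 8 = 24 tiles
    const uint32_t in0_block_w = Kt_pad / num_cores;               // 24 / 8 = 3 tiles per core
    const uint32_t num_blocks = Kt_pad / in0_block_w;              // 24 / 3 = 8 blocks

    // With this derivation num_blocks always equals num_cores, since
    // in0_block_w is defined as Kt_pad / num_cores; this is relevant to
    // the review discussion below.
    assert(num_blocks == num_cores);
    std::cout << Kt_pad << " " << in0_block_w << " " << num_blocks << "\n";
    return 0;
}
```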
Should just use the passed-in in0_block_w and derive num_blocks from the shard spec.
in0_block_w also needs to change so that it is derived from the padded value.
This code is just setting in0_block_w to be the shard width. Is this not what you are passing in?

```cpp
const uint32_t Kt_pad = in0_buffer->shard_spec().shape()[1] / in0_tile.get_tile_shape()[1] * num_cores;
in0_block_w = Kt_pad / num_cores;
```

num_blocks is basically unused in this function.
in0_block_w has to match the unpadded K value, or else the other matmul validation fails; that's why it needs to be updated here. num_blocks is passed into the compute kernel to determine the number of block iterations.
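A minimal sketch of the alternative the reviewer proposes, assuming in0_block_w is the value passed in by the caller and the padded width is recoverable from the shard spec; the function and variable names here are illustrative, not from the codebase:

```cpp
#include <cstdint>

// Hypothetical sketch of the review suggestion: keep the caller's
// in0_block_w and derive only num_blocks from the shard spec.
// shard_width_tiles stands in for
// in0_buffer->shard_spec().shape()[1] / in0_tile.get_tile_shape()[1].
uint32_t derive_num_blocks(
    uint32_t shard_width_tiles, uint32_t num_cores, uint32_t in0_block_w) {
    const uint32_t Kt_pad = shard_width_tiles * num_cores;  // padded Kt in tiles
    return Kt_pad / in0_block_w;  // block iterations for the compute kernel
}
```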
Force-pushed from f453ac9 to 3f13274.
Add support for unpadded shapes in Matmul1D w/ gather_in0 (#16627)

### Ticket
- #16626

### Problem description
In the current use case of Matmul1D with gather_in0 in the Llama models, the activations and weights need to be padded, which results in significant overhead.

### What's changed
- Added support to skip the part of in0_block_w that is padding
- Pad Kt and Nt in the host code for gather_in0

### Checklist
- [x] Post commit CI passes (https://github.com/tenstorrent/tt-metal/actions/runs/12893880800)
- [x] New/Existing tests provide coverage for changes (https://github.com/tenstorrent/tt-metal/actions/runs/12893883783)
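As a rough illustration of the host-side padding described above, here is a sketch under assumed semantics; `round_up`, `pad_for_gather_in0`, and the rule of padding to a multiple of the core count are illustrative and not taken from the tt-metal sources:

```cpp
#include <cstdint>

// Round x up to the nearest multiple of m.
static uint32_t round_up(uint32_t x, uint32_t m) { return ((x + m - 1) / m) * m; }

// Hypothetical host-side padding for gather_in0: pad the inner dim Kt so it
// divides evenly across the ring of cores, and pad Nt the same way, so each
// core works on an equal number of whole tiles.
struct PaddedDims {
    uint32_t Kt_pad;
    uint32_t Nt_pad;
};

PaddedDims pad_for_gather_in0(uint32_t Kt, uint32_t Nt, uint32_t num_cores) {
    return PaddedDims{
        round_up(Kt, num_cores),  // each core gets Kt_pad / num_cores in0 tiles
        round_up(Nt, num_cores),  // each core gets Nt_pad / num_cores output tiles
    };
}
```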