Untilize with unpadding only supports parallelization over the height #17537

Open

nardoTT opened this issue Feb 4, 2025 · 0 comments
nardoTT commented Feb 4, 2025

Describe the bug
The untilize with unpadding operation parallelizes work only along the tensor height. Wide tensors are therefore mapped to only a few cores, which hurts performance.

To Reproduce
Profile the untilize with unpadding operation on any wide tensor and check the number of cores it uses.
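
For reference, a minimal reproduction sketch, assuming a standard ttnn setup (the shape and device setup are illustrative; converting a tile-padded tensor back to row-major is what dispatches untilize with unpadding):

```python
# Sketch only: the tensor shape and device id are illustrative assumptions.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# A wide tensor: one tile row, many tile columns, and a width that is not a
# multiple of 32, so the row-major conversion has to drop the tile padding.
torch_input = torch.randn(1, 1, 32, 8190, dtype=torch.bfloat16)

tt_input = ttnn.from_torch(
    torch_input, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
)

# The untilize-with-unpadding op runs inside this call; profile it with the
# device profiler and check how many cores it is dispatched to.
tt_output = ttnn.to_layout(tt_input, ttnn.ROW_MAJOR_LAYOUT)
torch_output = ttnn.to_torch(tt_output)

ttnn.close_device(device)
```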

Expected behavior
Wide tensors should be spread across more cores.

@nardoTT nardoTT added the bug Something isn't working label Feb 4, 2025
@nardoTT nardoTT self-assigned this Feb 4, 2025
nardoTT added a commit that referenced this issue Feb 5, 2025
…adding (#17538)

### Ticket
Link to GitHub Issue: #17537

### Problem description
Currently, the untilize with unpadding implementation supports parallelization only along the height dimension. This hurts performance for wide tensors, as they are mapped to a limited number of cores.

### What's changed
In this PR, we introduce support for parallelizing the untilize operation along the width dimension, similar to tilize with padding. The operation parallelizes over whichever dimension has the larger number of tiles.
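
As a rough illustration of this dimension-selection heuristic (placeholder names, not the actual tt-metal implementation):

```python
# Illustrative sketch: TILE_DIM and the function name are placeholders.
TILE_DIM = 32  # tiles are 32x32

def choose_parallelization_dim(height: int, width: int) -> str:
    """Parallelize over whichever dimension has more tiles."""
    tiles_h = (height + TILE_DIM - 1) // TILE_DIM
    tiles_w = (width + TILE_DIM - 1) // TILE_DIM
    return "height" if tiles_h >= tiles_w else "width"

# A wide tensor of shape (32, 8190) has 1 tile row and 256 tile columns,
# so its work is now split across cores along the width.
print(choose_parallelization_dim(32, 8190))  # -> "width"
```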
In future versions:
- we want the operation to support parallelization along both dimensions simultaneously
- we want the compute kernel to process an entire column block at once instead of one tile at a time

For the tests added in test_to_layout.py, the kernel duration of the previous implementation is around 1.8x to 24.8x that of the new implementation.


### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/13121055787
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests pass
- [ ] New/Existing tests provide coverage for changes