Lowering matmul_transpose_a with pack-peel-4-level-tiling #1036

newling · 2025-01-16T20:55:33Z

Can we use another dma dimension in L1:

The symptom is that, before lower-to-aie, a single connection is used by 2 copies:

controlcode { 
...
%19 = amdaie.npu.circular_dma_cpy_nd %connection_B_10([0, 0, 0] [64, 8, 4] [4, 256, 1], [0, 0] [64, 32] [32, 1])  
...
%23 = amdaie.npu.circular_dma_cpy_nd %connection_B_10([0, 0, 0] [64, 8, 4] [4, 256, 1], [1, 0, 0] [1, 64, 32] [2048, 32, 1]) 
...
}

which is not allowed (each copy must have its own connection). But why are the 2 copies above not merged in the preceding iree-amdaie-dma-composition pass? They can indeed be combined into

%21 = amdaie.npu.circular_dma_cpy_nd %connection_A_21([0, 0, 0, 0] [2, 64, 8, 4] [0, 4, 256, 1], [0, 0] [128, 32] [32, 1])

but they are not, because the maximum number of dimensions reported here for the target side is 3. The copies therefore cannot be combined: the target side needs 4 dimensions after merging (the combined op above was created by relaxing the maximum number of allowed dimensions).
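As a sanity check on the merge described above, here is a minimal Python sketch that enumerates the element offsets each access pattern visits and confirms that the two separate copies touch the same elements, in the same order, as the combined copy. The helper `addresses` is hypothetical (it is not part of iree-amd-aie), and it assumes the usual convention that a pattern is written as offsets, sizes, strides with the outermost dimension first and that the base offset is the sum of offset times stride.

```python
from itertools import product


def addresses(offsets, sizes, strides):
    """Element offsets visited by an (offsets, sizes, strides) access pattern.

    Hypothetical helper for illustration only. Assumes the outermost dimension
    comes first and that the base offset is sum(offset * stride).
    """
    base = sum(o * s for o, s in zip(offsets, strides))
    return [
        base + sum(i * s for i, s in zip(index, strides))
        for index in product(*(range(n) for n in sizes))
    ]


# Source sides of the two separate copies, then of the combined copy.
src_first = addresses([0, 0], [64, 32], [32, 1])
src_second = addresses([1, 0, 0], [1, 64, 32], [2048, 32, 1])
src_merged = addresses([0, 0], [128, 32], [32, 1])

# Target side (identical for both copies), then the combined 4-d target.
tgt_single = addresses([0, 0, 0], [64, 8, 4], [4, 256, 1])
tgt_merged = addresses([0, 0, 0, 0], [2, 64, 8, 4], [0, 4, 256, 1])

# Running the two copies back to back is equivalent to the combined copy.
assert src_first + src_second == src_merged
assert tgt_single + tgt_single == tgt_merged
```

The source side stays 2-d after merging because the second copy starts exactly where the first one ends (64 * 32 = 2048). The target side gains a leading dimension of size 2 with stride 0, which is where the fourth dimension comes from.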

This copy/connection goes from L2 to L1. Are there really only 3 DMA dimensions available in L1? I'm a bit confused about this, specifically about the availability of the 'inter' dimensions. It seems like there is 1 'inter' dim at all levels of the hierarchy (see here) but it is not usable in all situations (see here).

Ideally I would be able to use one more dimension for this use case. It does seem to work (it gives a numerically correct result).

Alternative to increasing the number of dimensions available:

If we can only use 3 dims, I'm fairly confident that the DMA copies from L3 -> L2 -> L1 are using more permutations than needed, and that the packing can be 'linearized', which would mean we don't need as many DMA dimensions.
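To illustrate why fewer permutations would mean fewer DMA dimensions, here is a minimal Python sketch of folding adjacent dimensions that are contiguous with respect to each other. This is only one possible reading of 'linearized', and `fold_contiguous_dims` is a hypothetical helper, not a function from iree-amd-aie. It assumes sizes and strides are listed outermost dimension first.

```python
def fold_contiguous_dims(sizes, strides):
    """Merge an outer dim into its inner neighbour when they are contiguous.

    Hypothetical helper for illustration only. Two neighbouring dims can be
    folded when outer_stride == inner_size * inner_stride.
    """
    out_sizes, out_strides = [], []
    # Walk from the innermost dimension outwards.
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if out_sizes and stride == out_sizes[-1] * out_strides[-1]:
            out_sizes[-1] *= size  # fold into the current innermost group
        else:
            out_sizes.append(size)
            out_strides.append(stride)
    return out_sizes[::-1], out_strides[::-1]


# A row-major (unpermuted) pattern collapses to a single dimension.
linear = fold_contiguous_dims([128, 32], [32, 1])

# The permuted target pattern from the copies above cannot be folded at all.
permuted = fold_contiguous_dims([64, 8, 4], [4, 256, 1])

assert linear == ([4096], [1])
assert permuted == ([64, 8, 4], [4, 256, 1])
```

Under these assumptions, a pattern only needs one dimension per break in contiguity, so a packing that permutes less leaves more dimensions free for things like the repeat dimension needed by the merged copy.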

newling mentioned this issue Jan 16, 2025

[CombineStridedOps] Generalize dimension checking #1032

Merged
