
ttir.convolution decomposition may insert unnecessary permute #1751

Closed
LPanosTT opened this issue Jan 10, 2025 · 3 comments
LPanosTT (Contributor) commented Jan 10, 2025

When decomposing ttir.convolution to ttir.conv2d, if the weights and/or input are already in channel-last format, we do not need to permute them. However, ttir.permute ops are still added. For example:

auto weight = rewriter.create<ttir::PermuteOp>(
        op.getLoc(), weightDPSOutput.getType(), adaptor.getWeight(),
        weightDPSOutput, kernelPermutation);

This permute is a no-op when kernelPermutation = {0, 1, 2, 3}. The redundant op would be solvable with a simple canonicalization pattern, but we probably shouldn't knowingly introduce ops that do nothing just because they'll be erased later.

We could add a bool checkIsNopPermutation(SmallVector<int> permutation) helper and replace the assignment of weight with something like:

...
auto kernelPermutation =
    generateConvKernelPermutation(op, conv2dKernelLayout);
mlir::Value weight = adaptor.getWeight();
if (!checkIsNopPermutation(kernelPermutation)) {
    auto weightType =
        mlir::cast<RankedTensorType>(adaptor.getWeight().getType());
    auto weightOutputShape = ::ttmlir::utils::applyPermutation(
        weightType.getShape(), kernelPermutation);
    auto weightDPSOutput = rewriter.create<tensor::EmptyOp>(
        op.getLoc(), weightOutputShape, weightType.getElementType());
    // Only insert the permute when it actually reorders dimensions.
    weight = rewriter.create<ttir::PermuteOp>(
        op.getLoc(), weightDPSOutput.getType(), weight,
        weightDPSOutput, kernelPermutation);
}
...

We could do the same with the input and output.
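
For reference, a minimal sketch of what the helper could look like. The ArrayRef<int64_t> signature and the exact name are assumptions for illustration, not a decided interface:

// Hypothetical helper, sketched for illustration: returns true when the
// permutation maps every dimension to itself, i.e. the permute is a no-op.
static bool checkIsNopPermutation(llvm::ArrayRef<int64_t> permutation) {
  for (int64_t i = 0; i < static_cast<int64_t>(permutation.size()); ++i) {
    if (permutation[i] != i) {
      return false;
    }
  }
  return true;
}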

LPanosTT assigned LPanosTT and azecevicTT and unassigned LPanosTT Jan 10, 2025
LPanosTT (Contributor, Author) commented Jan 10, 2025

Assigned @azecevicTT because you were the last person who edited this part of the code. But if you want me to take it on, I'll do it.

azecevicTT (Contributor) commented Jan 13, 2025

Yeah, I added that in #1670. I didn't want to merge it too early because of the holidays, so I will leave it open for one or two more days for review.
By the way, this is a very common theme, especially with data movement ops during conversion. All of them have well-defined identities, so identities should be folded by the respective op, not in the conversion of some other op. First, it's very error-prone to do this directly during conversion. Second, if the op changes its interface (like a broadcast op recently), the condition for folding can change and the folding can become invalid; if it's done in one place, it's much easier to spot an error (and any such folding should be covered by a test case).
The same goes for direct erasure of ops during conversions. Ideally, every op should be responsible for itself and expose relevant information to the global context (like NoMemoryEffect/Pure). Otherwise, we are unknowingly creating strong interdependencies between the ops themselves.
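
For illustration, here is a rough sketch of the kind of op-local identity fold being described, written as an MLIR fold hook on ttir::PermuteOp. The getPermutation()/getInput() accessors and the FoldAdaptor signature are assumptions about how the op might be defined, not the actual tt-mlir code:

// Hypothetical fold hook: an identity permutation forwards its input, so
// redundant permutes fold away wherever they are created, not only inside
// one particular conversion pattern.
mlir::OpFoldResult PermuteOp::fold(FoldAdaptor adaptor) {
  llvm::ArrayRef<int64_t> permutation = getPermutation();
  for (int64_t i = 0; i < static_cast<int64_t>(permutation.size()); ++i) {
    if (permutation[i] != i) {
      return nullptr; // Not an identity permutation; nothing to fold.
    }
  }
  return getInput(); // Identity permutation: replace the op with its input.
}

With a fold like this in place (plus a test case covering it), conversion patterns can create the permute unconditionally and rely on the op itself to remove the identity case.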

LPanosTT (Contributor, Author) commented
Makes sense to me. Thanks!
