
Reapply "Propagate reshapes through generics with reduction… (#18968) #19113

Merged: 7 commits into iree-org:main on Jan 8, 2025

Conversation

@IanWood1 (Contributor) commented Nov 12, 2024

Reland after fixing sdxl int8 regressions via #19012.

Running CI revealed further performance regressions that have pending patches: #19325 and #19326.

This reverts commit 8d3faf8.

@IanWood1 marked this pull request as ready for review November 12, 2024 19:29
@IanWood1 requested review from nithinsubbiah and removed request for ScottTodd November 12, 2024 19:29
hanhanW previously approved these changes Nov 12, 2024
@IanWood1 (Contributor Author)

Waiting to rebase on top of #19088, then I will land this.

@MaheshRavishankar (Contributor)

Am I reading this right? The dispatch count seems to have regressed?

@hanhanW (Contributor) commented Nov 14, 2024

> Am I reading this right? The dispatch count seems to have regressed?

246 -> 245 looks like an improvement?

@IanWood1 (Contributor Author)

> Am I reading this right? The dispatch count seems to have regressed?
>
> 246 -> 245 looks like an improvement?

But the new benchmarks seem to be regressing. I'm not sure why, since there was no effect on the old unet benchmarks.

@MaheshRavishankar (Contributor)

Try rebasing

@IanWood1 force-pushed the reland_prop_through_reduction branch from 6383ab6 to 4765737 on November 14, 2024 03:39
@IanWood1 (Contributor Author)

> Try rebasing

I'll try that. I was getting some weird dispatch count results when testing locally with main

@IanWood1 (Contributor Author)

I think it is all coming from:

%extracted_slice_237 = tensor.extract_slice %152[0, 0, 0] [2, 4096, 2560] [1, 1, 1] : tensor<2x4096x5120xf16> to tensor<2x4096x2560xf16>
%extracted_slice_238 = tensor.extract_slice %152[0, 0, 2560] [2, 4096, 2560] [1, 1, 1] : tensor<2x4096x5120xf16> to tensor<2x4096x2560xf16>
%expanded_239 = tensor.expand_shape %extracted_slice_237 [[0], [1, 2], [3]] output_shape [2, 64, 64, 2560] : tensor<2x4096x2560xf16> into tensor<2x64x64x2560xf16>
%expanded_240 = tensor.expand_shape %extracted_slice_238 [[0], [1, 2], [3]] output_shape [2, 64, 64, 2560] : tensor<2x4096x2560xf16> into tensor<2x64x64x2560xf16>

These extract_slice ops could be cloned into the dispatch if the expand_shape ops weren't blocking them.
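
For reference, a hand-written sketch of what the first slice above could look like once the expand_shape is bubbled above the extract_slice (the SSA names here are illustrative, not taken from the actual IR):

// Expand the full tensor first; the split dimension (4096 -> 64x64) is untouched
// by the slice, which only narrows the last dimension (5120 -> 2560).
%expanded = tensor.expand_shape %152 [[0], [1, 2], [3]] output_shape [2, 64, 64, 5120] : tensor<2x4096x5120xf16> into tensor<2x64x64x5120xf16>
%slice = tensor.extract_slice %expanded[0, 0, 0, 0] [2, 64, 64, 2560] [1, 1, 1, 1] : tensor<2x64x64x5120xf16> to tensor<2x64x64x2560xf16>

With the expand_shape hoisted above it, the remaining extract_slice can be cloned into the consuming dispatch.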

@IanWood1 force-pushed the reland_prop_through_reduction branch 2 times, most recently from 62bdaef to 98aa2a5 on November 18, 2024 20:32
@IanWood1 (Contributor Author) commented Nov 19, 2024

Looks like there is some bad codegen when reductions aren't collapsed:

  1. Slow dispatch (expanded): ~164us
  2. Fast dispatch (collapsed): ~79us

@MaheshRavishankar (Contributor)

That's interesting, but shouldn't the collapse-dimensions pass fold those back?

@IanWood1 force-pushed the reland_prop_through_reduction branch from 8409122 to ef99f8d on November 19, 2024 03:36
@IanWood1 (Contributor Author) commented Nov 19, 2024

> That's interesting, but shouldn't the collapse-dimensions pass fold those back?

Yeah, I had to change it a bit to make `isEligibleForCollapse` less restrictive, as well as modify `getCollapsibleLoops` so that it considers the results of all `indexing_maps` (not just the first one). Before force-pushing, it seemed to regain perf (I just fixed the failing lit tests). Hopefully this resolves all the perf problems 🤞

@IanWood1 requested a review from hanhanW November 27, 2024 18:19
@IanWood1 (Contributor Author)

@hanhanW would you be able to look at this again? Only the latest commit is new since last review

@hanhanW (Contributor) left a comment

> @hanhanW would you be able to look at this again? Only the latest commit is new since last review

The PR title and description need to be updated. It is not just reapplying a reverted commit. By the way, should we create a separate PR for the last commit?

@hanhanW dismissed their stale review November 27, 2024 18:37

Removing my approval because of new changes.

@IanWood1 (Contributor Author)

> @hanhanW would you be able to look at this again? Only the latest commit is new since last review
>
> The PR title and description need to be updated. It is not just reapplying a reverted commit. By the way, should we create a separate PR for the last commit?

That makes sense; this change had several perf regressions that needed to be fixed. Would you like me to create a new PR with the changes (with your comments applied), and then, after it lands, rebase and force-push so that this PR only contains the revert commit?

@hanhanW (Contributor) commented Nov 27, 2024

> That makes sense; this change had several perf regressions that needed to be fixed. Would you like me to create a new PR with the changes (with your comments applied), and then, after it lands, rebase and force-push so that this PR only contains the revert commit?

Yes, that'd be great; it makes the codebase state and commit tracking better!

@IanWood1 (Contributor Author) commented Nov 27, 2024

I created two new PRs, one for each of the separate fixes, and updated the description of this PR. I'll rebase this branch after those have landed so that this PR only serves as a revert.

Before force pushing, the previous HEAD was ef99f8d

IanWood1 added a commit that referenced this pull request Dec 3, 2024
This is 1/2 of the changes needed to reland #18857 (with an open PR, #19113).


Adds a pattern to bubble up expand_shape through extract_slice, i.e. rewrite `expand(extract)` to `extract(expand)`. This only supports the case where the expanded dimensions are not modified by the extract_slice and there are no dynamic dimensions.

This is important because `tensor.expand_shape` ops _cannot be cloned_ while `tensor.extract_slice` ops _can be cloned_. So, if the `expand_shape` gets stuck below the `extract_slice`, it blocks the `extract_slice` from being cloned, and the `extract_slice` has to be put into its own dispatch.

---------

Signed-off-by: Ian Wood <[email protected]>
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
…#19325)
IanWood1 added a commit that referenced this pull request Dec 5, 2024
This is 2/2 of the changes needed to reland #18857 (open PR #19113).


There are two small sub-changes in this PR:
- Refactor `isPossiblySoftmax` into a lambda and tweak the condition to check for operand indexing maps that are projected permutations but not permutations.
- Extend `getCollapsibleLoops` to look at all operations to get contiguous loops.

Also, I added a test case that needs both of these changes to pass (without being collapsed, it was causing regressions on the GPU backend).

---------

Signed-off-by: Ian Wood <[email protected]>
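
To illustrate the two sub-changes above with a made-up example (the maps below are illustrative, not taken from the PR's test case):

// A projected permutation that is not a permutation: loop d2 is dropped from the
// result, as for an operand that broadcasts a reduced value back across the
// reduction loop -- the shape the isPossiblySoftmax condition looks for.
#map_proj = affine_map<(d0, d1, d2) -> (d0, d1)>

// Maps from two other operands of the same op: d1 and d2 are contiguous and in
// order in #map_id but swapped in #map_perm, so the (d1, d2) pair cannot be
// collapsed once every operand's map is considered -- hence getCollapsibleLoops
// looking at all operations instead of only the first indexing map.
#map_id   = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map_perm = affine_map<(d0, d1, d2) -> (d0, d2, d1)>
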
@IanWood1 force-pushed the reland_prop_through_reduction branch 2 times, most recently from 044da91 to 1a747d5 on December 5, 2024 21:27
@IanWood1 force-pushed the reland_prop_through_reduction branch from 1a747d5 to e609b15 on December 7, 2024 00:15
@IanWood1 (Contributor Author)

This is failing the toy llama model due to the extract_slice in this dispatch. I manually folded it away (modified dispatch) and it compiled. I'm not currently sure where this should be fixed.

@MaheshRavishankar (Contributor)

Don't we have a pattern to fold slices away with producers... The bubble up extract-slices pass? That should take care of it

@IanWood1 (Contributor Author)

> Don't we have a pattern to fold slices away with producers... The bubble up extract-slices pass? That should take care of it

It looks like we have the fill pattern; it was just blocked by a reshape (I added a test as an example). Adding fill's canonicalization patterns seems to fix the problem.
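
A rough sketch of the kind of IR this addresses, assuming linalg.fill's reshape-folding canonicalization is what kicks in here (the shapes and SSA names are made up):

// Before: the reshape sits between the fill and its consumers, blocking
// fill-based foldings such as folding an extract_slice of a fill.
%cst = arith.constant 0.0 : f16
%empty = tensor.empty() : tensor<2x4096x2560xf16>
%fill = linalg.fill ins(%cst : f16) outs(%empty : tensor<2x4096x2560xf16>) -> tensor<2x4096x2560xf16>
%reshaped = tensor.expand_shape %fill [[0], [1, 2], [3]] output_shape [2, 64, 64, 2560] : tensor<2x4096x2560xf16> into tensor<2x64x64x2560xf16>

// After canonicalization: the fill writes directly into the reshaped destination,
// so downstream folds with the fill are no longer blocked.
%empty_ex = tensor.empty() : tensor<2x64x64x2560xf16>
%fill_ex = linalg.fill ins(%cst : f16) outs(%empty_ex : tensor<2x64x64x2560xf16>) -> tensor<2x64x64x2560xf16>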

@MaheshRavishankar (Contributor)

Great! I suggest running the connect test suites with this change as well before landing

@IanWood1 (Contributor Author) commented Jan 2, 2025

@IanWood1 (Contributor Author) commented Jan 3, 2025

This caused ~20 regressions on llvm-cpu and ~3 on rocm; I'm looking into that now.

IanWood1 added a commit that referenced this pull request Jan 6, 2025
#19113 uncovered some problems with
the logic in this pass.

Fixes two problems:
1. If a consumer cannot be collapsed, producers can only collapse
dimensions not touched by the consumer.
2. When updating which consumer loops can be collapsed, the
reassociation of the producer must be taken into account, since it's
possible they are not all contiguous (e.g. a transpose on an input).
This is the same logic as in `updateFromConsumer`.

---------

Signed-off-by: Ian Wood <[email protected]>
@IanWood1 (Contributor Author) commented Jan 6, 2025

Merging to pick up #19598; rerunning the action to make sure everything has been cleared up: https://github.com/nod-ai/SHARK-TestSuite/actions/runs/12639855078

@IanWood1 merged commit a5c3879 into iree-org:main on Jan 8, 2025
37 checks passed