
[TTIR][TTNN] MLIR compiler locations #1745

Open
sdjordjevicTT opened this issue Jan 10, 2025 · 7 comments

@sdjordjevicTT
Contributor

Create a hierarchy of MLIR locations during the passes that decompose and convert MLIR operations, enabling tracing back through the graphs for easier debugging.

@tapspatel
Contributor

Design Considerations

  • Every child op needs a way to know who its parents are
  • Every parent op needs a way to know who its children are
  • We need to know the lowest-level ttnn operation that produces the result of any higher-level dialect operation
    For example, if dialect.op1 decomposes into [ttnn.subop1, ttnn.subop2, ttnn.deallocate, ttnn.subop3, ttnn.deallocate], we need to know which child op corresponds to the final computation result

What can passes do?

  • Passes can decompose an existing parent op into one or more child ops
  • Passes can fuse two or more existing parent ops into one or more child ops
  • Passes can add new ops (a child with no parent)
  • Passes can remove existing parent ops

Here is a potential location-naming design I was thinking about:
Operations

  • If a pass translates an op into one or more child ops, each of those child ops inherits the loc of the parent
  • If a pass fuses two or more existing parent ops into one or more child ops
    • If the parent ops have the same loc, the child ops inherit that same loc
    • If the parent ops have different locs, the child ops inherit a combination of all the parent locs
  • If a pass adds new ops, those ops will set their loc to some representation of the pass that added them (see the sketch after this list)
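
To make these rules concrete, here is a minimal C++ sketch of how a pass could set locations in each case, using the standard MLIR location attributes (NameLoc, FusedLoc, UnknownLoc). This is only an illustration; the helper names are made up and are not existing tt-mlir code.

#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Location.h"
#include "mlir/IR/Operation.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// Case 1: decomposition — every child op produced from a single parent
// reuses the parent's location.
void tagDecomposedChildren(Operation *parent, ArrayRef<Operation *> children) {
  for (Operation *child : children)
    child->setLoc(parent->getLoc());
}

// Case 2: fusion — children inherit the shared parent loc, or a FusedLoc
// combining all parent locs when they differ.
Location fusedParentLoc(MLIRContext *ctx, ArrayRef<Operation *> parents) {
  SmallVector<Location> locs;
  for (Operation *parent : parents)
    locs.push_back(parent->getLoc());
  if (llvm::all_equal(locs))
    return locs.front();
  return Builder(ctx).getFusedLoc(locs);
}

// Case 3: ops added from scratch by a pass carry a NameLoc with the pass name.
Location passAddedLoc(MLIRContext *ctx, StringRef passName) {
  return NameLoc::get(StringAttr::get(ctx, passName), UnknownLoc::get(ctx));
}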

Inputs

  • If an op is inserted for an input op, its loc will be set to the name of the input op

Outputs

  • If an op is inserted for an output op, its loc will be set to the name of the input op

Special Operations
The following special operations will always set their loc to the pass that adds them

  • ToDevice
  • FromDevice
  • Deallocate

As an example, suppose we currently had the following sigmoid.mlir ttir module in the compiler. Each op was generated in ttir and had a location set on it:

module {
  func.func @test_sigmoid(%arg0: tensor<128x128xf32>) -> tensor<128x128xf32> {
    %0 = tensor.empty() : tensor<128x128xf32>
    %1 = "ttir.sigmoid"(%arg0, %0) <{operandSegmentSizes = array<i32: 1, 1>}> : (tensor<128x128xf32>, tensor<128x128xf32>) -> tensor<128x128xf32>
    return %1 : tensor<128x128xf32>
  }
}

Its decomposition (with loc data) looks like the following:

[Image: decomposition of the sigmoid module with the current loc data]

The design proposal would change it to the following

[Image: decomposition with the proposed loc hierarchy]

@tapspatel
Contributor

I added some useful code here: tpatel/issue-1745

In golden_generator.py, you can run test_relu, which will generate an MLIR file from the Python infra (other ops are also pybinded). def print_module(module) will print the module, including the location data set within the module for each op. All passes have also been pybinded (see test_relu_decomp).

@tapspatel
Contributor

You can see all of the passes that are run on an MLIR file via:

ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=/code/tt-mlir/ttrt-artifacts/system_desc.ttsys" test_relu_ttir.mlir --dump-pass-pipeline     
Pass Manager with 12 passes:
builtin.module(ttir-to-ttir-decomposition,inline{default-pipeline=canonicalize inlining-threshold=4294967295 max-iterations=4 },ttir-load-system-desc{path=/code/tt-mlir/ttrt-artifacts/system_desc.ttsys},ttir-implicit-device{},ttir-broadcast-fold,ttnn-layout,convert-ttir-to-ttnn,remove-dead-values,ttnn-workaround{ttnn-enable-decomposition-workaround-pass=true ttnn-enable-layout-workaround-pass=true},canonicalize{  max-iterations=10 max-num-rewrites=-1 region-simplify=normal test-convergence=false top-down=true},ttnn-decompose-layouts,ttnn-deallocate)

All of these are pybinded into golden_generator.py to help see what the output of each pass is and what the op loc data looks like.

@odjuricicTT
Contributor

I'll add some requirements from the optimizer and tt-explorer side:

  1. Op locations should be unique.
    Optimizer overrides use op location strings to identify a specific op. This does not work if multiple ops have the same location.

  2. Frontends should be able to pass multiple levels of location. E.g., a llama op comes from "layer_1", "attention_module", "matmul_1". This info exists on forge-fe and is needed in order to be able to visualize the graph in tt-explorer (one possible encoding is sketched below).
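
As a rough illustration of the second requirement, the hierarchy could be encoded as a chain of nested NameLocs. This is a sketch only; the strings and helper name are made up, and this is not what forge-fe emits today.

#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Location.h"

using namespace mlir;

// Sketch: encoding the frontend hierarchy "layer_1" / "attention_module" /
// "matmul_1" as nested NameLocs, outermost level first.
Location makeHierarchicalLoc(MLIRContext *ctx) {
  Location leaf = NameLoc::get(StringAttr::get(ctx, "matmul_1"));
  Location attn = NameLoc::get(StringAttr::get(ctx, "attention_module"), leaf);
  return NameLoc::get(StringAttr::get(ctx, "layer_1"), attn);
  // A consumer (e.g. tt-explorer) can walk getChildLoc() to recover each level.
}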

Though we might be stretching the usage of Locations for this, I'm open to using something different in the long run if it makes sense.

@sdjordjevicTT @tapspatel @azecevicTT

@sdjordjevicTT
Contributor Author

[Image: Zoom whiteboard snapshot]

Pasting the image from our zoom whiteboard.

@azecevicTT
Contributor

@tapspatel I've synced offline with @odjuricicTT, we agreed that IDs in location might be a bit of a stretch and that attributes that are added in some pass might be a better place for them.

Regarding your proposal, I will just go through it again with some implementation details, so we don't miss anything before implementation.

If a pass translates an op into one or more child ops, each of those child ops inherits the loc of the parent

This is a 'default' case that we have right now in most (if not all) places. In the case of decomposition, the result of the last op in the chain of new ops should always be the result of a decomposed op, so I believe there isn't ambiguity in this case.

If a pass fuses two or more existing parent ops into 1 or more children ops
-If the parent ops have the same loc, children ops inherit the same loc
-If the parent ops have different locs, children ops inherit a combination of all the parent locs

For the second point, we can use the built-in FusedLoc (https://mlir.llvm.org/docs/Dialects/Builtin/#fusedloc). The question that remains is whether the order in which the locations appear in the FusedLoc matters, i.e. for linear(a, b, c) = add(matmul(a, b), c) we would have loc(linear) = FusedLoc([loc(add), loc(matmul)]). If you want to trace it back, it seems that order is important, but it will be the same (or the reverse, depending on how you look at it) as the order of the original ops in the IR; a small sketch follows below.
For the first point, is this something that's functionally important for your use case, or can you still trace back with FusedLoc even when the locations of the parents are the same?
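
A minimal sketch of the linear example, assuming matmulOp and addOp are the two ops being fused; the relative order of the locations passed to the builder is preserved, so the pass controls whether it mirrors the original IR order. The helper name is illustrative.

#include "mlir/IR/Builders.h"
#include "mlir/IR/Location.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Sketch: loc(linear) for linear(a, b, c) = add(matmul(a, b), c).
// Listing {loc(matmul), loc(add)} mirrors the order of the original ops in the IR.
Location makeLinearLoc(Operation *matmulOp, Operation *addOp) {
  Builder builder(matmulOp->getContext());
  return builder.getFusedLoc({matmulOp->getLoc(), addOp->getLoc()});
}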

If a pass adds new ops, ops will set their loc to some representation of the pass that added it

There is the built-in NameLoc (https://mlir.llvm.org/docs/Dialects/Builtin/#nameloc), which seems suitable for this case. We already use it in some other places as well, so it can become ambiguous; my proposal is to extend this class with something like PassOpsLoc (naming proposals are welcome), where we would set the name to the name of the pass that added the op, and childLoc to the loc that we are currently using. This way we can use RTTI to query information about the pass that added an op (sketched below). Can you confirm that this would be possible to do with the Python bindings?
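
For reference, here is a C++ sketch of the idea using plain NameLoc and the current MLIR casting API; whether the equivalent query is exposed through the Python bindings is exactly the open question above. The helper and pass names are illustrative, not existing tt-mlir code.

#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Location.h"
#include "mlir/IR/Operation.h"
#include "llvm/Support/raw_ostream.h"

using namespace mlir;

// Sketch: wrap an op's current loc in a NameLoc whose name is the pass that
// added (or annotated) the op; "ttnn-deallocate" would be an example name.
Location wrapWithPassLoc(Operation *op, StringRef passName) {
  return NameLoc::get(StringAttr::get(op->getContext(), passName),
                      op->getLoc());
}

// Later, RTTI can recover which pass an op came from; getChildLoc() still
// carries whatever location the op had before the pass touched it.
void reportAddingPass(Operation *op) {
  if (auto nameLoc = dyn_cast<NameLoc>(op->getLoc()))
    llvm::outs() << "added by pass: " << nameLoc.getName().getValue() << "\n";
}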

Inputs
If an op is inserted for an input op, its loc will be set to the name of the input op
Outputs
If an op is inserted for an output op, its loc will be set to the name of the input op

Can you elaborate more on this? I'm not sure I'm getting the point here. Ultimately, the frontend that lowers to TTIR sets the 'starting' location.

Special Operations
The following special operations will always set their loc to the pass that adds it
ToDevice
FromDevice
Deallocate

Do we still have to consider them special if we add the aforementioned PassOpsLoc?

@tapspatel
Contributor

Discussed the questions with @azecevicTT offline.
