Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime][python] Add debug sink to bindings #19013

Merged
merged 2 commits into from
Jan 7, 2025

Conversation

sogartar
Copy link
Contributor

@sogartar sogartar commented Nov 4, 2024

We don't support custom debug sinks in the Runtime Python bindings.
In particular the ability to register a custom callback when tracing
tensors.

This change makes it possible to create a HAL module with a Python
function as a callback.
This implementation does not handle the case of referencing directly or
indirectly the HAL module, VM context or VM instance in the callback
function object. In such a scenario the circular reference will not be
collected by the garbage collector and will leak. No no check is done
to guard against this. It is possible to traverse the Python object
structure to detect a reference to VM objects but it would require more
effort.

Here is added a callback to the debug sink in the IREE native runtime
API that signals when the runtime is done using the debug sink.
We need this since the Python objects corresponding to native runtime
objects are ephemeral and can not be used to hold the reference to the
debug sink.

@sogartar
Copy link
Contributor Author

sogartar commented Nov 4, 2024

This PR plumbs through the Python bindings the trace callback recently added by #18966.

@sogartar sogartar force-pushed the hal-module-sink-python-bindings branch from 8b2ec54 to c060bb2 Compare November 4, 2024 19:41
@sogartar sogartar requested review from benvanik and stellaraccident and removed request for stellaraccident November 4, 2024 20:42
@sogartar sogartar marked this pull request as draft November 4, 2024 20:48
@sogartar sogartar marked this pull request as ready for review November 4, 2024 21:23
@sogartar sogartar requested a review from rsuderman November 4, 2024 21:24
@sogartar
Copy link
Contributor Author

sogartar commented Nov 5, 2024

In its current state, I found out that one needs to hold the HAL module python object in order for the debug sink context to not get deleted, which is not what I wanted. I wanted once registered in IREE, the HAL module to continue be a shared owner of the debug sink. I guess the C++ objects for the bindings gets destroyed and created on demand from the underlying IREE runtime objects.
I will need to rethink how to handle this.

@sogartar sogartar marked this pull request as draft November 5, 2024 19:50
// Retain a reference. We want the callback to be valid after
// the user has dropped its reference and not burden the user
// with lifetime management.
vm_module.SetHalModuleDebugSink(*debug_sink);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to think a bit carefully about this: The underlying VM module can have a lifetime that exceeds the Python level wrapper for it. Reviewing the rest of the patch and will see if I can spot a better way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up with 2 solutions.

  1. Burden the user to guarantee that the callback function outlives the HAL module by holding a reference to it. In general not a good solution in Python.
  2. In order to bind the lifetime of the Python callback object to the HAL module we need to be able to register hooks with the module where it gets called on destruction. This burdens the IREE runtime with more stuff. Maybe not desirable. Instead of doing this for every VM module maybe at least the IREE runtime instance can have on-destruction hooks. The general problem is that we don't know when stuff gets destroyed.

Copy link
Contributor Author

@sogartar sogartar Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to add ownership of the debug sink to the VM context when the HAL module is registered there.
Then as long as the Python object of the VM context lives the debug sink will live.
This solution suffers from the problem of interop between the the Python bindings and some other interaction with the runtime. For example if we take the native handle from the Python VM context object and start using it while we destroy the Python object. I am not sure currently if the the bindings support this interaction, but I may be a future use case.

@sogartar
Copy link
Contributor Author

sogartar commented Dec 3, 2024

I don't know how to reconcile the various desirable aspects. They seem contradictory.

  1. Be able to pass a Python callable for the callback.
  2. Do not burden the user with object lifetime management. Requiring them to hold the debug sink object while the HAL module is used or having them provide information if the callable would introduce cyclic dependencies.
  3. Do not use traversal to detect reference cycles.

I think 1. is the whole point of having the flexibility that the runtime offers and to expose it up the stack. An alternative is to have a preset of implementations that we know don't create the cyclic dependency problem.
If we violate 2. partially, we can have the user specify who is to manage the callback in the API. Whether the bindings should share ownership. Then if the callback is problematic the user should manage it. Thus avoiding the cycle.

@stellaraccident
Copy link
Collaborator

stellaraccident commented Dec 3, 2024

I think the balance that is typically used is your option 1 but where the callback explicitly takes as arguments the objects it will typically need to close over to function. Then make it safe if the user closes over too much but discourage this as unoptimal. This would let the user (or us as part of utilities) to have a global create_foo_sink(output_path) which doesn't close over anything but the output resource. If the user does it this way, then destruction is completely deterministic. Maintaining that property as an option is what you are after, and then providing some protection for cycles (but the cost will be non deterministic destruction).

@stellaraccident
Copy link
Collaborator

You find variants of this in python's asyncio and async executor mechanisms where there is the recommended party of using closure-less callbacks.

We don't support custom debug sinks in the Runtime Python bindings.
In particular the ability to register a custom callback when tracing
tensors.

This change makes it possible to create a HAL module with a Python
function as a callback.
This implementation does not handle the case of referencing directly or
indirectly the HAL module, VM context or VM instance in the callback
function object. In such a scenario the circular reference will not be
collected by the garbage collector and will leak. No no check is done
to guard against this. It is possible to traverse the Python object
structure to detect a reference to VM objects but it would require more
effort.

Here is added a callback to the debug sink in the IREE native runtime
API that signals when the runtime is done using the debug sink.
We need this since the Python objects corresponding to native runtime
objects are ephemeral and can not be used to hold the reference to the
debug sink.

Signed-off-by: Boian Petkantchin <[email protected]>
@sogartar sogartar force-pushed the hal-module-sink-python-bindings branch from 4286e78 to ca30efc Compare January 6, 2025 19:45
@sogartar
Copy link
Contributor Author

sogartar commented Jan 6, 2025

@stellaraccident, I updated this PR to handle only callbacks that do not reference the relevant VM object. No effort is made to detect and prevent such cycles. At some future point we could improve on that.

@sogartar sogartar marked this pull request as ready for review January 6, 2025 19:51
@sogartar sogartar force-pushed the hal-module-sink-python-bindings branch from f7ecea0 to ff5d1bb Compare January 7, 2025 15:24
@sogartar sogartar enabled auto-merge (squash) January 7, 2025 15:26
@sogartar sogartar merged commit d517661 into iree-org:main Jan 7, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants