-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core][compiled graphs] Introduce with_tensor_transport API #49753
Conversation
fdb34bc
to
7e4aedd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good so far!
ed4fc79
to
9be5af0
Compare
e8970ef
to
8df83b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
ad44154
to
99eb84b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments; the rest LGTM. We should monitor the benchmark results after this PR is merged.
Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: Rui Qiao <[email protected]>
b419d13
to
16750da
Compare
…ect#49753) Signed-off-by: Rui Qiao <[email protected]>
…ect#49753) Signed-off-by: Rui Qiao <[email protected]>
…ect#49753) Signed-off-by: Rui Qiao <[email protected]> Signed-off-by: Anson Qian <[email protected]>
…ect#49753) Signed-off-by: Rui Qiao <[email protected]> Signed-off-by: Anson Qian <[email protected]>
Why are these changes needed?
Currently to specify a torch tensor transport, we use
with_type_hint(TorchTensorType(transport="nccl"|"auto"|Communicator))
API, which is a bit verbose and non-intuitive.This PR introduces a new
with_tensor_transport()
API to replace the oldwith_type_hint()
API.It also supports an "auto" option for the "transport" arg to automatically decide whether to use shared memory, intra-process channel or NCCL as the transport, based on the node location and GPU assignments of the reader and writer actors.
Related issue number
Closes #47258
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.