[core][compiled graphs] Introduce with_tensor_transport API #49753

Merged on Jan 22, 2025 (6 commits)

Conversation

Contributor

@ruisearch42 ruisearch42 commented Jan 10, 2025

Why are these changes needed?

Currently, specifying a torch tensor transport requires the with_type_hint(TorchTensorType(transport="nccl"|"auto"|Communicator)) API, which is verbose and unintuitive.
This PR introduces a new with_tensor_transport() API to replace the old with_type_hint() API.
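
To make the difference in call shape concrete, here is a minimal sketch using toy stand-ins. It does not import Ray: `ToyDagNode` and the `TorchTensorType` dataclass below are hypothetical classes written for illustration only, and the `"auto"` default is an assumption, not something this description confirms. The sketch only shows how a convenience method can build the type hint on the caller's behalf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TorchTensorType:
    """Toy stand-in for the type hint named in the description (not Ray's class)."""

    transport: str = "auto"  # assumed default, for illustration only


class ToyDagNode:
    """Hypothetical DAG node that only models how the two calls relate."""

    def __init__(self) -> None:
        self.type_hint: Optional[TorchTensorType] = None

    def with_type_hint(self, hint: TorchTensorType) -> "ToyDagNode":
        # Old style: the caller constructs the hint object explicitly.
        self.type_hint = hint
        return self

    def with_tensor_transport(self, transport: str = "auto") -> "ToyDagNode":
        # New style: the caller names only the transport; the hint is built here.
        return self.with_type_hint(TorchTensorType(transport=transport))


old_style = ToyDagNode().with_type_hint(TorchTensorType(transport="nccl"))
new_style = ToyDagNode().with_tensor_transport("nccl")
print(old_style.type_hint == new_style.type_hint)
```

In this toy model both calls leave the node with an equal type hint; the new method just removes the need to import and construct the hint class at the call site.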

It also supports an "auto" option for the "transport" argument, which automatically chooses shared memory, an intra-process channel, or NCCL as the transport, based on the node location and GPU assignments of the reader and writer actors.
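
The kind of decision that "auto" makes can be sketched as a small pure function. This is not the code from this PR: `ActorPlacement`, `choose_transport`, the returned labels, and the exact rules are assumptions chosen to match the description above (same actor, no GPU on one side, or different GPUs), and the real resolution logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActorPlacement:
    """Hypothetical summary of where an actor runs."""

    actor_id: str
    node_id: str
    gpu_id: Optional[int]  # None means the actor has no GPU assigned


def choose_transport(writer: ActorPlacement, reader: ActorPlacement) -> str:
    """Pick a transport label for one writer/reader pair (illustrative rules)."""
    # Same actor on both ends: data never leaves the process.
    if writer.actor_id == reader.actor_id:
        return "intra_process"
    # If either side has no GPU, a GPU-to-GPU transport does not apply.
    if writer.gpu_id is None or reader.gpu_id is None:
        return "shared_memory"
    # Assumed rule: two actors sharing one GPU on one node fall back to shared memory.
    if writer.node_id == reader.node_id and writer.gpu_id == reader.gpu_id:
        return "shared_memory"
    # Distinct GPUs, on the same node or across nodes: use NCCL.
    return "nccl"


writer = ActorPlacement(actor_id="a", node_id="n1", gpu_id=0)
reader = ActorPlacement(actor_id="b", node_id="n2", gpu_id=0)
print(choose_transport(writer, reader))
```

Keeping the rules in one function that takes only placement data makes the choice easy to unit test without a cluster.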

Related issue number

Closes #47258

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Contributor

@stephanie-wang stephanie-wang left a comment

This looks good so far!

python/ray/experimental/channel/auto_channel_type.py (1 review thread, outdated, resolved)
@ruisearch42 ruisearch42 marked this pull request as ready for review January 16, 2025 20:34
@ruisearch42 ruisearch42 changed the title from "[core][compiled graphs] Use new with_tensor_transport API" to "[core][compiled graphs] Introduce new with_tensor_transport API" Jan 16, 2025
@ruisearch42 ruisearch42 added the "go" label (add ONLY when ready to merge, run all tests) Jan 16, 2025
Member

@kevin85421 kevin85421 left a comment

initial review

python/ray/experimental/channel/auto_channel_type.py (4 review threads, outdated, resolved)
Contributor

@stephanie-wang stephanie-wang left a comment

Looks good!

python/ray/dag/compiled_dag_node.py (1 review thread, outdated, resolved)
python/ray/dag/dag_node.py (1 review thread, outdated, resolved)
@ruisearch42 ruisearch42 changed the title from "[core][compiled graphs] Introduce new with_tensor_transport API" to "[core][compiled graphs] Introduce with_tensor_transport API" Jan 21, 2025
@ruisearch42 ruisearch42 force-pushed the type_hint branch 2 times, most recently from ad44154 to 99eb84b, on January 21, 2025 19:45
python/ray/dag/compiled_dag_node.py (3 review threads, outdated, resolved)
python/ray/dag/dag_node.py (1 review thread, outdated, resolved)
python/ray/dag/dag_node.py (1 review thread, resolved)
python/ray/experimental/channel/auto_channel_type.py (2 review threads, outdated, resolved)
Member

@kevin85421 kevin85421 left a comment

Left some minor comments; the rest LGTM. We should monitor the benchmark results after this PR is merged.

python/ray/dag/compiled_dag_node.py (2 review threads, resolved)
Signed-off-by: Rui Qiao <[email protected]>
@jjyao jjyao merged commit d9898c3 into ray-project:master Jan 22, 2025
5 checks passed
win5923 pushed a commit to win5923/ray that referenced this pull request Jan 23, 2025
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Jan 23, 2025
anson627 pushed a commit to anson627/ray that referenced this pull request Jan 31, 2025
anson627 pushed a commit to anson627/ray that referenced this pull request Jan 31, 2025
Labels
go (add ONLY when ready to merge, run all tests)
Development

Successfully merging this pull request may close these issues.

[aDAG] More intuitive API for (NCCL) type hints
4 participants