Commit: Merge branch 'hjiang/fix-windows-dependency' of github.com:dentiny/ray into hjiang/fix-windows-dependency

Showing 14 changed files with 370 additions and 48 deletions.

Execution and failure semantics
===============================

Like classic Ray Core, Ray Compiled Graph propagates exceptions to the final output.
In particular:

- **Application exceptions**: If an application task throws an exception, Compiled Graph
  wraps the exception in a :class:`RayTaskError <ray.exceptions.RayTaskError>` and
  raises it when the caller calls :func:`ray.get() <ray.get>` on the result. The thrown
  exception inherits from both :class:`RayTaskError <ray.exceptions.RayTaskError>`
  and the original exception class.

- **System exceptions**: System exceptions include actor death and unexpected errors
  such as network errors. For actor death, Compiled Graph raises an
  :class:`ActorDiedError <ray.exceptions.ActorDiedError>`, and for other errors, it
  raises a :class:`RayChannelError <ray.exceptions.RayChannelError>`.

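For example, here is a minimal sketch of catching an application exception. The
``FaultyActor`` class and the ``ValueError`` it raises are illustrative, not part of
the Ray API:

.. testcode::

    import ray
    from ray.dag import InputNode

    @ray.remote
    class FaultyActor:
        def process(self, msg):
            # An application exception thrown inside a task.
            raise ValueError("application error")

    actor = FaultyActor.remote()
    with InputNode() as inp:
        dag = actor.process.bind(inp)

    compiled_dag = dag.experimental_compile()
    ref = compiled_dag.execute(1)
    try:
        ray.get(ref)
    except ray.exceptions.RayTaskError as e:
        # The wrapped exception inherits from both RayTaskError and the
        # original exception class.
        assert isinstance(e, ValueError)
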
Ray Compiled Graph remains executable after an application exception. However, Compiled Graph
automatically shuts down after a system exception. If an actor's death causes
the Compiled Graph to shut down, the shutdown doesn't affect the remaining actors. See the
following code for an example:

.. testcode::

    import ray
    from ray.dag import InputNode, MultiOutputNode

    @ray.remote
    class EchoActor:
        def echo(self, msg):
            return msg

    actors = [EchoActor.remote() for _ in range(4)]
    with InputNode() as inp:
        outputs = [actor.echo.bind(inp) for actor in actors]
        dag = MultiOutputNode(outputs)

    compiled_dag = dag.experimental_compile()
    # Kill one of the actors to simulate unexpected actor death.
    ray.kill(actors[0])
    ref = compiled_dag.execute(1)

    live_actors = []
    try:
        ray.get(ref)
    except ray.exceptions.ActorDiedError:
        # At this point, the Compiled Graph is shutting down.
        for actor in actors:
            try:
                # Check for live actors.
                ray.get(actor.echo.remote("ping"))
                live_actors.append(actor)
            except ray.exceptions.RayActorError:
                pass

    # Optionally, use the live actors to create a new Compiled Graph,
    # as shown in the sketch after this example.
    assert live_actors == actors[1:]

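Continuing the preceding example, here is a minimal sketch of rebuilding a Compiled Graph
from the surviving actors:

.. testcode::

    with InputNode() as inp:
        outputs = [actor.echo.bind(inp) for actor in live_actors]
        new_dag = MultiOutputNode(outputs)

    new_compiled_dag = new_dag.experimental_compile()
    ref = new_compiled_dag.execute("hello")
    assert ray.get(ref) == ["hello"] * len(live_actors)
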
Timeouts
--------

Some errors, such as network errors, require additional handling to avoid hanging.
To address these cases, Compiled Graph allows configurable timeouts for
``compiled_dag.execute()`` and :func:`ray.get() <ray.get>`.

The default timeout is 10 seconds for both. Set the following environment variables
to change the default timeouts:

- ``RAY_CGRAPH_submit_timeout``: Timeout for ``compiled_dag.execute()``.
- ``RAY_CGRAPH_get_timeout``: Timeout for :func:`ray.get() <ray.get>`.

:func:`ray.get() <ray.get>` also has a ``timeout`` parameter to set a timeout on a per-call basis.

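For example, here is a minimal sketch of a per-call timeout. The ``EchoActor`` class and
the 5-second value are illustrative:

.. testcode::

    import ray
    from ray.dag import InputNode

    @ray.remote
    class EchoActor:
        def echo(self, msg):
            return msg

    actor = EchoActor.remote()
    with InputNode() as inp:
        dag = actor.echo.bind(inp)

    compiled_dag = dag.experimental_compile()
    ref = compiled_dag.execute("hello")
    # Override the default 10-second get timeout for this call only.
    assert ray.get(ref, timeout=5) == "hello"
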
Overlap communication and computation
======================================

Compiled Graph provides experimental support for overlapping GPU communication with
computation. When you turn this feature on, Compiled Graph automatically overlaps GPU
communication with computation operations, hiding communication overhead and improving
performance.

To enable this feature, specify ``_overlap_gpu_communication=True`` when calling
``dag.experimental_compile()``.

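For example, a minimal sketch, assuming ``dag`` is an already constructed Ray DAG:

.. code-block:: python

    # Enable experimental overlap of GPU communication and computation
    # at compile time.
    compiled_dag = dag.experimental_compile(_overlap_gpu_communication=True)
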
The following code has GPU communication and computation operations that benefit
from overlapping.

.. testcode::

    import ray
    import time
    import torch
    from ray.dag import InputNode, MultiOutputNode
    from ray.experimental.channel.torch_tensor_type import TorchTensorType
    from ray.air._internal import torch_utils

    @ray.remote(num_cpus=0, num_gpus=1)
    class TorchTensorWorker:
        def __init__(self):
            self.device = torch_utils.get_devices()[0]

        def send(self, shape, dtype, value: int, send_tensor=True):
            if not send_tensor:
                return 1
            return torch.ones(shape, dtype=dtype, device=self.device) * value

        def recv_and_matmul(self, two_d_tensor):
            """
            Receive the tensor and do some expensive computation (matmul).

            Args:
                two_d_tensor: a square 2D tensor (both dimensions have the same size).
            """
            # Check that the tensor is square and loaded onto the correct device.
            assert two_d_tensor.dim() == 2
            assert two_d_tensor.size(0) == two_d_tensor.size(1)
            assert two_d_tensor.device == self.device
            torch.matmul(two_d_tensor, two_d_tensor)
            return (two_d_tensor[0][0].item(), two_d_tensor.shape, two_d_tensor.dtype)

    def test(overlap_gpu_communication):
        num_senders = 3
        senders = [TorchTensorWorker.remote() for _ in range(num_senders)]
        receiver = TorchTensorWorker.remote()

        shape = (10000, 10000)
        dtype = torch.float16

        with InputNode() as inp:
            branches = [sender.send.bind(shape, dtype, inp) for sender in senders]
            branches = [
                branch.with_type_hint(
                    TorchTensorType(
                        transport="nccl", _static_shape=True, _direct_return=True
                    )
                )
                for branch in branches
            ]
            branches = [receiver.recv_and_matmul.bind(branch) for branch in branches]
            dag = MultiOutputNode(branches)

        compiled_dag = dag.experimental_compile(
            _overlap_gpu_communication=overlap_gpu_communication
        )

        start = time.monotonic()
        for i in range(5):
            ref = compiled_dag.execute(i)
            result = ray.get(ref)
            assert result == [(i, shape, dtype)] * num_senders
        duration = time.monotonic() - start
        print(f"{overlap_gpu_communication=}, {duration=}")

    if __name__ == "__main__":
        for overlap_gpu_communication in [False, True]:
            test(overlap_gpu_communication)

The output of the preceding code includes the following two lines:

.. testoutput::

    overlap_gpu_communication=False, duration=1.0670117866247892
    overlap_gpu_communication=True, duration=0.9211348341777921

The actual performance numbers may vary on different hardware, but enabling
``_overlap_gpu_communication`` reduces latency by about 14% for this example.

To verify that Compiled Graph overlaps the communication and computation operations,
:ref:`visualize the execution schedule <execution-schedule>` by setting the environment
variable ``RAY_CGRAPH_VISUALIZE_SCHEDULE=1``.

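For example, a minimal sketch, assuming the preceding script is saved as
``overlap_example.py`` (the file name is illustrative):

.. code-block:: bash

    RAY_CGRAPH_VISUALIZE_SCHEDULE=1 python overlap_example.py
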
.. image:: ../../images/compiled_graph_schedule_overlap.png
    :alt: Execution Schedule with GPU Communication Overlap Enabled
    :align: center

Red nodes denote operations whose execution order in the optimized schedule differs from
the original order due to ``_overlap_gpu_communication``.