diff --git a/doc/source/serve/getting_started.md b/doc/source/serve/getting_started.md
index ff2620cc80524..0bbe4084f3e5e 100644
--- a/doc/source/serve/getting_started.md
+++ b/doc/source/serve/getting_started.md
@@ -101,6 +101,7 @@ parameters in the `@serve.deployment` decorator. The example configures a few co
 * `ray_actor_options`: a dictionary containing configuration options for each replica.
     * `num_cpus`: a float representing the logical number of CPUs each replica should reserve. You can make this a fraction to pack multiple replicas together on a machine with fewer CPUs than replicas.
     * `num_gpus`: a float representing the logical number of GPUs each replica should reserve. You can make this a fraction to pack multiple replicas together on a machine with fewer GPUs than replicas.
+    * `resources`: a dictionary containing other resource requirements for the replica, such as non-GPU accelerators like HPUs or TPUs.
 
 All these parameters are optional, so feel free to omit them:
 
diff --git a/doc/source/serve/resource-allocation.md b/doc/source/serve/resource-allocation.md
index 57f580f2c3703..18df5a8181a4e 100644
--- a/doc/source/serve/resource-allocation.md
+++ b/doc/source/serve/resource-allocation.md
@@ -6,14 +6,14 @@ This guide helps you configure Ray Serve to:
 
 - Scale your deployments horizontally by specifying a number of replicas
 - Scale up and down automatically to react to changing traffic
-- Allocate hardware resources (CPUs, GPUs, etc) for each deployment
+- Allocate hardware resources (CPUs, GPUs, other accelerators, etc) for each deployment
 
 
 (serve-cpus-gpus)=
 
-## Resource management (CPUs, GPUs)
+## Resource management (CPUs, GPUs, accelerators)
 
-You may want to specify a deployment's resource requirements to reserve cluster resources like GPUs. To assign hardware resources per replica, you can pass resource requirements to
+You may want to specify a deployment's resource requirements to reserve cluster resources like GPUs or other accelerators. To assign hardware resources per replica, you can pass resource requirements to
 `ray_actor_options`.
 By default, each replica reserves one CPU.
 To learn about options to pass in, take a look at the [Resources with Actors guide](actor-resource-guide).
@@ -27,6 +27,14 @@ def func(*args):
     return do_something_with_my_gpu()
 ```
 
+Or if you want to create a deployment where each replica uses another type of accelerator such as an HPU, follow the example below:
+
+```python
+@serve.deployment(ray_actor_options={"resources": {"HPU": 1}})
+def func(*args):
+    return do_something_with_my_hpu()
+```
+
 (serve-fractional-resources-guide)=
 
 ### Fractional CPUs and fractional GPUs
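
Review note: the sketch below illustrates how the three `ray_actor_options` keys touched by this change (`num_cpus`, `num_gpus`, `resources`) might be combined for one replica. The helper `build_ray_actor_options` is hypothetical and not part of Ray; it only assembles and sanity-checks the dictionary. The deployment at the end is defined only if Ray Serve happens to be installed, and it assumes the cluster advertises a custom resource named `HPU`.

```python
from typing import Dict, Optional


def build_ray_actor_options(
    num_cpus: float = 1.0,
    num_gpus: float = 0.0,
    resources: Optional[Dict[str, float]] = None,
) -> Dict[str, object]:
    """Assemble a `ray_actor_options` dict for one replica.

    Hypothetical helper for illustration only; it is not a Ray API.
    The default of one CPU mirrors the documented per-replica default.
    """
    requested = {"num_cpus": num_cpus, "num_gpus": num_gpus, **(resources or {})}
    for name, amount in requested.items():
        if amount < 0:
            raise ValueError(f"{name} must be non-negative, got {amount}")

    options: Dict[str, object] = {"num_cpus": num_cpus}
    if num_gpus:
        # Fractional values let several replicas share one GPU.
        options["num_gpus"] = num_gpus
    if resources:
        # Other accelerators (for example HPUs or TPUs) go under "resources".
        options["resources"] = dict(resources)
    return options


# Half a CPU plus one HPU per replica.
hpu_options = build_ray_actor_options(num_cpus=0.5, resources={"HPU": 1})
print(hpu_options)

try:
    from ray import serve
except ImportError:
    # Ray is optional for this sketch; without it only the dict is built.
    serve = None

if serve is not None:
    @serve.deployment(ray_actor_options=hpu_options)
    def func(*args):
        return "handled by a replica that reserved one HPU"
```

Building the dictionary separately keeps the resource request easy to inspect before it reaches the decorator. A replica that requests a custom resource can only be placed on a node that advertises it, so the `HPU` key here is meaningful only on a cluster configured that way.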