[doc] Add hpu resource description in ray serve docs #48796

Merged 4 commits on Nov 20, 2024
Changes from all commits
1 change: 1 addition & 0 deletions doc/source/serve/getting_started.md
@@ -101,6 +101,7 @@ parameters in the `@serve.deployment` decorator. The example configures a few co
* `ray_actor_options`: a dictionary containing configuration options for each replica.
* `num_cpus`: a float representing the logical number of CPUs each replica should reserve. You can make this a fraction to pack multiple replicas together on a machine with fewer CPUs than replicas.
* `num_gpus`: a float representing the logical number of GPUs each replica should reserve. You can make this a fraction to pack multiple replicas together on a machine with fewer GPUs than replicas.
* `resources`: a dictionary containing other resource requirements for the replica, such as non-GPU accelerators like HPUs or TPUs (a combined sketch of these options follows below).
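
Taken together, a decorator using all three options might look like the following. This is a rough sketch that is not part of the diff; the resource amounts and the `handler` function are illustrative placeholders:

```python
from ray import serve


# Hypothetical sketch, not from the PR: combine the ray_actor_options
# described above in one deployment.
@serve.deployment(
    ray_actor_options={
        "num_cpus": 0.5,          # fraction: two replicas can share one CPU core
        "num_gpus": 0.25,         # fraction: four replicas can share one GPU
        "resources": {"HPU": 1},  # custom resources, e.g. a non-GPU accelerator
    }
)
def handler(request):
    return "ok"
```

In practice a replica usually reserves only the accelerator type it actually uses; the fractional values just show how several replicas can be packed onto one CPU core or one GPU.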

All these parameters are optional, so feel free to omit them:

14 changes: 11 additions & 3 deletions doc/source/serve/resource-allocation.md
@@ -6,14 +6,14 @@ This guide helps you configure Ray Serve to:

- Scale your deployments horizontally by specifying a number of replicas
- Scale up and down automatically to react to changing traffic
- Allocate hardware resources (CPUs, GPUs, etc) for each deployment
- Allocate hardware resources (CPUs, GPUs, other accelerators, etc) for each deployment


(serve-cpus-gpus)=

## Resource management (CPUs, GPUs)
## Resource management (CPUs, GPUs, accelerators)

You may want to specify a deployment's resource requirements to reserve cluster resources like GPUs. To assign hardware resources per replica, you can pass resource requirements to
You may want to specify a deployment's resource requirements to reserve cluster resources like GPUs or other accelerators. To assign hardware resources per replica, you can pass resource requirements to
`ray_actor_options`.
By default, each replica reserves one CPU.
To learn about options to pass in, take a look at the [Resources with Actors guide](actor-resource-guide).
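
As an aside that is not part of this diff, overriding the default one-CPU reservation uses the same mechanism; a minimal sketch reserving two CPUs per replica:

```python
from ray import serve


@serve.deployment(ray_actor_options={"num_cpus": 2})
def func(*args):
    # Placeholder body: each replica of this deployment reserves two logical CPUs.
    return "handled with two reserved CPUs"
```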
@@ -27,6 +27,14 @@ def func(*args):
return do_something_with_my_gpu()
```

Or, if you want to create a deployment where each replica uses another type of accelerator, such as an HPU, follow the example below:

```python
@serve.deployment(ray_actor_options={"resources": {"HPU": 1}})
def func(*args):
return do_something_with_my_hpu()
```
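
For completeness (not part of the PR), a deployment like the HPU example above is started with the usual Ray Serve flow; a minimal sketch, assuming the cluster actually advertises an `HPU` resource:

```python
from ray import serve

# Assumes an "HPU" custom resource is available, for example a head node
# started with: ray start --head --resources='{"HPU": 8}'
# (on Intel Gaudi nodes, recent Ray versions may also detect HPUs automatically).
app = func.bind()  # bind the deployment defined above into an application
serve.run(app)     # each replica is scheduled on a node with a free HPU
```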

(serve-fractional-resources-guide)=

### Fractional CPUs and fractional GPUs