Skip to content

Commit

Permalink
add pre-requisite and improve formatting
Browse files Browse the repository at this point in the history
Signed-off-by: Andrew Sy Kim <[email protected]>
  • Loading branch information
andrewsykim committed Dec 4, 2024
1 parent 7638cf0 commit 9b12bff
Showing 1 changed file with 108 additions and 79 deletions.
187 changes: 108 additions & 79 deletions doc/source/cluster/kubernetes/user-guides/kuberay-auth.md
Original file line number Diff line number Diff line change
@@ -1,67 +1,96 @@
(kuberay-auth)=

# Configure Ray cluster authentication and access control using Kubernetes RBAC
# Configure Ray Clusters with Authentication and Access Control using KubeRay

This guide demonstrates how to configure authentication and access control for Ray clusters using KubeRay and Kubernetes RBAC.
This guide demonstrates how to secure Ray clusters deployed with KubeRay by enabling authentication and access control using Kubernetes Role-Based Access Control (RBAC).

## Create a GKE cluster
## Prerequisites

Create a GKE cluster
```
* A Kubernetes cluster (this guide uses GKE, but the concepts apply to other Kubernetes distributions).
* `kubectl` installed and configured to interact with your cluster.
* `gcloud` CLI installed and configured (if using GKE).
* Helm installed.
* Ray installed locally.

## Create a GKE Cluster (or use an existing cluster)

If you don't have a Kubernetes cluster, create one using the following command (or adapt it for your cloud provider):

```bash
gcloud container clusters create kuberay-cluster \
--num-nodes=2 --zone=us-west1-b --machine-type e2-standard-4
```

## Install the KubeRay operator

Follow [Deploy a KubeRay operator](kuberay-operator-deploy) to install the latest stable KubeRay operator from the Helm repository.
## Install the KubeRay Operator

## Deploy a RayCluster with kube-rbac-proxy
Install the latest stable KubeRay operator using Helm:

Create a RayCluster resource running a kube-rbac-proxy sidecar container:
```bash
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator
```

See [Deploy a KubeRay operator](kuberay-operator-deploy) for more details on installing the KubeRay operator.

## Deploy a Ray Cluster with Authentication Enabled

Deploy a RayCluster configured with `kube-rbac-proxy` for authentication and authorization:

```bash
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/refs/heads/master/ray-operator/config/samples/ray-cluster.auth.yaml
```
```

This command deploys:
* A `RayCluster` resource with a `kube-rbac-proxy` sidecar container on the Head Pod. This proxy handles authentication and authorization.
* A `ConfigMap` for kube-rbac-proxy, containing resource attributes required for authorization.
* A `ServiceAccount`, `ClusterRole`, and `ClusterRoleBinding` that allow the `kube-rbac-proxy` to access the Kubernetes TokenReview and SubjectAccessReview APIs.

This step deploys the following:
1. A RayCluster resource running a kube-rbac-proxy sidecar container for authentication/authorization on the Head Pod.
2. A kube-rbac-proxy ConfigMap containing resource attributes required for authorization
3. A ServiceAccount, ClusterRole and ClusterRoleBinding used by kube-rbac-proxy to access the TokenReview and SubjectAcceessReview APIs
## Verify Initial Unauthorized Access

## Verify 401 unauthorized error when accessing Ray cluster
Attempt to submit a Ray job to the cluster to verify that authentication is required. You should receive a `401 Unauthorized` error:

```bash
kubectl port-forward svc/ray-cluster-with-auth-head-svc 8265:8265 &
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
```

You'll see an error similar to this:

Submit a Ray job and verify that 401 unauthorized error is returned
```
$ kubectl port-forward svc/ray-cluster-with-auth-head-svc 8265:8265 &
$ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
...
...
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://localhost:8265/api/version
```

## Configure Kubernetes RBAC for access control
This confirms that the Ray cluster requires authentication.

The following is required in order to access the RayCluster:
* **Authentication**: you must provide an authentication token (via headers) to authenticate your Kubernetes user
* **Authorization**: your user must be authorized (via Kubernetes RBAC) to access the RayCluster resource
## Configure Kubernetes RBAC for Access Control

In most Kubernetes offerings, you can authenticate to your Kubernetes cluster using your cloud user and authorization is granted by an admin
who can modify Kubernetes RBAC rules. This guide will use a Kubernetes service account to demonstrate how to grant access to an unauthorized
subject. The same steps can be applied to grant access to Kubernetes users.
To access the RayCluster, you need:
* **Authentication:** Provide a valid authentication token (e.g., a Kubernetes service account token or a cloud IAM token) in the request headers.
* **Authorization:** Your authenticated user or service account must have the necessary Kubernetes RBAC permissions to access the `RayCluster` resource.

Create a Kubernetes service account:
```
$ kubectl create serviceaccount ray-user
serviceaccount/ray-user created
This guide demonstrates granting access using a Kubernetes service account, but the same principles apply to individual Kubernetes users or cloud IAM users.

### Create a Kubernetes Service Account

Create a service account that will represent your Ray job submitter:

```bash
kubectl create serviceaccount ray-user
```

Verify that the service account cannot access RayCluster resources:
Confirm that the service account currently cannot access the `RayCluster` resource:

```bash
$ kubectl auth can-i get rayclusters.ray.io/ray-cluster-with-auth --as=system:serviceaccount:default:ray-user
no
kubectl auth can-i get rayclusters.ray.io/ray-cluster-with-auth --as=system:serviceaccount:default:ray-user
```

Grant the ServiceAccount access to the RayCluster resource used in this guide:
The output should be `no`.

### Grant Access using Kubernetes RBAC

Create a `Role` and `RoleBinding` to grant the necessary permissions to the `ray-user` service account:

```yaml
# ray-cluster-rbac.yaml
---
Expand All @@ -76,7 +105,7 @@ rules:
- 'rayclusters'
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ray-user
Expand All @@ -91,26 +120,45 @@ subjects:
namespace: default
```
Verify that the service account can access RayCluster resources
Apply the RBAC configuration:
```bash
$ kubectl auth can-i get rayclusters.ray.io/ray-cluster-with-auth --as=system:serviceaccount:default:ray-user
yes
kubectl apply -f ray-cluster-rbac.yaml
```

## Submit a job
### Verify Access

Confirm that the service account now has access to the `RayCluster` resource:

Set the `RAY_JOB_HEADERS` environment variable containing a token from the Kubernetes service account:
```bash
$ export RAY_JOB_HEADERS="{\"Authorization\": \"Bearer $(kubectl create token ray-user)\"}"
kubectl auth can-i get rayclusters.ray.io/ray-cluster-with-auth --as=system:serviceaccount:default:ray-user
```

Submit a Ray job:
The output should be `yes`.

## Submit a Ray Job with Authentication

Now you can submit a Ray job using the service account's authentication token.

Get a token for the `ray-user` service account and store it in the `RAY_JOB_HEADERS` environment variable:

```bash
export RAY_JOB_HEADERS="{\"Authorization\": \"Bearer $(kubectl create token ray-user)\"}"
```

Submit the Ray job:

```bash
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
```

The job should now succeed, and you'll see output similar to this:

```bash
$ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
Job submission server address: http://localhost:8265

-------------------------------------------------------
Job 'raysubmit_n2fq2Ui7cbh3p2Js' submitted successfully
Job 'raysubmit_...' submitted successfully
-------------------------------------------------------

Next steps
Expand All @@ -123,49 +171,30 @@ Next steps

Tailing logs until the job exits (disable with --no-wait):

2024-12-04 12:21:02,613 INFO job_manager.py:530 -- Runtime env is setting up.
2024-12-04 12:21:04,182 INFO worker.py:1494 -- Using address 10.112.0.52:6379 set in the environment variable RAY_ADDRESS
2024-12-04 12:21:04,183 INFO worker.py:1634 -- Connecting to existing Ray cluster at address: 10.112.0.52:6379...
2024-12-04 12:21:04,198 INFO worker.py:1810 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8443
{'node:10.112.0.52': 1.0, 'memory': 12884901888.0, 'node:__internal_head__': 1.0, 'object_store_memory': 3708144844.0, 'CPU': 4.0, 'node:10.112.1.49': 1.0, 'node:10.112.2.36': 1.0}
...
{'node:10.112.0.52': 1.0, 'memory': ..., 'node:__internal_head__': 1.0, 'object_store_memory': ..., 'CPU': 4.0, 'node:10.112.1.49': 1.0, 'node:10.112.2.36': 1.0}

------------------------------------------
Job 'raysubmit_n2fq2Ui7cbh3p2Js' succeeded
Job 'raysubmit_...' succeeded
------------------------------------------
```
## Verify access using cloud IAM (optional)
## Verify Access Using Cloud IAM (Optional)
Most cloud providers support authenticating to your Kubernetes cluster as your cloud IAM user. For example, on GKE you can authenticate as your Google Cloud user using `gcloud auth print-access-token`:
```bash
$ export RAY_JOB_HEADERS="{\"Authorization\": \"Bearer $(gcloud auth print-access-token)\"}"
```
Most cloud providers allow you to authenticate to your Kubernetes cluster as your cloud IAM user. This is a convenient way to interact with the cluster without managing separate Kubernetes credentials.
Submit a Ray job:
```bash
$ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
Job submission server address: http://localhost:8265
**Example using Google Cloud (GKE):**
-------------------------------------------------------
Job 'raysubmit_L8D1bBiTP9CYsA5P' submitted successfully
-------------------------------------------------------
Get an access token for your Google Cloud user:
Next steps
Query the logs of the job:
ray job logs raysubmit_L8D1bBiTP9CYsA5P
Query the status of the job:
ray job status raysubmit_L8D1bBiTP9CYsA5P
Request the job to be stopped:
ray job stop raysubmit_L8D1bBiTP9CYsA5P
```bash
export RAY_JOB_HEADERS="{\"Authorization\": \"Bearer $(gcloud auth print-access-token)\"}"
```
Tailing logs until the job exits (disable with --no-wait):
2024-12-04 12:26:39,949 INFO job_manager.py:530 -- Runtime env is setting up.
2024-12-04 12:26:41,119 INFO worker.py:1494 -- Using address 10.112.0.52:6379 set in the environment variable RAY_ADDRESS
2024-12-04 12:26:41,120 INFO worker.py:1634 -- Connecting to existing Ray cluster at address: 10.112.0.52:6379...
2024-12-04 12:26:41,130 INFO worker.py:1810 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8443
{'memory': 12884901888.0, 'node:__internal_head__': 1.0, 'CPU': 4.0, 'object_store_memory': 3708144844.0, 'node:10.112.0.52': 1.0, 'node:10.112.1.49': 1.0, 'node:10.112.2.36': 1.0}
Submit a Ray job using the IAM token:
------------------------------------------
Job 'raysubmit_L8D1bBiTP9CYsA5P' succeeded
------------------------------------------
```bash
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
```
The job should succeed if your cloud user has the necessary Kubernetes RBAC permissions (you may need to configure additional RBAC rules for your cloud user).

0 comments on commit 9b12bff

Please sign in to comment.