Before you begin, ensure you have the following tools installed:
- AWS CLI to provision AWS resources
- eksctl (>= v0.191.0) to create and manage clusters on EKS
- Helm to install this operator
- kubectl to view Kubernetes resources
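As an optional sanity check, you can confirm each tool is on your PATH and that eksctl meets the version requirement (these are the standard version commands for each CLI):
aws --version
eksctl version   # should be >= v0.191.0
helm version --short
kubectl version --client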
If you do not already have an EKS cluster, run the following to create one:
cd ../.. # go back to the main directory so the make targets below are available
export AWS_CLUSTER_NAME=kaito-aws
export AWS_REGION=us-west-2
export AWS_PARTITION=aws
export AWS_K8S_VERSION=1.30
export KARPENTER_NAMESPACE=kube-system
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
make deploy-aws-cloudformation
make create-eks-cluster
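When the make targets finish, you can confirm that the cluster is active and that your kubeconfig points at it (standard AWS CLI and kubectl commands):
aws eks describe-cluster --name $AWS_CLUSTER_NAME --region $AWS_REGION --query cluster.status --output text   # expect ACTIVE
kubectl get nodes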
If you already have an EKS cluster, set AWS_CLUSTER_NAME and AWS_REGION to match it and connect using
aws eks update-kubeconfig --name $AWS_CLUSTER_NAME --region $AWS_REGION
Then install the Karpenter and Kaito workspace controllers using the provided make targets:
make aws-karpenter-helm
make aws-patch-install-helm
You can run the following commands to verify that the controllers were installed successfully.
Check status of the Helm chart installations.
helm list -n default
Check status of the Kaito workspace controller.
kubectl describe deploy kaito-workspace -n kaito-workspace
Check status of Karpenter.
kubectl describe deploy karpenter -n $KARPENTER_NAMESPACE
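You can also confirm that the controller pods are running; the namespaces below match the deployments checked above:
kubectl get pods -n kaito-workspace
kubectl get pods -n $KARPENTER_NAMESPACE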
Once the Kaito and Karpenter controllers are installed, run the following commands to start a falcon-7b inference service.
$ export kaito_workspace_aws="../../examples/inference/kaito_workspace_falcon_7b_aws.yaml"
$ cat $kaito_workspace_aws
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: aws-workspace
resource:
  instanceType: "g5.4xlarge"
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"
$ kubectl apply -f $kaito_workspace_aws
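After the workspace is applied, Karpenter provisions a g5.4xlarge node for the inference workload; you can watch it join the cluster with a standard kubectl command (the instance-type column uses the well-known node label):
$ kubectl get nodes -L node.kubernetes.io/instance-type --watch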
The workspace status can be tracked by running the following command. When the WORKSPACESUCCEEDED column becomes True, the model has been deployed successfully.
$ kubectl get workspace aws-workspace
NAME            INSTANCE     RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
aws-workspace   g5.4xlarge   True            True             True         True                 10m
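If the columns do not become True, describing the Workspace resource shows its conditions and related events, which helps with troubleshooting:
$ kubectl describe workspace aws-workspace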
Next, find the inference service's cluster IP and use a temporary curl pod to test the service endpoint inside the cluster.
$ kubectl get svc aws-workspace
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
aws-workspace ClusterIP <CLUSTERIP> <none> 80/TCP,29500/TCP 10m
$ export CLUSTERIP=$(kubectl get svc aws-workspace -o jsonpath="{.spec.clusterIPs[0]}")
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"YOUR QUESTION HERE\"}"
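As an alternative to the temporary curl pod, you can port-forward the service to your workstation and call the same endpoint locally; the /chat path and request body are the same as in the in-cluster example above. Run the port-forward in a separate terminal:
$ kubectl port-forward svc/aws-workspace 8080:80
$ curl -X POST http://localhost:8080/chat -H "accept: application/json" -H "Content-Type: application/json" -d '{"prompt":"YOUR QUESTION HERE"}'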