Before you begin, ensure you have the following tools installed:
- AWS CLI to provision AWS resources
- eksctl (>= v0.191.0) to create and manage clusters on EKS
- Helm to install this operator
- kubectl to view Kubernetes resources
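As an optional sanity check, you can confirm each tool is on your PATH and that eksctl meets the version requirement (these are the standard version commands for each CLI):
aws --version
eksctl version   # should be >= v0.191.0
helm version --short
kubectl version --client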
If you do not already have an EKS cluster, run the following to create one:
cd ../.. # go back to the main directory so the make targets below are available
export AWS_CLUSTER_NAME=kaito-aws
export AWS_REGION=us-west-2
export AWS_PARTITION=aws
export AWS_K8S_VERSION=1.30
export KARPENTER_NAMESPACE=kube-system
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
make deploy-aws-cloudformation
make create-eks-cluster
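When the make targets finish, you can confirm that the cluster is active and that your kubeconfig points at it (standard AWS CLI and kubectl commands):
aws eks describe-cluster --name $AWS_CLUSTER_NAME --region $AWS_REGION --query cluster.status --output text   # expect ACTIVE
kubectl get nodes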
If you already have an EKS cluster, set AWS_CLUSTER_NAME and AWS_REGION to match it and connect using
aws eks update-kubeconfig --name $AWS_CLUSTER_NAME --region $AWS_REGION
Then install the Karpenter and Kaito workspace controllers using the provided make targets:
make aws-karpenter-helm
make aws-patch-install-helm
You can run the following commands to verify that the controllers were installed successfully.
Check status of the Helm chart installations.
helm list -n default
Check status of the Kaito workspace controller.
kubectl describe deploy kaito-workspace -n kaito-workspace
Check status of Karpenter.
kubectl describe deploy karpenter -n $KARPENTER_NAMESPACE
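You can also confirm that the controller pods are running; the namespaces below match the deployments checked above:
kubectl get pods -n kaito-workspace
kubectl get pods -n $KARPENTER_NAMESPACE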
Once the Kaito and Karpenter controllers are installed, run the following commands to start a falcon-7b inference service.
$ export kaito_workspace_aws="../../examples/inference/kaito_workspace_falcon_7b_aws.yaml"
$ cat $kaito_workspace_aws
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: aws-workspace
resource:
  instanceType: "g5.4xlarge"
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"
$ kubectl apply -f $kaito_workspace_aws
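After the workspace is applied, Karpenter provisions a g5.4xlarge node for the inference workload; you can watch it join the cluster with a standard kubectl command (the instance-type column uses the well-known node label):
$ kubectl get nodes -L node.kubernetes.io/instance-type --watch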
The workspace status can be tracked by running the following command. When the WORKSPACESUCCEEDED column becomes True, the model has been deployed successfully.
$ kubectl get workspace aws-workspace
NAME            INSTANCE     RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
aws-workspace   g5.4xlarge   True            True             True         True                 10m
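If the columns do not become True, describing the Workspace resource shows its conditions and related events, which helps with troubleshooting:
$ kubectl describe workspace aws-workspace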
Next, find the inference service's cluster IP and use a temporary curl pod to test the service endpoint inside the cluster.
$ kubectl get svc aws-workspace
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
aws-workspace ClusterIP <CLUSTERIP> <none> 80/TCP,29500/TCP 10m
$ export CLUSTERIP=$(kubectl get svc aws-workspace -o jsonpath="{.spec.clusterIPs[0]}")
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"YOUR QUESTION HERE\"}"
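As an alternative to the temporary curl pod, you can port-forward the service to your workstation and call the same endpoint locally; the /chat path and request body are the same as in the in-cluster example above. Run the port-forward in a separate terminal:
$ kubectl port-forward svc/aws-workspace 8080:80
$ curl -X POST http://localhost:8080/chat -H "accept: application/json" -H "Content-Type: application/json" -d '{"prompt":"YOUR QUESTION HERE"}'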