
Installation using AWS EKS cluster

Before you begin, ensure you have the following tools installed:

  • AWS CLI to provision AWS resources
  • Eksctl (>= v0.191.0) to create and manage clusters on EKS
  • Helm to install this operator
  • kubectl to view Kubernetes resources
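As a quick pre-flight check, the tool list above can be verified with a small helper. This is a minimal sketch, not part of the project: `check_tools` is a hypothetical function name, and the binary names `aws`, `eksctl`, `helm`, and `kubectl` are assumed to be the default install names.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
  _missing=0
  for _tool in "$@"; do
    if ! command -v "$_tool" >/dev/null 2>&1; then
      echo "missing: $_tool" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Usage (assumed binary names):
#   check_tools aws eksctl helm kubectl && eksctl version
```

Running `eksctl version` afterwards lets you compare the installed version against the v0.191.0 minimum by eye.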

Create EKS Cluster

If you do not already have an EKS cluster, run the following to create one:

cd ../.. # go back to the repository root to use the make targets

export AWS_CLUSTER_NAME=kaito-aws
export AWS_REGION=us-west-2
export AWS_PARTITION=aws
export AWS_K8S_VERSION=1.30
export KARPENTER_NAMESPACE=kube-system
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"

make deploy-aws-cloudformation
make create-eks-cluster
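Because the make targets read their settings from the exported variables, an empty value (for example, `AWS_ACCOUNT_ID` when the AWS CLI is not logged in) may only surface as a failure partway through provisioning. The sketch below fails early instead. `require_env` is a hypothetical helper, and it is an assumption that the targets need every variable listed.

```shell
# Fail early if any named environment variable is unset or empty.
# Returns 0 if all are set, 1 otherwise.
require_env() {
  _rc=0
  for _name in "$@"; do
    # Indirect lookup of the variable whose name is in $_name.
    eval "_val=\${$_name:-}"
    if [ -z "$_val" ]; then
      echo "unset: $_name" >&2
      _rc=1
    fi
  done
  return "$_rc"
}

# Usage:
#   require_env AWS_CLUSTER_NAME AWS_REGION AWS_PARTITION AWS_K8S_VERSION \
#     KARPENTER_NAMESPACE AWS_ACCOUNT_ID && make deploy-aws-cloudformation
```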

If you already have an EKS cluster, connect to it using:

aws eks update-kubeconfig --name $AWS_CLUSTER_NAME --region $AWS_REGION

Install Karpenter Controller

make aws-karpenter-helm

Install Workspace Controller

make aws-patch-install-helm

Verify installation

You can run the following commands to verify that the controllers were installed successfully.

Check the status of the Helm chart installations.

helm list -n default

Check the status of the workspace controller.

kubectl describe deploy kaito-workspace -n kaito-workspace

Check the status of the Karpenter controller.

kubectl describe deploy karpenter -n $KARPENTER_NAMESPACE

Create a Workspace and start an inference service

Once the Kaito and Karpenter controllers are installed, run the following commands to start a falcon-7b inference service.

$ export kaito_workspace_aws="../../examples/inference/kaito_workspace_falcon_7b_aws.yaml"
$ cat $kaito_workspace_aws
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: aws-workspace
resource:
  instanceType: "g5.4xlarge"
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"

$ kubectl apply -f $kaito_workspace_aws

Track the workspace status by running the following command. When the WORKSPACESUCCEEDED column becomes True, the model has been deployed successfully.

$ kubectl get workspace aws-workspace
NAME                  INSTANCE            RESOURCEREADY   INFERENCEREADY    JOBSTARTED  WORKSPACESUCCEEDED  AGE
aws-workspace         g5.4xlarge          True            True              True        True                10m
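Provisioning a GPU node and loading the model can take several minutes, so rather than re-running the command by hand you may prefer to poll. The following is a sketch only: `wait_until` and `workspace_ready` are hypothetical helpers, and the condition type `WorkspaceSucceeded` is an assumption inferred from the column header above, so confirm it against `kubectl get workspace aws-workspace -o yaml` before relying on it.

```shell
# Re-run a command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
wait_until() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt remains.
    [ "$_i" -lt "$_attempts" ] && sleep "$_delay"
    _i=$((_i + 1))
  done
  return 1
}

# Assumed condition type; adjust to match your Kaito version.
workspace_ready() {
  [ "$(kubectl get workspace aws-workspace \
      -o jsonpath='{.status.conditions[?(@.type=="WorkspaceSucceeded")].status}')" = "True" ]
}

# Usage: poll every 30 seconds, up to 60 times.
#   wait_until 60 30 workspace_ready && echo "workspace is ready"
```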

Next, find the inference service's cluster IP and use a temporary curl pod to test the service endpoint from inside the cluster.

$ kubectl get svc aws-workspace
NAME                  TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)            AGE
aws-workspace         ClusterIP   <CLUSTERIP>  <none>        80/TCP,29500/TCP   10m

$ export CLUSTERIP=$(kubectl get svc aws-workspace -o jsonpath="{.spec.clusterIPs[0]}")
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"YOUR QUESTION HERE\"}"
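The inline JSON in the command above breaks if the question itself contains a double quote or a backslash. One way to avoid that is to build the request body separately. The sketch below uses a hypothetical `json_prompt` helper that escapes only backslashes and double quotes; it does not handle newlines or other control characters, so it suits single-line prompts only.

```shell
# Build a {"prompt": "..."} JSON body from a single-line string,
# escaping backslashes and double quotes.
json_prompt() {
  _escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"prompt":"%s"}' "$_escaped"
}

# Usage:
#   BODY=$(json_prompt 'What does "EKS" stand for?')
#   kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- \
#     curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" \
#     -H "Content-Type: application/json" -d "$BODY"
```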