CodeFlare Operator Installation
Taken from: https://github.com/opendatahub-io/distributed-workloads/blob/main/Quick-Start.md
0.1 This guide assumes you have an OpenShift cluster.
0.2 It assumes you're logged into the OpenShift Console of your cluster so that you can install the ODH and CodeFlare operators. (If you don't have access to the OpenShift UI, you can apply a subscription from the terminal instead.)
0.3 It assumes you've already used oc login to log into your OpenShift cluster from a terminal.
0.4 It also assumes you have a default storage class already set up. For the IBM Fyre clusters, I'm using Portworx storage and have defined a default storage class:
oc get sc |grep default
portworx-watson-assistant-sc (default) kubernetes.io/portworx-volume Retain Immediate true 3h50m
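The storage class prerequisite can also be checked from a script. The sketch below is illustrative only: the function name has_default_storage_class is hypothetical, and it assumes that oc get sc marks the default class with the text "(default)", as in the output above.

```shell
# Succeeds (exit status 0) when `oc get sc` lists at least one
# storage class marked "(default)"; fails otherwise.
# The function name is hypothetical and not part of oc.
has_default_storage_class() {
  oc get sc --no-headers 2>/dev/null | grep -q '(default)'
}

# Usage (run manually once you are logged in):
#   has_default_storage_class || echo "No default storage class is set" >&2
```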
1.1 Using your Console, navigate to Operators --> OperatorHub and filter for Open Data Hub Operator
1.2 Press Install, accept all the defaults and then press Install again.
Alternatively, you can create the subscription from the terminal with this:
cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: openshift-operators
spec:
  channel: rolling
  name: opendatahub-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
  startingCSV: opendatahub-operator.v1.9.0
EOF
1.3 From your terminal, verify that the ODH operator is running:
oc get pods -n openshift-operators
and you'll see that it has started:
NAME READY STATUS RESTARTS AGE
opendatahub-operator-controller-manager-84858b8998-7nd6q 2/2 Running 0 87s
2. Install the CodeFlare Operator into openshift-operators namespace using the OpenShift UI console:
2.1 Using your Console, navigate to Operators --> OperatorHub and filter for CodeFlare Operator
2.2 Press Install, accept all the defaults and then press Install again.
Alternatively, you can create the subscription from the terminal with this:
cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: codeflare-operator
  namespace: openshift-operators
spec:
  channel: alpha
  name: codeflare-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual  # Manual or Automatic
  startingCSV: codeflare-operator.v1.0.0-rc.1
EOF
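With installPlanApproval set to Manual, the operator is not installed until its InstallPlan is approved. The following is a minimal sketch of doing that from the terminal. The function names are hypothetical, and it assumes the default oc get installplan table layout (NAME, CSV, APPROVAL, APPROVED). Review the pending plans before approving anything, since other operators in the same namespace may share an InstallPlan.

```shell
# Print the names of unapproved InstallPlans whose CSV starts with the
# given prefix. Function names here are hypothetical helpers.
# Assumes the columns NAME CSV APPROVAL APPROVED.
pending_install_plans() {
  local csv_prefix="$1" ns="${2:-openshift-operators}"
  oc get installplan -n "$ns" --no-headers |
    awk -v p="$csv_prefix" 'index($2, p) == 1 && $4 == "false" { print $1 }'
}

# Approve every pending InstallPlan that matches the CSV prefix.
approve_install_plans() {
  local csv_prefix="$1" ns="${2:-openshift-operators}" plan
  for plan in $(pending_install_plans "$csv_prefix" "$ns"); do
    oc patch installplan "$plan" -n "$ns" --type merge \
      -p '{"spec":{"approved":true}}'
  done
}

# Usage (run manually):
#   pending_install_plans codeflare-operator
#   approve_install_plans codeflare-operator
```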
2.3 From your terminal, verify that the CodeFlare operator is running:
oc get pods -n openshift-operators
and you'll see that it has started:
NAME READY STATUS RESTARTS AGE
codeflare-operator-controller-manager-8594c586f4-rlbbv 2/2 Running 0 100s
opendatahub-operator-controller-manager-84858b8998-7nd6q 2/2 Running 0 2m24s
3. If you want to run GPU-enabled workloads, you will need to install the Node Feature Discovery Operator and the NVIDIA GPU Operator from the OperatorHub.
3.1 Using your Console, navigate to Operators --> OperatorHub and filter for Node Feature Discovery. (Select the Operator that's from the Red Hat catalog, not the Community one.)
3.2 Press Install, accept all the defaults and then press Install again.
3.3 From your terminal, verify that the Node Feature Discovery Operator is running:
oc get pods -n openshift-nfd
and you'll see that it has started:
NAME READY STATUS RESTARTS AGE
nfd-controller-manager-b767b964c-sl7j2 2/2 Running 0 12s
3.4 Using your Console, navigate to Operators --> OperatorHub and filter for NVIDIA GPU Operator.
3.5 Press Install, accept all the defaults and then press Install again.
3.6 From your terminal, verify that the NVIDIA GPU Operator is running:
oc get pods -n nvidia-gpu-operator
and you'll see that it has started:
NAME READY STATUS RESTARTS AGE
gpu-operator-868867dbdb-2nd9s 1/1 Running 0 4m11s
4. Now that the CodeFlare and ODH operators are installed (and the GPU operators, if you have GPUs), you can deploy the kfdefs, which install the underlying stack into the opendatahub namespace:
4.1 Create the opendatahub namespace with the following command:
oc create ns opendatahub
4.2 Apply the odh-core kfdef with this command:
oc apply -f https://raw.githubusercontent.com/opendatahub-io/odh-manifests/master/kfdef/odh-core.yaml -n opendatahub
4.3 Create the CodeFlare-Stack kfdef with this command:
oc apply -f https://raw.githubusercontent.com/opendatahub-io/distributed-workloads/main/codeflare-stack-kfdef.yaml -n opendatahub
Note: The older version of the KFDEF without the latest CRD changes would be:
oc apply -f TBD
4.4 Check that everything is running in opendatahub with this command:
oc get pods -n opendatahub
It should look like this:
NAME READY STATUS RESTARTS AGE
data-science-pipelines-operator-controller-manager-5fbfdc8x5wnx 1/1 Running 0 3m39s
etcd-85c59bc4d6-wn777 1/1 Running 0 3m41s
grafana-deployment-6cf577dbb6-ptcjp 1/1 Running 0 3m35s
grafana-operator-controller-manager-54fbd5b876-zfbvz 2/2 Running 0 4m4s
instascale-instascale-66587c96f5-28chv 1/1 Running 0 4m34s
kuberay-operator-67d58795bf-h8hwt 1/1 Running 0 4m31s
mcad-controller-mcad-5f5cb64ddb-mhf5p 1/1 Running 0 4m34s
modelmesh-controller-5588b58d79-c46g5 1/1 Running 0 3m41s
modelmesh-controller-5588b58d79-tn4rt 1/1 Running 0 3m41s
modelmesh-controller-5588b58d79-wz82x 1/1 Running 0 3m41s
notebook-controller-deployment-5c565c4c75-2pbzg 1/1 Running 0 3m50s
odh-dashboard-7f46945556-kd7l5 2/2 Running 0 4m37s
odh-dashboard-7f46945556-vsg4m 2/2 Running 0 4m37s
odh-model-controller-79c67bc689-5559f 1/1 Running 0 3m41s
odh-model-controller-79c67bc689-9q9ss 1/1 Running 0 3m41s
odh-model-controller-79c67bc689-vnfbh 1/1 Running 0 3m41s
odh-notebook-controller-manager-5cf77fdc56-s4cm6 1/1 Running 0 3m50s
prometheus-odh-model-monitoring-0 3/3 Running 0 3m39s
prometheus-odh-model-monitoring-1 3/3 Running 0 3m39s
prometheus-odh-model-monitoring-2 3/3 Running 0 3m39s
prometheus-odh-monitoring-0 2/2 Running 0 3m58s
prometheus-odh-monitoring-1 2/2 Running 0 3m58s
prometheus-operator-779f765944-p2nbf 1/1 Running 0 4m9s
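Rather than scanning a long pod listing by eye, you could filter it down to the pods that still need attention. This is a sketch only: not_ready_pods is a hypothetical helper, and it assumes the default oc get pods column layout shown above (NAME, READY, STATUS, ...). Pods that have run to completion would also be reported, since their status is not Running.

```shell
# Print the names of pods in a namespace that are not fully up:
# either STATUS is not "Running" or the READY counts differ (e.g. 0/2).
# The function name is hypothetical.
not_ready_pods() {
  local ns="$1"
  oc get pods -n "$ns" --no-headers |
    awk '{ split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) print $1 }'
}

# Usage (run manually); empty output means every listed pod is ready:
#   not_ready_pods opendatahub
```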
5. Open the ODH dashboard. Its URL has the form:
https://odh-dashboard-$ODH_NAMESPACE.apps.<your cluster's uri>
5.1 You can find it with this command:
oc get route -n opendatahub |grep dash |awk '{print $2}'
For example:
odh-dashboard-opendatahub.apps.jimbig412.cp.fyre.ibm.com
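The same lookup can be done without grep and awk by asking for the route's host field directly. In this sketch, dashboard_url is a hypothetical helper, and the route name odh-dashboard is an assumption inferred from the host name above; adjust it if your route is named differently.

```shell
# Print the dashboard URL for the given namespace (default: opendatahub).
# Assumes the route is named "odh-dashboard"; the function name is hypothetical.
dashboard_url() {
  local ns="${1:-opendatahub}" host
  host=$(oc get route odh-dashboard -n "$ns" -o jsonpath='{.spec.host}') || return 1
  [ -n "$host" ] && printf 'https://%s\n' "$host"
}

# Usage (run manually):
#   dashboard_url opendatahub
```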
5.2 Put that in your browser. For example: https://odh-dashboard-opendatahub.apps.jimbig412.cp.fyre.ibm.com
- If prompted, give it your kubeadmin user and password
- If prompted, grant it access as well
5.3 Click on the link "Launch application" in the Jupyter tile.
5.4 Choose CodeFlare Notebook, and click "Start server"
5.5 Note: if this is the first time, it'll take a while to pull the new container image. You can watch it start from the terminal by issuing this:
oc get pods -n opendatahub |grep jupyter
And it'll show if the pod is starting or has started. For example:
jupyter-nb-kube-3aadmin-0 0/2 ContainerCreating 0 89s
and then a few minutes later:
jupyter-nb-kube-3aadmin-0 2/2 Running 0 2m30s
5.6 Note that the notebook also uses a PVC:
oc get pvc -n opendatahub
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
jupyterhub-nb-kube-3aadmin-pvc Bound pvc-28c725bd-6ba8-4bf4-92fe-b88b82b58fc6 1Gi RWO portworx-watson-assistant-sc 3m32s
6.1 Click either "Open in a new tab" or "Open in current tab"
- If prompted, give it your kubeadmin user and password
- If prompted, grant it access as well
6.2 Click on the "+" to open a new launcher tab and select Terminal. Inside this terminal, do this:
git clone https://github.com/project-codeflare/codeflare-sdk.git
Then you can close the terminal.
6.3 On the far left, navigate to: codeflare-sdk --> demo-notebooks --> guided-demos
7. Then walk through the various guided Jupyter Notebook examples one-by-one to see what you can do with CodeFlare
Hint 1: If you don't want to reveal your OC token in your Jupyter notebook, you can run oc login from the notebook's terminal and skip the auth = TokenAuthentication step.
Hint 2: When you run cluster.up(), the cluster is created in the default namespace by default. You can watch your cluster start like this:
oc get pods -n default
NAME READY STATUS RESTARTS AGE
mnisttest-head-zgpvt 0/1 ContainerCreating 0 60s
mnisttest-worker-small-group-mnisttest-lqbr8 0/1 PodInitializing 0 60s
mnisttest-worker-small-group-mnisttest-ztr27 0/1 PodInitializing 0 60s
The first time, it has to pull the images, which takes a few minutes. On later runs, the images will be cached.
To completely clean up all the CodeFlare components after an install, follow these steps:
- Make sure no appwrappers are left running:
oc get appwrappers -A
If any are left, delete them.
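If several appwrappers are left over, deleting them one by one gets tedious. The sketch below loops over the listing and deletes each entry. delete_all_appwrappers is a hypothetical helper, and it assumes that oc get appwrappers -A prints the namespace in the first column and the name in the second. It removes every appwrapper in the cluster, so only use it on a cluster you are tearing down.

```shell
# Delete every AppWrapper in every namespace.
# Assumes `oc get appwrappers -A --no-headers` prints NAMESPACE then NAME.
# The function name is hypothetical.
delete_all_appwrappers() {
  oc get appwrappers -A --no-headers 2>/dev/null |
    while read -r ns name _rest; do
      # </dev/null keeps oc from consuming the loop's input
      oc delete appwrappers "$name" -n "$ns" </dev/null
    done
}

# Usage (run manually):
#   delete_all_appwrappers
```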
- Remove the notebook and notebook PVC:
oc delete notebook jupyter-nb-kube-3aadmin -n opendatahub
oc delete pvc jupyterhub-nb-kube-3aadmin-pvc -n opendatahub
- Remove the codeflare-stack kfdef (removes MCAD, InstaScale, KubeRay and the Notebook image):
oc delete kfdef codeflare-stack -n opendatahub
- Remove the CodeFlare Operator csv and subscription (removes the CodeFlare Operator from the OpenShift cluster):
oc delete sub codeflare-operator -n openshift-operators
oc delete csv $(oc get csv -n openshift-operators |grep codeflare-operator |awk '{print $1}') -n openshift-operators
- Remove the CodeFlare CRDs:
oc delete crd instascales.codeflare.codeflare.dev mcads.codeflare.codeflare.dev schedulingspecs.mcad.ibm.com appwrappers.mcad.ibm.com quotasubtrees.ibm.com
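To confirm that the cleanup finished, you could check whether any of the CRDs named above still exist. This is a minimal sketch: remaining_codeflare_crds and CODEFLARE_CRDS are hypothetical names, and it assumes oc get crd -o name prints entries in the form customresourcedefinition.apiextensions.k8s.io/<crd name>.

```shell
# The CRD names from the cleanup step above.
CODEFLARE_CRDS="instascales.codeflare.codeflare.dev mcads.codeflare.codeflare.dev schedulingspecs.mcad.ibm.com appwrappers.mcad.ibm.com quotasubtrees.ibm.com"

# Print each CodeFlare CRD that still exists on the cluster.
# Empty output means all of them are gone. The function name is hypothetical.
remaining_codeflare_crds() {
  local existing crd
  existing=$(oc get crd -o name)
  for crd in $CODEFLARE_CRDS; do
    printf '%s\n' "$existing" |
      grep -qxF "customresourcedefinition.apiextensions.k8s.io/$crd" && echo "$crd"
  done
  return 0
}

# Usage (run manually):
#   remaining_codeflare_crds
```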