This repository contains the Terraform configuration for the resources needed to run ZenML pipelines on AWS.

It builds on the fantastic Open Source MLOps Stack Recipes by ZenML. We took the `aws-minimal` recipe as a starting point and modified it to fit our needs. The modification steps are listed below:
- ☁️ Pull the `aws-minimal` recipe that we will use for running ZenML pipelines on AWS:

  ```bash
  zenml stack recipe pull aws-minimal
  ```

  By default, this recipe creates the following resources on AWS:

  - An EKS cluster that can act as an orchestrator for your workloads.
  - An S3 bucket as an artifact store, which can be used to store all your ML artifacts like the model, checkpoints, etc.
  - An MLflow tracking server as an experiment tracker, which can be used for logging data while running your applications. It also has a beautiful UI that you can use to view everything in one place.
  - A Seldon Core deployment as a model deployer, to have your trained model deployed on a Kubernetes cluster to run inference on.
  - A secrets manager enabled for storing your secrets.

  This command creates a folder `zenml_stack_recipes/aws-minimal` that contains a bunch of Terraform files for the above-mentioned resources.
- ✂️ We want to modify this stack to not use Seldon as the model deployer, so we delete the Seldon Core specific resources. The files to be modified or deleted are:

  - Delete the `seldon` folder.
  - Delete the `seldon.tf` file.
  - Inside `output_file.tf`, remove lines 35 to 39.
  - Inside `outputs.tf`, remove lines 46 to 57 that output Seldon-specific information.
  - Inside `get_URIs.tf`, remove lines 12 to 21.
  - Inside `locals.tf`, remove lines 17 to 21.
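The two whole-file deletions above can be scripted. A minimal, hypothetical sketch (this helper is not part of the recipe); the line-range edits to the four remaining files are best done by hand, since the line numbers can shift between recipe versions:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the recipe): remove the Seldon-specific
# folder and file from a pulled aws-minimal recipe directory.
remove_seldon() {
  local recipe_dir="$1"
  rm -rf "$recipe_dir/seldon"     # Seldon-specific folder
  rm -f  "$recipe_dir/seldon.tf"  # Seldon-specific Terraform file
}

# Example: remove_seldon zenml_stack_recipes/aws-minimal
```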
- 🎨 Customize the deployment by editing the default values in the `locals.tf` file:

  - Change `prefix` from `prefix` to `dobble`.
  - Change `region` from `eu-west-1` to `eu-west-2`.
- 🐛 There is a bug in the stack recipe with respect to the MLflow tracking URI in the `output_file.tf` file. Change `tracking_uri` from:

  ```
  "http://${data.kubernetes_service.mlflow_tracking.status.0.load_balancer.0.ingress.0.hostname}"
  ```

  to:

  ```
  "http://${data.kubernetes_service.mlflow_tracking.status.0.load_balancer.0.ingress.0.hostname}/mlflow"
  ```
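If you prefer to script this one-line fix, here is a hedged `sed` sketch; it assumes the hostname interpolation appears in `output_file.tf` exactly as in the unmodified recipe:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "/mlflow" to the MLflow tracking URI in
# output_file.tf. A .bak backup is kept so the change can be reviewed.
fix_tracking_uri() {
  sed -i.bak 's|ingress.0.hostname}"|ingress.0.hostname}/mlflow"|' "$1"
}
```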
- 🎉 The resulting Terraform configuration is committed to the Git repository.
Prerequisites:

- tfenv and Terraform: `tfenv install && tfenv use`
- AWS CLI
- ZenML with the stacks extra: `pip install "zenml[stacks]"`
- kubectl
- ⚡ Configure AWS: follow this guide to configure AWS credentials using the AWS CLI (`aws configure`).

  Note: set `Default region name` to `eu-west-2`.
- 🔐 Add your secret information like keys and passwords into the `values.tfvars.json` file, which is not committed and only exists locally. Create a file named `values.tfvars.json` under the `zenml_stack_recipes/aws_minimal` directory with the following contents:

  ```json
  {
    "mlflow-artifact-S3-access-key": "<fill-me>",
    "mlflow-artifact-S3-secret-key": "<fill-me>",
    "mlflow-username": "<fill-me>",
    "mlflow-password": "<fill-me>",
    "zenml-version": "0.20.5"
  }
  ```

  - Add your `aws_access_key_id` to `mlflow-artifact-S3-access-key`.
  - Add your `aws_secret_access_key` to `mlflow-artifact-S3-secret-key`.
  - Add `mlflow-username` and `mlflow-password` as well.

  Note: the AWS secrets can be accessed by running `cat ~/.aws/credentials`.
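The two S3 keys can also be pulled out of `~/.aws/credentials` automatically. A minimal sketch; this `make_tfvars` helper is hypothetical, assumes the standard credentials file layout, and the MLflow username/password still have to be filled in by hand:

```shell
#!/usr/bin/env bash
# Hypothetical helper: generate values.tfvars.json from an AWS credentials
# file. Usage: make_tfvars ~/.aws/credentials values.tfvars.json
make_tfvars() {
  local creds="$1" out="$2" access_key secret_key
  # Take the first occurrence of each key (i.e. the default profile).
  access_key=$(awk -F' *= *' '/^aws_access_key_id/ {print $2; exit}' "$creds")
  secret_key=$(awk -F' *= *' '/^aws_secret_access_key/ {print $2; exit}' "$creds")
  cat > "$out" <<EOF
{
  "mlflow-artifact-S3-access-key": "$access_key",
  "mlflow-artifact-S3-secret-key": "$secret_key",
  "mlflow-username": "<fill-me>",
  "mlflow-password": "<fill-me>",
  "zenml-version": "0.20.5"
}
EOF
}
```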
- 🚀 Deploy the recipe. Navigate to the directory containing the Terraform files and run:

  ```bash
  cd zenml_stack_recipes/aws_minimal
  terraform init
  terraform plan --var-file values.tfvars.json
  terraform apply --var-file values.tfvars.json
  terraform output
  ```
- 🔨 (Optional) Upon successfully provisioning all resources, make sure kubectl is configured for the EKS cluster; `kubectl get namespaces` should contain `ingress-nginx`.

  To deploy ZenServer, create a file named `zen_server.tfvars.json` and fill in the contents:

  ```json
  {
    "name": "dobble",
    "region": "eu-west-2",
    "provider": "aws",
    "username": "<fill-me>",
    "password": "<fill-me>",
    "create_ingress_controller": "false",
    "ingress_controller_hostname": "<fill-me>",
    "zenmlserver_image_tag": "0.20.5"
  }
  ```

  To get the value of `ingress_controller_hostname`, run the following command and copy the `LoadBalancer Ingress` value from its output:

  ```bash
  kubectl describe svc nginx-controller-ingress-nginx-controller -n ingress-nginx
  ```

  Deploy the ZenServer using the command below:

  ```bash
  zenml deploy --config zen_server.tfvars.json
  ```

  After the server is created, you can visit the output URL and log in with the credentials supplied above to access the ZenServer dashboard.

  Connect to the ZenServer:

  ```bash
  zenml connect --url=<fill-me> --username=<fill-me> --password=<fill-me> --no-verify-ssl
  ```
- 📃 A ZenML stack configuration file (e.g. `aws_minimal_stack_<something>.yaml`) gets created after the previous command executes 🤯!

  Here we modify the generated configuration file to run the pipelines locally instead of on Kubernetes. To do so, we change the contents of `aws_minimal_stack_<something>.yaml` as follows.

  Replace the orchestrator configuration from:

  ```yaml
  orchestrator:
    id: <something>
    flavor: kubernetes
    name: eks_kubernetes_orchestrator
    configuration: {"kubernetes_context": "terraform", "synchronous": True}
  ```

  to:

  ```yaml
  orchestrator:
    id: <something>
    flavor: local
    name: local_orchestrator
    configuration: {}
  ```

  For consistency with the naming of the experiment tracker in the YAML configuration of the pipelines, we also replace the name in `experiment_tracker` from `name: eks_mlflow_experiment_tracker` to `name: mlflow_tracker`.

  Import the modified stack:

  ```bash
  zenml stack import -f <path-to-the-created-stack-config-yaml> <stack-name>
  ```
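The experiment tracker rename described above can be scripted too. A hedged sketch, assuming the generated file contains the line exactly as shown; the multi-line orchestrator swap is easier done by hand:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename the experiment tracker in the generated stack
# configuration file; a .bak backup of the original is kept.
rename_tracker() {
  sed -i.bak 's/name: eks_mlflow_experiment_tracker/name: mlflow_tracker/' "$1"
}
```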
  Once imported, set the stack as active and update the pipeline `.yaml` configuration files to use the appropriate experiment_tracker, etc.:

  ```bash
  zenml stack set <stack-name>
  ```

  Install the required integrations for running the training pipeline on AWS:

  ```bash
  zenml integration install s3 aws pytorch mlflow -y
  ```

  After the stack is set to active and the integrations are installed, we can run the ZenML pipelines using this stack:

  ```bash
  python run.py -dp -tp
  ```
- 💣 Delete the provisioned resources:

  ```bash
  terraform destroy
  ```

- 🎇 Destroy the ZenServer:

  ```bash
  zenml destroy
  ```

- 🧹 You can also remove all the downloaded recipe files from the pull execution by using the clean command:

  ```bash
  zenml stack recipe clean
  ```