[RPD-172] Improve documentation (#96)
* Docs fixes
* Readme aligned with docs
* Readme aligned with docs
* Readme aligned with docs
* Readme aligned with docs
* Readme aligned with docs
* Your first model guide
* More resolution
* RPD-172 added acknowledgements to README
* Better animated gif
* RPD-172 updates based on PR comments
* RPD-172 added note on alpha
* Your first model guide
* RPD-172 updated stack diagram
* Docs: logo colour more Matcha-like, changed purple banner background, added gif to index, finished getting started guide
* RPD-172 updated gif

---------

Co-authored-by: Jonathan Carlton <[email protected]>
Showing 14 changed files with 185 additions and 105 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,12 @@ | ||
# Cost | ||
|
||
`matcha` deploys a set of resources to Azure for you, so a natural question is: what is this going to cost? Below is how much it would cost to run the provisioned resources on Azure for a month. | ||
Matcha deploys a set of resources to Azure for you, so a natural question is: what is this going to cost? Below is how much it would cost to run the provisioned resources on Azure for a month. | ||
|
||
<figure markdown> | ||
![Azure Cost Breakdown](img/azure-permissions/azure-cost-breakdown.png) | ||
<figcaption>Azure Cost Breakdown</figcaption> | ||
</figure> | ||
|
||
This is a minimal setup and we've minimised the amount of resources where possible, reducing the overall cost. It's worth noting that if you don't have an Azure account, new joiners get Azure Credit which more than covers the cost of deploying these resources for exploring `matcha`. | ||
This is a minimal setup and we've minimised the amount of resources where possible, reducing the overall cost. It's worth noting that if you don't have an Azure account, new joiners get Azure Credit which more than covers the cost of deploying these resources for exploring Matcha. | ||
|
||
Our advice would be to make use of the `matcha destroy` command which will deprovision resources for you. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,7 @@ | ||
:root { | ||
--md-primary-fg-color: #8211ff | ||
--md-primary-fg-color: #0d1117 | ||
} | ||
|
||
header footer { | ||
padding: 12px; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,82 +1,90 @@ | ||
# Getting Started | ||
# Deploying your first model with Matcha | ||
|
||
This guide will show you how to get up and running with a fully provisioned cloud environment using `matcha` :tea:. We have a number of examples, see [here](https://github.com/fuzzylabs/matcha-examples) for our examples repository. | ||
In this guide, we'll walk you through how to provision your first machine learning infrastructure to Azure, and then use that infrastructure to train and deploy a model. The model we're using is a movie recommender, and we picked this because it's one that beginners can get up and running with quickly. | ||
|
||
## A movie recommender with experiment tracking | ||
There are five things we'll cover: | ||
|
||
In this example, we'll show you how to use `matcha` to setup a default cloud environment on Azure and hook up a movie recommendation pipeline to run on that environment. | ||
* [Pre-requisites](#pre-requisites): everything you need to set up before starting. | ||
* [The movie recommender](#the-movie-recommender): downloading the example code and setting up your Python environment | ||
* [Provisioning](#provisioning): Using Matcha to provision your infrastructure | ||
* [Training and deploying](#training-and-deploying): training a model on your provisioned infrastructure, deploying, and testing it | ||
* [Destroying](#destroying): tearing down provisioned infrastructure | ||
|
||
### What's the benefit of experiment tracking? | ||
The movie recommender is one of several example workflows that we've made available on GitHub; you can view all our examples [here](https://github.com/fuzzylabs/matcha-examples). | ||
|
||
If you're reading through our documentation, then it's quite likely that we don't need to sell the benefit of tracking your experiments. Even so, it's worth emphasising. Having experiment tracking means that for each run of your pipeline, its metadata is stored in a central place. | ||
> Note: Matcha is still in alpha release. While we've worked hard to ensure there are no defects, there's a small chance that you'll find a bug or something that hasn't been documented as well as it could be. If that happens, we'd really value your feedback, which you can send by submitting an issue to Matcha on Github. | ||
### Pre-requisites | ||
# Pre-requisites | ||
|
||
Before trying to provision infrastructure on Azure, `matcha` needs to you to be authenticated and to have the correct permissions for the tools you're wanting to deploy. See our explainer on [Azure Permissions](azure-permissions.md). | ||
## An Azure cloud environment | ||
|
||
Alongside this, you need the Azure CLI installed - see [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) on how to install it | ||
Matcha uses Azure to provision your infrastructure, so first you'll need to set up a [Microsoft® Azure account](https://azure.com). | ||
|
||
### Getting Setup | ||
## Tools you'll need | ||
|
||
Let's start with logging into Azure (make sure you've followed the steps above to install the Azure CLI): | ||
Next, you'll need to install a couple of things. | ||
|
||
```bash | ||
az login | ||
``` | ||
* Python 3.8 or newer, along with Virtual Env and PIP. | ||
* The Azure command line tool. Instructions on installing this can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli). | ||
* Terraform. We use this to provision services inside Azure. You'll find installation instructions for your platform [here](https://developer.hashicorp.com/terraform/downloads?product_intent=terraform). We recommend version 1.4 or newer. | ||
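The tool list above can be checked before provisioning with a short script. This is only a minimal sketch and not part of Matcha: the helper names `missing_tools` and `python_is_supported` are made up for illustration, and it assumes the Azure CLI and Terraform are installed under their usual executable names, `az` and `terraform`. It does not check the recommended Terraform version.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """True when the interpreter meets the guide's minimum Python version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.8 or newer is required")
    # `az` and `terraform` are the assumed executable names.
    for name in missing_tools(["az", "terraform"]):
        print(f"Missing from PATH: {name}")
```

If the script prints nothing, the interpreter is new enough and both tools were found on your `PATH`.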
|
||
Clone our examples repository: | ||
# The movie recommender | ||
|
||
```bash | ||
git clone [email protected]:fuzzylabs/matcha-examples.git | ||
``` | ||
Matcha has an [examples repository](https://github.com/fuzzylabs/matcha-examples) on Github, and that's what we'll be working from in this guide. There are a number of different examples in that repository, but we'll focus on the movie recommender. Note, however, that all the examples have been designed to work in much the same way as this one. | ||
|
||
Move into the recommendation example directory: | ||
Start by cloning the examples repository: | ||
|
||
```bash | ||
cd recommendation | ||
git clone https://github.com/fuzzylabs/matcha-examples.git | ||
``` | ||
|
||
Create a virtual environment: | ||
Then, enter the `recommendation` directory and set up your Python environment: | ||
|
||
```bash | ||
cd recommendation | ||
python3 -m venv venv | ||
source venv/bin/activate | ||
``` | ||
|
||
> There is a requirement for the Python version being used in this example to be 3.8+. We recommended making use of [`pyenv`](https://github.com/pyenv/pyenv) to manage your versions. | ||
Install `matcha`: | ||
Now, let's install Matcha: | ||
|
||
```bash | ||
pip install matcha-ml | ||
``` | ||
|
||
Test that your installation is working by running: | ||
You can test that your installation is working by running | ||
|
||
```bash | ||
$ matcha --version | ||
Matcha version: 0.1.0 | ||
matcha --version | ||
``` | ||
|
||
### Provisioning your Azure environment with `matcha` | ||
Which should reply with something like `Matcha version: <version number>`. | ||
|
||
Now you have your virtual environment configured and `matcha` installed, it's time to provision your Azure environment. For this example, we'll deploy an experimental tracker ([MLflow](https://mlflow.org/)) to Azure. There are other components deployed as part of this, see [here](inside-matcha.md) for a detailed explanation of what `matcha` is doing. | ||
Now you're ready to provision your infrastructure. | ||
|
||
To start, you need to authenticate with Azure (see [pre-requisites](#pre-requisites)): | ||
# Provisioning | ||
|
||
Using the Azure CLI, you will need to authenticate: | ||
|
||
```bash | ||
az login | ||
``` | ||
|
||
`matcha` has a set of sensible defaults for the infrastructure that it'll provision for you. | ||
When you run this command, you'll be taken to the Azure login screen in a web browser window, and you'll be asked if you want to allow the Azure CLI to access your Azure account. You'll need to grant this permission in order for Matcha to gain access to your Azure account when it provisions infrastructure. | ||
|
||
> Note: you'll need certain permissions in order for Matcha to work. If you're unsure, you can just run `matcha` and it will tell you if you're missing any permissions. For specifics around permissions, please see our explainer on [Azure Permissions](azure-permissions.md). | ||
To provision an experiment tracker using `matcha`, run the following command (you'll be asked a series of questions which helps `matcha` personalise the environment to you): | ||
Next, let's provision: | ||
|
||
```bash | ||
matcha provision | ||
``` | ||
|
||
Once `provision` has finished it's thing, you can use the following command to inspect the resources that have been provisioned for you: | ||
Initially, Matcha will ask you a few questions about how you'd like your infrastructure to be set up. Specifically, it will ask for a _name_ for your infrastructure, a _region_ to deploy it to, and a password. After that, it will go ahead and provision the infrastructure. | ||
|
||
> Note: provisioning can take up to 20 minutes. | ||
Once provisioning is completed, you can query Matcha, using the `get` command: | ||
|
||
```bash | ||
matcha get | ||
|
@@ -116,7 +124,7 @@ Pipeline | |
- storage-path: az://<path> | ||
``` | ||
|
||
You can then use `get` to inspect specific resources, for example: | ||
You can also use `get` to inspect specific resources, for example: | ||
|
||
```bash | ||
matcha get experiment-tracker | ||
|
@@ -134,25 +142,70 @@ Experiment tracker | |
|
||
> Note: You can also get these outputs in either json or YAML format using the following: `matcha get --output json` | ||
By default, `matcha` will hide sensitive resource properties. If you need one of these properties, then you can add the `--show-sensitive` flag to your `get` command. | ||
By default, Matcha will hide sensitive resource properties. If you need one of these properties, then you can add the `--show-sensitive` flag to your `get` command. | ||
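If you want to consume the JSON form of `matcha get` from a script, something along the following lines may help. It is a sketch under assumptions: the exact layout of the JSON that Matcha emits is not shown in this guide, so the `sample` payload below is hypothetical, and `flatten` is an illustrative helper rather than anything Matcha provides.

```python
import json


def flatten(resources, prefix=""):
    """Flatten nested JSON objects into 'resource.property' -> value pairs."""
    flat = {}
    for key, value in resources.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted name.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical payload: the real layout of `matcha get --output json` may differ.
sample = '{"pipeline": {"storage-path": "az://<path>"}}'
properties = flatten(json.loads(sample))
print(properties["pipeline.storage-path"])
```

In a real script, the text passed to `json.loads` would come from running `matcha get --output json`, for instance via `subprocess.run(..., capture_output=True, text=True)`, instead of the hard-coded sample.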
|
||
# Training and deploying | ||
|
||
Now that you've reached this point, you'll have provisioned the following infrastructure into Azure: | ||
|
||
* The MLFlow experiment tracker and model registry. | ||
* Seldon for model deployment and serving. | ||
* A ZenML server. This example uses ZenML for defining and orchestrating the training and deployment pipelines. | ||
* Kubernetes. This has two roles: firstly, it's where the training workload actually runs, and secondly it's the deployment environment for all of the above components. | ||
|
||
## Setup | ||
|
||
## Setting up | ||
|
||
Before you can train the model, there's a little setup to do. We've provided a convenient script that does this for you: | ||
|
||
```bash | ||
./setup.sh | ||
``` | ||
|
||
### Running your recommender | ||
You might wonder why this setup step is necessary, and what it's doing. While you've already set Matcha up, the code that will train the model needs to know a few things about your infrastructure before it can run. As you've seen, `matcha get` is what's used to query information about your infrastructure. Under the hood, the setup script for the movie recommender model actually invokes `matcha get` to find out everything it needs to know. Additionally, this script installs some Python dependencies that are specific to the machine learning task that we're working with; crucially, the _Surprise_ library (a separate recommender-system package distributed as `scikit-surprise`, not part of Scikit-learn), which we're using to do the recommendation bit itself. | ||
|
||
The environment is provisioned, you've got a movie recommender, and you're hyped and ready to go - we hope. | ||
## Training | ||
|
||
Running the following command will run the recommendation pipeline locally (using ZenML), but the metadata associated with it (e.g., the RMSE performance metric) will be stored in your deployed experiment tracker: | ||
Once the setup script completes, you're ready to train the model: | ||
|
||
```bash | ||
python run.py --train | ||
``` | ||
|
||
From here, you'll be able to visit your experiment tracker and see the runs stored there. | ||
## Experiment tracking | ||
|
||
Training won't take too long. After it finishes, you'll be able to view the details of this training run in MLFlow. First, look up the URL to MLFlow: | ||
|
||
```bash | ||
matcha get experiment-tracker | ||
``` | ||
|
||
Copy the tracking URL into a web browser. Then, from the `experiments` pane on the left-hand side of the MLFlow interface, you'll be able to select `recommendation_pipeline`. Each time the training pipeline runs, it will be logged here, so you can view historical runs alongside important details such as the training parameters or the model performance. | ||
|
||
![Screenshot of the MLFlow user interface showing the recommendation pipeline runs](img/getting-started/recommendation-example-mlflow.png) | ||
|
||
## Deploying | ||
|
||
Your model has been trained, but we can't interact with it until it has been deployed. Alongside the training pipeline, the movie recommender example includes a deployment pipeline, which will result in the model being deployed to Seldon, and made accessible as a web service. | ||
|
||
Run | ||
|
||
```bash | ||
python run.py --deploy | ||
``` | ||
|
||
Once this has completed, you can test the model out. We've included a convenience script to help with this, called `inference.py`: | ||
|
||
```bash | ||
python inference.py --user 100 --movie 100 | ||
``` | ||
|
||
### Releasing Resources | ||
This will result in a score, which represents how strongly we recommend movie ID `100` to user ID `100`. | ||
|
||
Even though we've chosen sensible a default configuration for you, leaving the resources you've provisioned in this example running in the cloud is going to run up a bill. | ||
# Destroying | ||
|
||
To release the resources that you've provisioned in this example, run the following command: | ||
The final thing you'll want to do is decommission the infrastructure that Matcha has set up during this guide. Matcha includes a `destroy` command which will remove everything that has been provisioned, which avoids running up an Azure bill! | ||
|
||
```bash | ||
matcha destroy | ||
|