[RPD-172] Improve documentation (#96)

* Docs fixes
* Readme aligned with docs
* Readme aligned with docs
* Readme aligned with docs
* Readme aligned with docs
* Readme aligned with docs
* Your first model guide
* More resolution
* RPD-172 added acknowledgements to README
* Better animated gif
* RPD-172 updates based on PR comments
* RPD-172 added note on alpha
* Your first model guide
* RPD-172 updated stack diagram
* Docs: logo colour more Matcha-like, changed purple banner background, added gif to index, finished getting started guide
* RPD-172 updated gif

---------

Co-authored-by: Jonathan Carlton <[email protected]>
fuzzylabs · May 16, 2023 · 6e3bcae
1 parent 3f73a43
commit 6e3bcae
Showing 14 changed files with 185 additions and 105 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
 <h1 align="center">
-    <code>matcha</code> &#127861;
+    <img src="docs/img/logo.png" width="200" style="max-width: 200px"></img>
 </h1>
 
 <p align="center">
@@ -18,45 +18,53 @@
 </p>
 
 <h3 align="center">
-    <p>A tool for provisioning an MLOps environment on Azure in 20 minutes</p>
+    <p>Open source MLOps on Azure in one step</p>
 </h3>
 
-&#127861; `matcha` is the open source tool for provisioning MLOps environments to Azure
+If you train machine learning models, then you know the challenge of going from _experiment_ to _production_. There's a vast range of tools that promise to help, from experiment tracking through to model deployment, but setting these up requires a lot of time and cloud engineering knowledge.
 
-With a single command and using sensible defaults, Data Scientists and ML practitioners are provided with the following capabilities:
-
-* &#127939; A way to run model training.
-* &#128099; A way to track experiments.
-* &#127869;&#65039; A way to deploy and serve models.
-
-&#127861; `matcha` uses open source tools to build this MLOps environment on Azure.
+**Matcha removes the complexity of provisioning your machine learning infrastructure**. With one step, you'll have a complete _machine learning operations (MLOps)_ stack up and running in your Microsoft® Azure cloud environment. This means you'll be able to track your experiments, train your models, as well as deploy and serve those models.
 
 <div align="center">
-    <img src="docs/img/matcha-provision.gif">
+    <img src="docs/img/matcha-provision.gif"></img>
 </div>
 
-## 	&#128678; Getting Started
+# &#127861; Who is Matcha for?
 
-Keen to get started with `matcha`? We have a great getting started guide which you can find [here](https://mymatcha.ai/getting-started/).
+Matcha is for data scientists, machine learning engineers, and anybody who trains machine learning models. If you're using Azure, and want an intuitive way to deploy machine learning infrastructure, Matcha is for you.
 
-By the end, you'll understand how to use `matcha` and have a fully provisioned movie recommendation system in Azure, ready to serve some movie ratings!
+# &#127939; How do I get started?
+
+If you're new to Matcha, the best place to start is [our guide to deploying your first model](http://www.mymatcha.ai/getting-started). If you're comfortable with the basics, you might want to dive into our [Matcha examples](https://github.com/fuzzylabs/matcha-examples) on GitHub.
 
 ## &#128214; Documentation
 
-Want to learn more about `matcha`? See our [documentation](https://mymatcha.ai/) which covers what `matcha` is in more detail, how to use it, what Azure permissions are required, and much more!
+The full Matcha documentation can be found at [mymatcha.ai](https://mymatcha.ai). This covers what Matcha is in more detail, how to use it, what Azure permissions are required, and how Matcha works internally.
 
 ## &#128506;&#65039; Roadmap
 
-Where is `matcha` going? We have a [roadmap](https://matcha.hellonext.co/) which shows what we're working on and what is up and coming.
+We've put a lot of thought into what our users (data scientists, ML engineers, and so on) need from their infrastructure, and we came up with five key pieces of functionality that are absolute musts:
+
+* A place to track, version, and manage datasets.
+* A place to track experiments and model assets.
+* Scalable compute for running training workloads, with the option to use GPUs.
+* Somewhere to deploy and serve models in a way that scales with your application needs.
+* The ability to monitor models for things like drift and bias.
 
-We want you to contribute to this! Is there a feature that you think is missing? Add this to the roadmap and help us prioritise what is useful to you, the community.
+Matcha is still in alpha release, and we don't support everything on that list yet. We support experiment tracking, training, and deployment, with plans to add data versioning and monitoring later. We very much welcome input on our roadmap from our early users.
 
 ## &#128079; Contributing
 
-`matcha` at its very core is open source and we're eager for you to get involved whether through raising issues or by opening a PR.
+Matcha at its very core is open source and we're eager for you to get involved whether through raising issues or by opening a PR.
 
 We have an in-depth [contributing](CONTRIBUTING.md) guide which will describe how to do all of this.
 
+> Note: Matcha is still in alpha release. While we've worked hard to ensure there are no defects, there's a small chance that you'll find a bug or something that hasn't been documented as well as it could be. If that happens, we'd really value your feedback, which you can send by submitting an [issue](https://github.com/fuzzylabs/matcha/issues/new/choose) to Matcha on GitHub.
+
 ## &#128220; License
 
 This library is released under the Apache License. See [LICENSE](LICENSE).
+
+# &#129309; Acknowledgements
+
+Thank you to [ZenML](https://zenml.io/home) for their contributions and inspiration through their [stack recipes](https://github.com/zenml-io/mlops-stacks).
diff --git a/docs/azure-permissions.md b/docs/azure-permissions.md
@@ -1,14 +1,14 @@
 # Azure Permissions
 
-Before getting started with `matcha`, we need to make sure that you have the correct permissions to provision resources on Azure.
+Before getting started with Matcha, we need to make sure that you have the correct permissions to provision resources on Azure.
 
 There is a presumption made here that you have an Azure account with an active subscription which has billing enabled. For a guide on how to set that up, see [here](https://learn.microsoft.com/en-us/dynamics-nav/how-to--sign-up-for-a-microsoft-azure-subscription).
 
-## Why does `matcha` require certain permissions?
+## Why does Matcha require certain permissions?
 
-As a provisioning tool, `matcha` interacts with Azure on your behalf, hiding away the complexities of standing up resources. To do the provisioning, `matcha` issues commands through your Azure account and to do that, your account needs to have the correct permissions enabled within a subscription.
+As a provisioning tool, Matcha interacts with Azure on your behalf, hiding away the complexities of standing up resources. To do the provisioning, Matcha issues commands through your Azure account and to do that, your account needs to have the correct permissions enabled within a subscription.
 
-## What permissions does `matcha` require?
+## What permissions does Matcha require?
 
 In its current form, the following Azure permissions are required:
 

diff --git a/docs/costings.md b/docs/costings.md
@@ -1,12 +1,12 @@
 # Cost
 
-`matcha` deploys a set of resources to Azure for you, so a natural question is: what is this going to cost? Below is how much it would cost to run the provisioned resources on Azure for a month.
+Matcha deploys a set of resources to Azure for you, so a natural question is: what is this going to cost? Below is how much it would cost to run the provisioned resources on Azure for a month.
 
 <figure markdown>
   ![Azure Cost Breakdown](img/azure-permissions/azure-cost-breakdown.png)
   <figcaption>Azure Cost Breakdown</figcaption>
 </figure>
 
-This is a minimal setup and we've minimised the amount of resources where possible, reducing the overall cost. It's worth noting that if you don't have an Azure account, new joiners get Azure Credit which more than covers the cost of deploying these resources for exploring `matcha`.
+This is a minimal setup and we've minimised the number of resources where possible, reducing the overall cost. It's worth noting that if you don't yet have an Azure account, new customers receive Azure credit which more than covers the cost of deploying these resources for exploring Matcha.
 
 Our advice would be to make use of the `matcha destroy` command which will deprovision resources for you.
diff --git a/docs/css/extra.css b/docs/css/extra.css
@@ -1,3 +1,7 @@
 :root {
-    --md-primary-fg-color: #8211ff
+    --md-primary-fg-color: #0d1117
+}
+
+header footer {
+  padding: 12px;
 }
diff --git a/docs/getting-started.md b/docs/getting-started.md
@@ -1,82 +1,90 @@
-# Getting Started
+# Deploying your first model with Matcha
 
-This guide will show you how to get up and running with a fully provisioned cloud environment using `matcha` :tea:. We have a number of examples, see [here](https://github.com/fuzzylabs/matcha-examples) for our examples repository.
+In this guide, we'll walk you through how to provision your first machine learning infrastructure to Azure, and then use that infrastructure to train and deploy a model. The model we're using is a movie recommender, and we picked this because it's one that beginners can get up and running with quickly.
 
-## A movie recommender with experiment tracking
+There are five things we'll cover:
 
-In this example, we'll show you how to use `matcha` to setup a default cloud environment on Azure and hook up a movie recommendation pipeline to run on that environment.
+* [Pre-requisites](#pre-requisites): everything you need to set up before starting.
+* [The movie recommender](#the-movie-recommender): downloading the example code and setting up your Python environment.
+* [Provisioning](#provisioning): using Matcha to provision your infrastructure.
+* [Training and deploying](#training-and-deploying): training a model on your provisioned infrastructure, deploying it, and testing it.
+* [Destroying](#destroying): tearing down the provisioned infrastructure.
 
-### What's the benefit of experiment tracking?
+The movie recommender is one of several example workflows that we've made available on GitHub; you can view all our examples [here](https://github.com/fuzzylabs/matcha-examples).
 
-If you're reading through our documentation, then it's quite likely that we don't need to sell the benefit of tracking your experiments. Even so, it's worth emphasising. Having experiment tracking means that for each run of your pipeline, its metadata is stored in a central place.
+> Note: Matcha is still in alpha release. While we've worked hard to ensure there are no defects, there's a small chance that you'll find a bug or something that hasn't been documented as well as it could be. If that happens, we'd really value your feedback, which you can send by submitting an issue to Matcha on GitHub.
 
-### Pre-requisites
+# Pre-requisites
 
-Before trying to provision infrastructure on Azure, `matcha` needs to you to be authenticated and to have the correct permissions for the tools you're wanting to deploy. See our explainer on [Azure Permissions](azure-permissions.md).
+## An Azure cloud environment
 
-Alongside this, you need the Azure CLI installed - see [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) on how to install it
+Matcha uses Azure to provision your infrastructure, so first you'll need to set up a [Microsoft® Azure account](https://azure.com).
 
-### Getting Setup
+## Tools you'll need
 
-Let's start with logging into Azure (make sure you've followed the steps above to install the Azure CLI):
+Next, you'll need to install a couple of things.
 
-```bash
-az login
-```
+* Python 3.8 or newer, along with the `venv` module and `pip`.
+* The Azure command line tool. Instructions on installing this can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli).
+* Terraform. We use this to provision services inside Azure. You'll find installation instructions for your platform [here](https://developer.hashicorp.com/terraform/downloads?product_intent=terraform). We recommend version 1.4 or newer.
 
-Clone our examples repository:
+# The movie recommender
 
-```bash
-git clone [email protected]:fuzzylabs/matcha-examples.git
-```
+Matcha has an [examples repository](https://github.com/fuzzylabs/matcha-examples) on GitHub, and that's what we'll be working from in this guide. There are a number of different examples in that repository, but we'll focus on the movie recommender. Note, however, that all the examples have been designed to work in much the same way as this one.
 
-Move into the recommendation example directory:
+Start by cloning the examples repository:
 
 ```bash
-cd recommendation
+git clone https://github.com/fuzzylabs/matcha-examples.git
 ```
 
-Create a virtual environment:
+Then, enter the `recommendation` directory and set up your Python environment:
 
 ```bash
+cd recommendation
 python3 -m venv venv
 source venv/bin/activate
 ```
 
-> There is a requirement for the Python version being used in this example to be 3.8+. We recommended making use of [`pyenv`](https://github.com/pyenv/pyenv) to manage your versions.
-
-Install `matcha`:
+Now, let's install Matcha:
 
 ```bash
 pip install matcha-ml
 ```
 
-Test that your installation is working by running:
+You can test that your installation is working by running:
 
 ```bash
-$ matcha --version
-Matcha version: 0.1.0
+matcha --version
 ```
 
-### Provisioning your Azure environment with `matcha`
+This should print something like `Matcha version: <version number>`.
 
-Now you have your virtual environment configured and `matcha` installed, it's time to provision your Azure environment. For this example, we'll deploy an experimental tracker ([MLflow](https://mlflow.org/)) to Azure. There are other components deployed as part of this, see [here](inside-matcha.md) for a detailed explanation of what `matcha` is doing.
+Now you're ready to provision your infrastructure.
 
-To start, you need to authenticate with Azure (see [pre-requisites](#pre-requisites)):
+# Provisioning
+
+Using the Azure CLI, you will need to authenticate:
 
 ```bash
 az login
 ```
 
-`matcha` has a set of sensible defaults for the infrastructure that it'll provision for you.
+When you run this command, you'll be taken to the Azure login screen in a web browser window, and you'll be asked if you want to allow the Azure CLI to access your Azure account. You'll need to grant this permission in order for Matcha to gain access to your Azure account when it provisions infrastructure.
+
+> Note: you'll need certain permissions in order for Matcha to work. If you're unsure, you can just run `matcha` and it will tell you if you're missing any permissions. For specifics around permissions, please see our explainer on [Azure Permissions](azure-permissions.md).
 
-To provision an experiment tracker using `matcha`, run the following command (you'll be asked a series of questions which helps `matcha` personalise the environment to you):
+Next, let's provision:
 
 ```bash
 matcha provision
 ```
 
-Once `provision` has finished it's thing, you can use the following command to inspect the resources that have been provisioned for you:
+Initially, Matcha will ask you a few questions about how you'd like your infrastructure to be set up. Specifically, it will ask for a _name_ for your infrastructure, a _region_ to deploy it to, and a password. After that, it will go ahead and provision the infrastructure.
+
+> Note: provisioning can take up to 20 minutes.
+
+Once provisioning has completed, you can query Matcha using the `get` command:
 
 ```bash
 matcha get
@@ -116,7 +124,7 @@ Pipeline
    - storage-path: az://<path>
 ```
 
-You can then use `get` to inspect specific resources, for example:
+You can also use `get` to inspect specific resources, for example:
 
 ```bash
 matcha get experiment-tracker
@@ -134,25 +142,70 @@ Experiment tracker
 
 > Note: You can also get these outputs in either json or YAML format using the following: `matcha get --output json`
 
-By default, `matcha` will hide sensitive resource properties. If you need one of these properties, then you can add the `--show-sensitive` flag to your `get` command.
+By default, Matcha will hide sensitive resource properties. If you need one of these properties, then you can add the `--show-sensitive` flag to your `get` command.
+
+# Training and deploying
+
+Now that you've reached this point, you'll have provisioned the following infrastructure into Azure:
+
+* The MLflow experiment tracker and model registry.
+* Seldon for model deployment and serving.
+* A ZenML server. This example uses ZenML for defining and orchestrating the training and deployment pipelines.
+* Kubernetes. This has two roles: firstly, it's where the training workload actually runs, and secondly it's the deployment environment for all of the above components.
+
+## Setting up
+
+Before you can train the model, there's a little setup to do. We've provided a convenient script that does this for you:
+
+```bash
+./setup.sh
+```
 
-### Running your recommender
+You might wonder why this setup step is necessary, and what it's doing. While you've already set Matcha up, the code that will train the model needs to know a few things about your infrastructure before it can run. As you've seen, `matcha get` is what's used to query information about your infrastructure. Under the hood, the setup script for the movie recommender model invokes `matcha get` to find out everything it needs to know. Additionally, this script installs some Python dependencies that are specific to the machine learning task that we're working with; crucially, the _Surprise_ library (a separate recommender-system toolkit distributed as `scikit-surprise`, not part of Scikit-learn), which we're using to do the recommendation itself.
 
-The environment is provisioned, you've got a movie recommender, and you're hyped and ready to go - we hope.
+## Training
 
-Running the following command will run the recommendation pipeline locally (using ZenML), but the metadata associated with it (e.g., the RMSE performance metric) will be stored in your deployed experiment tracker:
+Once the setup script completes, you're ready to train the model:
 
 ```bash
 python run.py --train
 ```
 
-From here, you'll be able to visit your experiment tracker and see the runs stored there.
+## Experiment tracking
+
+Training won't take too long. After it finishes, you'll be able to view the details of this training run in MLflow. First, look up the URL to MLflow:
+
+```bash
+matcha get experiment-tracker
+```
+
+Copy the tracking URL into a web browser. Then, from the `experiments` pane on the left-hand side of the MLflow interface, you'll be able to select `recommendation_pipeline`. Each time the training pipeline runs, it will be logged here, so you can view historical runs alongside important details such as the training parameters or the model performance.
+
+![Screenshot of the MLflow user interface showing the recommendation pipeline runs](img/getting-started/recommendation-example-mlflow.png)
+
+## Deploying
+
+Your model has been trained, but we can't interact with it until it has been deployed. Alongside the training pipeline, the movie recommender example includes a deployment pipeline, which will result in the model being deployed to Seldon, and made accessible as a web service.
+
+Run the deployment pipeline:
+
+```bash
+python run.py --deploy
+```
+
+Once this has completed, you can test the model out. We've included a convenience script to help with this, called `inference.py`:
+
+```bash
+python inference.py --user 100 --movie 100
+```
 
-### Releasing Resources
+This will result in a score, which represents how strongly we recommend movie ID `100` to user ID `100`.
 
-Even though we've chosen sensible a default configuration for you, leaving the resources you've provisioned in this example running in the cloud is going to run up a bill.
+# Destroying
 
-To release the resources that you've provisioned in this example, run the following command:
+The final thing you'll want to do is decommission the infrastructure that Matcha has set up during this guide. Matcha includes a `destroy` command which will remove everything that has been provisioned, which avoids running up an Azure bill!
 
 ```bash
 matcha destroy
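
Reviewer note on the pre-requisites added to `docs/getting-started.md`: the guide lists Python 3.8 or newer, the Azure CLI and Terraform, but leaves it to the reader to confirm they are installed. A minimal pre-flight sketch could look like the following. It is not part of this commit or of Matcha; the function names `check_tool`, `python_version_ok` and `preflight` are hypothetical, and the only assumption about the tools is that they are exposed on `PATH` as `python3`, `az` and `terraform`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the tools named in the guide's pre-requisites.
# Nothing here is provided by Matcha; it relies only on standard shell features.

# Report whether a command is on PATH; return non-zero if it is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Succeed when a "major.minor" version string is at least 3.8.
python_version_ok() {
    local major="${1%%.*}"
    local rest="${1#*.}"
    local minor="${rest%%.*}"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}

# Run every check; return non-zero if any of them fails.
preflight() {
    local status=0
    local tool
    for tool in python3 az terraform; do
        check_tool "$tool" || status=1
    done
    if command -v python3 >/dev/null 2>&1; then
        local version
        version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
        if python_version_ok "$version"; then
            echo "python version ok: $version"
        else
            echo "python version too old: $version (need 3.8+)"
            status=1
        fi
    fi
    return "$status"
}
```

After sourcing the script, calling `preflight` before `matcha provision` would surface a missing tool early. The sketch does not check the Terraform version (the guide recommends 1.4 or newer) or Azure permissions.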

diff --git a/docs/img/favicon.png b/docs/img/favicon.png
diff --git a/docs/img/getting-started/recommendation-example-mlflow.png b/docs/img/getting-started/recommendation-example-mlflow.png
diff --git a/docs/img/logo.png b/docs/img/logo.png
diff --git a/docs/img/matcha-provision.gif b/docs/img/matcha-provision.gif
diff --git a/docs/img/stack-diagram.png b/docs/img/stack-diagram.png