The scripts in this repository were tested with MLFlow version 1.7.0. Install the dependencies with:
pip install mlflow==1.7.0 scikit-learn pandas numpy scipy boto3
The file train.py trains a scikit-learn ElasticNet model and saves the model's metadata and artifacts locally. Execute train.py to perform a local training job:
python train.py
The experiment results are saved locally under ./mlruns by default.
Perform another training job with different hyperparameters by executing python train.py <alpha> <l1_ratio>, e.g.:
python train.py 0.3 0.7
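For orientation, here is a minimal sketch of such a training flow, using the wine-quality features that appear later in this guide (the dataset path and target column are assumptions; see train.py for the actual code):
import sys

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hyperparameters from the command line, with defaults
alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

# Hypothetical dataset path; the target column "quality" is an assumption
data = pd.read_csv("winequality.csv")
train, test = train_test_split(data, random_state=42)
train_x, test_x = train.drop("quality", axis=1), test.drop("quality", axis=1)
train_y, test_y = train["quality"], test["quality"]

with mlflow.start_run():
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    model.fit(train_x, train_y)
    rmse = np.sqrt(mean_squared_error(test_y, model.predict(test_x)))
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    # Saves the model artifact under the run's artifact directory
    mlflow.sklearn.log_model(model, "model")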
Host a Tracking Server with one of the CLI commands:
mlflow server
# or
mlflow ui
The Tracking Server serves the local files under ./mlruns and keeps running until stopped.
Visit the MLFlow Tracking Server's web dashboard at http://127.0.0.1:5000 and try comparing the parameters and metrics of different runs.
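The comparison can also be done programmatically with the tracking client; a small sketch (experiment id "0" is the default experiment and may differ in your setup):
from mlflow.tracking import MlflowClient

client = MlflowClient()  # reads ./mlruns when no tracking URI is set
for run in client.search_runs(experiment_ids=["0"]):
    print(run.info.run_id, run.data.params, run.data.metrics)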
Pick a local model to serve: get the run_id from the dashboard, then serve the model with the mlflow models serve CLI command:
mlflow models serve -m runs:/<run_id>/model --no-conda
The --no-conda option is recommended so that the Inference Server resolves dependencies from the local environment rather than creating an Anaconda environment.
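To quickly check that the Inference Server is up, the scoring server also exposes a health endpoint (assuming the default port 5000):
curl http://127.0.0.1:5000/ping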
[NOTE]
If the Tracking Server is still running and port 5000 is occupied, add the -p option to assign a different port number.
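For example, to serve the model on port 5001 instead:
mlflow models serve -m runs:/<run_id>/model -p 5001 --no-conda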
Method 2: Build a Docker Image with a specific model and serve the Inference Server as a Docker Container
Execute the mlflow models build-docker CLI command:
mlflow models build-docker -m "runs:/<run_id>/model" -n "mlflow-model:latest"
Then serve it with docker run. Note that the image built by mlflow models build-docker serves the model on port 8080 inside the container, so map it to host port 5005 to match the requests below:
docker run \
-it -d \
-p 5005:8080 \
--name mlflow-model \
mlflow-model:latest
Test the Inference Server by sending a prediction request with curl:
curl --location --request POST 'http://127.0.0.1:5005/invocations' \
--header 'Content-Type: application/json' \
--header 'format: pandas-split' \
--data-raw '{"columns":["alcohol","chlorides","citric acid","density","fixed acidity","free sulfur dioxide","pH","residual sugar","sulphates","total sulfur dioxide","volatile acidity"],"data":[[12.8,0.029,0.48,0.98,6.2,29,3.33,1.2,0.39,75,0.66],[12.8,0.029,0.48,0.98,6.2,29,3.33,1.2,0.39,75,0.66]]}'
Or send the same request with Python:
import json

import requests

url = "http://127.0.0.1:5005/invocations"
headers = {
    "Content-Type": "application/json",
    "format": "pandas-split",
}
payload = {
    "columns": ["alcohol", "chlorides", "citric acid", "density",
                "fixed acidity", "free sulfur dioxide", "pH",
                "residual sugar", "sulphates", "total sulfur dioxide",
                "volatile acidity"],
    "data": [[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66],
             [12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]],
}
r = requests.post(url, headers=headers, data=json.dumps(payload))
print(r.json())
train_tracking.py adds two features:
- Sets a remote Tracking Server, saving metadata and artifacts to the remote environment's filesystem
- Sets an experiment name to separate different projects
In Python, we can point at a remote Tracking Server with the mlflow.set_tracking_uri() function; see line 30:
mlflow.set_tracking_uri(os.getenv("MLFLOW_TRACKING_URI",
"http://127.0.0.1:5000"))
So the Tracking Server URI can be set by assigning the MLFLOW_TRACKING_URI environment variable:
export MLFLOW_TRACKING_URI=<URI>
# Verify
echo $MLFLOW_TRACKING_URI
If the environment variable is unset, http://127.0.0.1:5000 is used as the Tracking Server URI.
To separate different projects, use the mlflow.set_experiment() function; see line 32:
mlflow.set_experiment("Elastic Net")
Start a Tracking Server first:
mlflow server
We can now visit the Tracking Server Web UI dashboard; at this point it shows no runs.
Execute train_tracking.py
to perform a training job:
python train_tracking.py
Then refresh the webpage; the training result shows up immediately.
[NOTE] The artifact may fail to upload (with no error returned from the Tracking Server) for unknown reasons. Try the method in Chapter 3.
The complete tracking platform has three components:
- MLFlow Tracking Server
- PostgreSQL Database: a popular open-source relational database, used to store the models' metadata
- MinIO: an Amazon S3-compatible object storage, used to store the models' artifacts
With the docker-compose.yml file, we can run the services with Docker Compose:
# Prepare local directories as bind-mount persistent volumes
mkdir -p ./data/minio/mlflow
mkdir -p ./data/postgres
# Launch services
docker-compose up -d
[NOTE]
We don't run the MLFlow Tracking Server along with the other components because of the lack of a healthcheck mechanism. The Tracking Server must be initialized after the database has been created; without healthchecking, the depends_on keyword only controls start order, not service readiness, so the Tracking Server may come up before the database is ready and fail.
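One simple workaround is to wait until the database accepts connections before launching the Tracking Server container, e.g. from the shell (the service and user names follow the docker-compose setup in this repository):
# Block until PostgreSQL is ready to accept connections
until docker-compose exec -T postgres pg_isready -U mlflow; do
  sleep 2
done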
Use the docker-compose ps command to verify that the services are up and stable.
Build a MLFlow Docker image that contains:
- psycopg2: the PostgreSQL database driver
- boto3: the AWS SDK for Python
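A minimal sketch of what such a Dockerfile might look like (the base image and the psycopg2-binary wheel are assumptions; see the Dockerfile in this repository for the actual content):
FROM python:3.7
# psycopg2-binary avoids compiling the PostgreSQL driver from source
RUN pip install mlflow==1.7.0 psycopg2-binary boto3
EXPOSE 5000
# The "server ..." arguments are appended by the docker run command below
ENTRYPOINT ["mlflow"]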
Build it with the docker build command and the Dockerfile:
docker build -t mlflow:v1.7.0 .
Then run as Docker Container:
docker run -it -d \
--network="mlflow_net" \
-e MLFLOW_S3_ENDPOINT_URL=http://minio:9000 \
-e AWS_ACCESS_KEY_ID=minioadmin \
-e AWS_SECRET_ACCESS_KEY=minioadmin \
-p 5000:5000 \
--name mlflow \
mlflow:v1.7.0 \
server --host="0.0.0.0" \
--backend-store-uri="postgresql+psycopg2://mlflow:mlflow@postgres/mlflow" \
--default-artifact-root="s3://mlflow"
train_tracking_s3_psql.py adds some new features:
- On line 22, it sets environment variables to redirect the S3 URL to our self-hosted MinIO object storage
- On lines 23 and 24, it sets the S3 credentials (as sketched below)
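The relevant lines look roughly like this (the values match the docker-compose setup above; see the script for the exact code):
import os

# Redirect S3 API calls to the self-hosted MinIO endpoint
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://127.0.0.1:9000"
# MinIO credentials (defaults from the docker-compose setup)
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"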
Then simply execute the script:
python train_tracking_s3_psql.py
Now, when we log results to the MLFlow Tracking Server, the metadata is saved into the PostgreSQL database and the model artifacts are saved into the MinIO object storage automatically.
Visit the MLFlow Tracking Server and open the specific model's detail page to get the model URI, which should look like s3://bucket/<experiment_id>/<run_id>/artifacts/model. Then export the environment variables and serve the model with the following commands:
export MLFLOW_S3_ENDPOINT_URL=http://127.0.0.1:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
mlflow models serve -m s3://mlflow/<experiment_id>/<run_id>/artifacts/model --no-conda
[NOTE]
If the Tracking Server is still running and port 5000 is occupied, add the -p option to assign a different port number.
H2OAutoML.ipynb illustrates, step by step, how to perform an AutoML training job with the MLFlow Tracking Server. Enjoy!