DJL Serving Operation Modes

Overview

DJL Serving is a high-performance serving system for deep learning models. It can serve models in three modes:

  1. Python Mode
  2. Java Mode
  3. Binary Mode

Also see the options for model configurations.

Python Mode

This section walks through how to serve a Python-based model with DJL Serving.

Define a Model

To get started, create a Python source file named model.py to serve as the entry point. DJL Serving handles each request by invoking a handle function that you provide. The handle function should have the following signature:

def handle(inputs: Input)

If your script needs other packages, you can include a requirements.txt file in the same directory as your model file to install the dependencies at runtime. A requirements.txt file is a text file that lists the items to install with pip install. You can also specify the version of each item.

If you don't want to install packages from the internet, you can bundle the Python wheel in the model directory and install the package from there:

./local_wheels/ABC-0.0.2-py3-none-any.whl

Packaging

DJL Serving supports model artifacts as a model directory or in .zip or .tar.gz format.

To package model artifacts in a .zip:

cd /path/to/model
zip model.zip *

To package model artifacts in a .tar.gz:

cd /path/to/model
tar -czvf model.tar.gz *
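
The same packaging step can be scripted. The following is a minimal sketch using only the Python standard library; the helper name package_model and the placeholder model.py content are illustrative assumptions, not part of DJL Serving. Like the shell commands above, it places the files at the root of the archive.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def package_model(model_dir: str, out_base: str, fmt: str = "zip") -> str:
    """Archive the contents of model_dir so the files sit at the archive root.

    fmt is "zip" or "gztar" (.tar.gz); out_base is the output path without
    an extension. Returns the path of the archive that was written.
    """
    return shutil.make_archive(out_base, fmt, root_dir=model_dir)


if __name__ == "__main__":
    # Build a throwaway model directory with a placeholder model.py (hypothetical content).
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "model"
        model_dir.mkdir()
        (model_dir / "model.py").write_text("def handle(inputs):\n    return None\n")

        archive = package_model(str(model_dir), str(Path(tmp) / "model"))
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```

Passing fmt="gztar" produces a .tar.gz archive instead of a .zip.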

Serving Example

Let's run an example where we load a model in Python mode and run inference using the REST API.

Step 1: Define a Model

In this example, we will use the resnet18 model in the djl-demo repo.

The example provides a model.py that implements a handle function.

def handle(inputs: Input):
    """
    Default handler function
    """
    if not _service.initialized:
        # stateful model
        _service.initialize(inputs.get_properties())

    if inputs.is_empty():
        # initialization request
        return None

    return _service.inference(inputs)

It also provides a requirements.txt that pins torchvision to version 0.12.0:

torchvision==0.12.0

To get this model, clone the djl-demo repo if you haven't done so yet. Then, we can package model artifacts in .zip or .tar.gz.

git clone https://github.com/deepjavalibrary/djl-demo.git
cd djl-demo/djl-serving/python-mode/
zip -r resnet18.zip resnet18

Step 2: Start Server

Next, start DJL Serving and load this model at startup.

Linux/macOS
djl-serving -m resnet::Python=file://$PWD/resnet18.zip

Or we can load the model directly from the model directory:

djl-serving -m resnet::Python=file://$PWD/resnet18
Windows
path-to-your\serving.bat -m "resnet::Python=file:///%cd%\resnet18.zip"

Or we can load the model directly from the model directory:

path-to-your\serving.bat -m "resnet::Python=file:///%cd%\resnet18"

This will launch the DJL Serving Model Server, bind to port 8080, and create an endpoint named resnet with the model.
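
Judging from the commands in this document, the value passed to -m appears to have the shape name:version:engine=url, with the version left empty in resnet::Python. The sketch below splits such a value into its parts. Both that reading of the format and the helper name parse_model_arg are assumptions made for illustration; this is not DJL Serving's own parser.

```python
def parse_model_arg(arg: str) -> dict:
    """Split a 'name:version:engine=url' string into its parts.

    Illustrative only: the field layout is inferred from the example
    commands, and missing fields are returned as None.
    """
    # Everything before the first '=' describes the endpoint; the rest is the URL.
    endpoint, _, url = arg.partition("=")
    # Pad with empty strings so short forms such as 'name' still unpack.
    fields = endpoint.split(":") + ["", ""]
    name, version, engine = fields[:3]
    return {
        "name": name,
        "version": version or None,
        "engine": engine or None,
        "url": url,
    }


print(parse_model_arg("resnet::Python=file:///tmp/resnet18.zip"))
```

Under this reading, the endpoint name (resnet) is what appears in the /predictions/resnet URL used for inference.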

After the model is loaded, we can start making inference requests.

Step 3: Inference

To query the model using the prediction API, open another session and run the following command:

Linux/macOS
curl -O https://resources.djl.ai/images/kitten.jpg
curl -X POST "http://127.0.0.1:8080/predictions/resnet" -T "kitten.jpg"

On Windows, you can download the image and use Postman to send a POST request.

This should return the following result:

[
  {
    "tabby":0.4552347958087921,
    "tiger_cat":0.3483535945415497,
    "Egyptian_cat":0.15608155727386475,
    "lynx":0.026761988177895546,
    "Persian_cat":0.002232028404250741
  }
]

Java Mode

This section walks through how to serve a model in Java mode with DJL Serving.

Translator

The Translator is a Java interface defined in DJL for pre/post-processing.

You can use a built-in DJL TranslatorFactory by configuring translatorFactory in serving.properties.

Or you can build your own custom Translator. Your Translator should have the following signature:

public void prepare(TranslatorContext ctx);

public NDList processInput(TranslatorContext ctx, I input) throws Exception;

public String processOutput(TranslatorContext ctx, NDList list) throws Exception;

Provide Model File

Next, you need to include a model file. DJL Serving supports model artifacts for the following engines:

  • MXNet
  • PyTorch (torchscript only)
  • TensorFlow
  • ONNX

You can also include any required artifacts in the model directory. For example, ImageClassificationTranslator may need a synset.txt file to define the labels; you can put it in the same directory as your model file.

Packaging

To package model artifacts in a .zip:

cd /path/to/model
zip model.zip *

To package model artifacts in a .tar.gz:

cd /path/to/model
tar -czvf model.tar.gz *

Serving Example (NLP)

Let's run an example where we load an NLP model in Java mode and run inference using the REST API.

Step 1: Download Model File

In this example, we will use the HuggingFace Bert QA model.

First, clone the DJL repo if you haven't done so yet.

DJL provides a HuggingFace model converter utility that converts a HuggingFace model into a format DJL can load in Java:

git clone https://github.com/deepjavalibrary/djl.git
cd djl/extensions/tokenizers
python -m pip install -r src/main/python/requirements.txt
python src/main/python/model_zoo_importer.py -m deepset/bert-base-cased-squad2

This will generate a .zip file in your local folder:

model/nlp/question_answer/ai/djl/huggingface/pytorch/deepset/bert-base-cased-squad2/0.0.1/bert-base-cased-squad2.zip

The .zip file contains a serving.properties file that defines the engine, translatorFactory and so on.

engine=PyTorch
option.modelName=bert-base-cased-squad2
option.mapLocation=true
translatorFactory=ai.djl.huggingface.translator.QuestionAnsweringTranslatorFactory
includeTokenTypes=True
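
To check which engine and translatorFactory an archive declares before loading it, the key=value lines can be read with a few lines of Python. This is a minimal sketch that handles only simple key=value lines and # comments; parse_properties is a hypothetical helper, not the loader DJL Serving uses.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# The serving.properties content shown above.
SAMPLE = """\
engine=PyTorch
option.modelName=bert-base-cased-squad2
option.mapLocation=true
translatorFactory=ai.djl.huggingface.translator.QuestionAnsweringTranslatorFactory
includeTokenTypes=True
"""

props = parse_properties(SAMPLE)
print(props["engine"])             # PyTorch
print(props["translatorFactory"])  # fully qualified factory class name
```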

Step 2: Start Server

Next, start DJL Serving and load this model at startup.

Linux/macOS
djl-serving -m bert-base-cased-squad2=file://$PWD/bert-base-cased-squad2.zip
Windows
path-to-your\serving.bat -m "bert-base-cased-squad2=file:///%cd%\bert-base-cased-squad2.zip"

This will launch the DJL Serving Model Server, bind to port 8080, and create an endpoint named bert-base-cased-squad2 with the model.

Step 3: Inference

To query the model using the prediction API, open another session and run the following command:

curl -k -X POST http://127.0.0.1:8080/predictions/bert-base-cased-squad2 -H "Content-Type: application/json" \
    -d '{"question": "How is the weather", "paragraph": "The weather is nice, it is beautiful day"}'

This should return the following result:

nice
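
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl command above (same URL, JSON body, and Content-Type header). The function names build_qa_request and ask are illustrative, and ask assumes the server from Step 2 is running locally.

```python
import json
import urllib.request

# Endpoint created in Step 2.
DEFAULT_URL = "http://127.0.0.1:8080/predictions/bert-base-cased-squad2"


def build_qa_request(question: str, paragraph: str, url: str = DEFAULT_URL):
    """Build a POST request carrying the question and paragraph as JSON."""
    body = json.dumps({"question": question, "paragraph": paragraph}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def ask(question: str, paragraph: str) -> str:
    """Send the request to a running server and return the response text."""
    with urllib.request.urlopen(build_qa_request(question, paragraph)) as response:
        return response.read().decode("utf-8")
```

With the server running, ask("How is the weather", "The weather is nice, it is beautiful day") sends the same payload as the curl example.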

Serving Example (CV)

Let's run an example where we load a CV model in Java mode and run inference using the REST API.

Step 1: Download Model File

In this example, we will use a PyTorch resnet18 model.

curl -O https://mlrepo.djl.ai/model/cv/image_classification/ai/djl/pytorch/resnet/0.0.1/resnet18.zip

The .zip file contains a serving.properties file that defines the engine, translatorFactory and so on.

application=cv/image_classification
engine=PyTorch
option.modelName=resnet18
width=224
height=224
centerCrop=True
applySoftmax=true
option.mapLocation=true
translatorFactory=ai.djl.modality.cv.translator.ImageClassificationTranslatorFactory

Step 2: Start Server

Next, start DJL Serving and load this model at startup.

Linux/macOS
djl-serving -m resnet::PyTorch=file://$PWD/resnet18.zip
Windows
path-to-your\serving.bat -m "resnet::PyTorch=file:///%cd%\resnet18.zip"

This will launch the DJL Serving Model Server, bind to port 8080, and create an endpoint named resnet with the model.

Step 3: Inference

To query the model using the prediction API, open another session and run the following command:

Linux/macOS
curl -O https://resources.djl.ai/images/kitten.jpg
curl -X POST "http://127.0.0.1:8080/predictions/resnet" -T "kitten.jpg"

This should return the following result:

[
  {
    "className": "n02124075 Egyptian cat",
    "probability": 0.5183261632919312
  },
  {
    "className": "n02123045 tabby, tabby cat",
    "probability": 0.1956063210964203
  },
  {
    "className": "n02123159 tiger cat",
    "probability": 0.1955675184726715
  },
  {
    "className": "n02123394 Persian cat",
    "probability": 0.03224767744541168
  },
  {
    "className": "n02127052 lynx, catamount",
    "probability": 0.02553771249949932
  }
]
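
A client usually wants only the most likely class from this response. The following sketch parses a shortened copy of the JSON above and picks the highest-probability entry. The helper name top_prediction is illustrative, and stripping the leading synset ID assumes class names have the "ID label" shape shown in this example.

```python
import json

# First three entries of the response shown above.
RESPONSE = """[
  {"className": "n02124075 Egyptian cat", "probability": 0.5183261632919312},
  {"className": "n02123045 tabby, tabby cat", "probability": 0.1956063210964203},
  {"className": "n02123159 tiger cat", "probability": 0.1955675184726715}
]"""


def top_prediction(response_text: str) -> tuple:
    """Return (label, probability) for the highest-probability entry."""
    predictions = json.loads(response_text)
    best = max(predictions, key=lambda p: p["probability"])
    # Drop the leading synset ID (e.g. "n02124075") from the class name.
    _, _, label = best["className"].partition(" ")
    return label, best["probability"]


print(top_prediction(RESPONSE))  # ('Egyptian cat', 0.5183261632919312)
```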

Serving Example (Custom Translator)

Let's run an example where we load a model with a custom Translator in Java mode and run inference using the REST API.

Step 1: Download Model File

In this example, we will use a PyTorch resnet18 model.

mkdir resnet18 && cd resnet18
curl https://resources.djl.ai/test-models/traced_resnet18.pt -o resnet18.pt
curl -O https://mlrepo.djl.ai/model/cv/image_classification/ai/djl/pytorch/synset.txt
cd ..

Step 2: Define a Custom Translator

Next, we need to prepare a custom Translator. In this example, we will use CustomTranslator.java in the djl-demo repo.

We need to copy the Translator to the libs/classes folder.

mkdir -p resnet18/libs/classes
git clone https://github.com/deepjavalibrary/djl-demo.git
cp djl-demo/djl-serving/java-mode/devEnv/src/main/java/CustomTranslator.java resnet18/libs/classes

Step 3: Start Server

Next, start DJL Serving and load this model at startup.

Linux/macOS
djl-serving -m resnet::PyTorch=file://$PWD/resnet18
Windows
path-to-your\serving.bat -m "resnet::PyTorch=file:///%cd%\resnet18"

This will launch the DJL Serving Model Server, bind to port 8080, and create an endpoint named resnet with the model.

Step 4: Inference

To query the model using the prediction API, open another session and run the following command:

Linux/macOS
curl -O https://resources.djl.ai/images/kitten.jpg
curl -X POST "http://127.0.0.1:8080/predictions/resnet" -T "kitten.jpg"

This should return the following result:

[
  {
    "className": "n02124075 Egyptian cat",
    "probability": 0.5183261632919312
  },
  {
    "className": "n02123045 tabby, tabby cat",
    "probability": 0.1956063210964203
  },
  {
    "className": "n02123159 tiger cat",
    "probability": 0.1955675184726715
  },
  {
    "className": "n02123394 Persian cat",
    "probability": 0.03224767744541168
  },
  {
    "className": "n02127052 lynx, catamount",
    "probability": 0.02553771249949932
  }
]

Binary Mode

This section walks through how to serve a model in binary mode with DJL Serving.

Binary mode doesn't support pre-processing and post-processing. DJL Serving only accepts tensors (NDList/npy/npz) as input and output.

Provide Model File

For Binary Mode, you just need to place the model file in a folder.

DJL Serving supports model artifacts for the following engines:

  • MXNet
  • PyTorch (torchscript only)
  • TensorFlow
  • ONNX

Packaging

To package model artifacts in a .zip:

cd /path/to/model
zip model.zip *

To package model artifacts in a .tar.gz:

cd /path/to/model
tar -czvf model.tar.gz *

Serving Example

Let's run an example where we load a model in binary mode and run inference using the REST API.

Step 1: Download Model File

In this example, we will use a PyTorch resnet18 model.

mkdir resnet18 && cd resnet18
curl https://resources.djl.ai/test-models/traced_resnet18.pt -o resnet18.pt

We can package model artifacts in .zip or .tar.gz. In this example, we package the model in a .zip file.

zip ../resnet18.zip *
cd ..

Step 2: Start Server

Next, start DJL Serving and load this model at startup.

Linux/macOS

djl-serving -m resnet::PyTorch=file://$PWD/resnet18.zip

Or we can load the model directly from the model directory:

djl-serving -m resnet::PyTorch=file://$PWD/resnet18

Windows

path-to-your\serving.bat -m "resnet::PyTorch=file:///%cd%\resnet18.zip"

Or we can load the model directly from the model directory:

path-to-your\serving.bat -m "resnet::PyTorch=file:///%cd%\resnet18"

This will launch the DJL Serving Model Server, bind to port 8080, and create an endpoint named resnet with the model.

Step 3: Inference

In binary mode, DJL Serving currently accepts input data encoded as NDList or NumPy (.npz). The returned data is always NDList-encoded.

You can use the DJL API to create an NDList and serialize it to bytes to use as the input.

Direct Inference

# download sample NDList-encoded data
curl -O https://resources.djl.ai/benchmark/inputs/ones_1_3_224_224.ndlist
curl -X POST "http://127.0.0.1:8080/predictions/resnet" \
    -T "ones_1_3_224_224.ndlist" \
    -H "Content-type: tensor/ndlist" \
    -o "out.ndlist"

Python Client Inference

You can also define a Python client to run inference.

The following snippet from inference.py illustrates how to run inference in Python.

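from tempfile import TemporaryFile

import numpy as np
import urllib3  # assumption: http below is a urllib3 PoolManager

http = urllib3.PoolManager()

# Serialize a zero-filled input tensor in .npz format
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
outfile = TemporaryFile()
np.savez(outfile, data)
_ = outfile.seek(0)

# POST the .npz payload to the resnet endpoint
response = http.request('POST',
                        'http://localhost:8080/predictions/resnet',
                        headers={'Content-Type': 'tensor/npz'},
                        body=outfile.read())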

Run inference.py to see how a Python client interacts with the server:

python inference.py

Users are required to build their own client to do encoding/decoding.