These end-to-end pipelines demonstrate the power of MAX for accelerating common AI workloads, and more. Each supported pipeline can be served via an OpenAI-compatible endpoint.
MAX can also serve most PyTorch-based large language models available on Hugging Face, although not at the same performance as native MAX Graph implementations.
Highly optimized MAX Graph implementations exist for several core model architectures. These include:
- Llama 3.1: A text completion pipeline using the Llama 3.1 model, implemented with the MAX Graph API. This pipeline contains everything needed to run a self-hosted large language model in the `LlamaForCausalLM` family with state-of-the-art serving throughput.
- Mistral: Support for the `MistralForCausalLM` family of text completion models, using the Mistral NeMo 12B model by default. This pipeline has been tuned for performance using the MAX Graph API.
- Replit Code: Code generation via the Replit Code V1.5 3B model, implemented using the MAX Graph API.
- DeepSeek Coder: Code generation via the DeepSeek Coder V1.5 7B model, implemented using the MAX Graph API.
Instructions for running each pipeline, along with all configuration parameters, can be found in its respective subdirectory. A shared driver is used to execute the pipelines.
The easiest way to try out any of the pipelines is with our Magic command-line tool.
- Install Magic on macOS and Ubuntu with this command:

  ```sh
  curl -ssL https://magic.modular.com | bash
  ```

  Then run the `source` command that's printed in your terminal.

  To see the available commands, you can run `magic --help`. Learn more about Magic here.

- Clone the MAX examples repository:

  If you don't already have a local clone of this repository, create one via:

  ```sh
  git clone https://github.com/modularml/max.git
  ```

  The following instructions assume that you're working within this directory; you can change to it after cloning:

  ```sh
  cd max/pipelines/python/
  ```
- Now run one of the text completion demos with any of the following commands:

  ```sh
  magic run llama3 --prompt "I believe the meaning of life is"
  magic run replit --prompt "def fibonacci(n):"
  magic run mistral --prompt "Why is the sky blue?"
  ```
- Host a chat completion endpoint via MAX Serve.

  MAX Serve provides functionality to host performant OpenAI-compatible endpoints using the FastAPI framework.

  You can configure a pipeline to be hosted by using the `--serve` argument. For example:

  ```sh
  magic run llama3 --serve
  ```

  A request can be submitted via a cURL command:

  ```sh
  curl -N http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "modularai/llama-3.1",
      "stream": true,
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"}
      ]
    }'
  ```
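Because the endpoint is OpenAI-compatible, you can also query it from standard OpenAI client libraries. Below is a minimal sketch using the `openai` Python package (installed separately, e.g. `pip install openai`), assuming the server was started as shown above and is listening on port 8000; adjust the base URL and model name to match your setup.

```python
from openai import OpenAI

# Point the OpenAI client at the locally hosted MAX Serve endpoint.
# The API key is not checked by the local server, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Stream a chat completion, mirroring the cURL request above.
stream = client.chat.completions.create(
    model="modularai/llama-3.1",
    stream=True,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)

# Print tokens as they arrive.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```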
Additionally, fine-tuned weights hosted on Hugging Face can be used with one of these optimized pipeline architectures when serving via the `serve` command:

```sh
magic run serve --huggingface-repo-id=modularai/llama-3.1
```
If you provide a repository ID for a Hugging Face large language model that does not currently have an optimized MAX Graph implementation, MAX falls back to serving a PyTorch eager version of the model.
The following table lists the model architectures tested to work with MAX.
| Architecture      | Example Model                            |
|-------------------|------------------------------------------|
| AquilaForCausalLM | BAAI/Aquila-7B                           |
| ChatGLMModel      | THUDM/chatglm3-6b                        |
| GPT2LMHeadModel   | openai-community/gpt2                    |
| GPTJForCausalLM   | EleutherAI/gpt-j-6b                      |
| LlamaForCausalLM  | meta-llama/Llama-3.2-3B-Instruct         |
| LlamaForCausalLM  | Skywork/Skywork-o1-Open-Llama-3.1-8B     |
| LlamaForCausalLM  | deepseek-ai/deepseek-coder-1.3b-instruct |
| PhiForCausalLM    | microsoft/phi-2                          |
| Phi3ForCausalLM   | microsoft/Phi-3-mini-4k-instruct         |
| Qwen2ForCausalLM  | Qwen/Qwen2.5-1.5B-Instruct               |
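For example, to serve one of these models through the PyTorch fallback path, you can pass its repository ID to the same `serve` command shown earlier (microsoft/phi-2 is used here purely as an illustration; any repository ID from the table should work the same way):

```sh
magic run serve --huggingface-repo-id=microsoft/phi-2
```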