Bug description

When trying to run the model "modularai/llama-3.1" with the MAX Serve container on CPU, the server crashes with the error:

ValueError: not enough values to unpack (expected 4, got 1)
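For context, this message is Python's tuple-unpacking failure: a call the caller expects to return four values is returning only one on the CPU path. A minimal, purely illustrative sketch of the pattern (the function name is made up, not taken from the MAX code base):

```python
# Hypothetical illustration of the failure pattern, not the actual MAX code path.
def some_cache_params():
    # On the failing path, only a single value comes back...
    return (1,)

# ...but the caller unpacks it into four names, which raises:
# ValueError: not enough values to unpack (expected 4, got 1)
a, b, c, d = some_cache_params()
```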
Steps to reproduce

Launch the container with:

export HUGGING_FACE_HUB_TOKEN=<YOUR-TOKEN>
docker run --rm \
  --ipc=host \
  --name MAX \
  -p 8000:8000 \
  -it \
  --entrypoint /bin/bash \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  --env "HF_HUB_ENABLE_HF_TRANSFER=1" \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  modular/max-openai-api:24.6.0
and then, once inside the TTY in the container, run:
python pipelines.py serve --huggingface-repo-id modularai/llama-3.1 --max-num-steps 10 --max-cache-batch-size 10 --max-length 2048
Everything works fine up to this point, but if you try to benchmark with the script provided in the tutorial, for example:

python benchmark_serving.py \
  --backend modular \
  --base-url http://localhost:8000 \
  --endpoint /v1/completions \
  --model modularai/llama-3.1 \
  --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
  --num-prompts 1
Then the error occurs on the server.
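For what it's worth, the benchmark script may not be required to trigger this: a single request against the OpenAI-compatible /v1/completions endpoint should exercise the same server path. A minimal sketch (prompt and token count are arbitrary, and this assumes the server from the steps above is listening on localhost:8000):

```python
# Single completion request against the running MAX Serve container,
# as a lighter-weight alternative to the full benchmark script.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "modularai/llama-3.1",
        "prompt": "Hello, world",
        "max_tokens": 16,
    },
)
print(resp.status_code, resp.text)
```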
System information

Tested on both an Intel Xeon Gold 6238 and an AMD EPYC 9124 using the container.
Thanks for reporting! It shouldn't error on CPU. We'll fix it.