Calls via vLLM + API fail #1144

xiaohaiqing · 2024-03-13T02:47:09Z

Start Date

No response

Implementation PR

Launch commands:

python -m fastchat.serve.controller > controller.log 2>&1 &

python -m fastchat.serve.vllm_worker --model-path /home/qwen/lora/Qwen-7B-Chat-Int4/ --tensor-parallel-size 1 --trust-remote-code --dtype float16  > model_worker.log 2>&1 &

python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8001 > api_server.log 2>&1 &

Call:

Summary

None

Basic Example

None

Drawbacks

Worker error:

API server error:

Unresolved questions

What is causing this?

jklj077 · 2024-03-13T13:06:50Z

For the error raised by the worker, you may need to downgrade to fschat<0.2.36 and vllm<0.2.7. Unfortunately, FastChat 0.2.36 takes a quick-and-dirty approach to compatibility with vLLM 0.2.7 (https://github.com/lm-sys/FastChat/blob/b21d0f780ca4472a13714262a0790f2ee1ade659/fastchat/serve/vllm_worker.py#L60). Because QwenTokenizer uses custom code, that change in FastChat introduces unexpected behaviour, seemingly only for Qwen.

The error raised by the API server is similar to #1062.
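
To see whether an environment falls inside the version bounds suggested for the worker error, a small check like the following may help. It is only a sketch: the helper names (`version_key`, `is_below`, `check_environment`) are made up for illustration, the version parsing is deliberately simplistic (leading digits of each dot-separated part), and it assumes the packages were installed under the PyPI distribution names `fschat` and `vllm`.

```python
from importlib import metadata

# Exclusive upper bounds taken from the suggestion above (fschat<0.2.36, vllm<0.2.7).
SUGGESTED_UPPER_BOUNDS = {"fschat": "0.2.36", "vllm": "0.2.7"}


def version_key(version: str) -> tuple:
    """Turn '0.2.36' into (0, 2, 36).

    Keeps only the leading digits of each dot-separated part and stops at
    the first part that has none (so '0.2.7.post1' becomes (0, 2, 7)).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_below(installed: str, upper_bound: str) -> bool:
    """True if the installed version is strictly below the upper bound."""
    return version_key(installed) < version_key(upper_bound)


def check_environment() -> dict:
    """Map each package to True/False (within bound or not), or None if not installed."""
    report = {}
    for name, bound in SUGGESTED_UPPER_BOUNDS.items():
        try:
            report[name] = is_below(metadata.version(name), bound)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        if ok is None:
            status = "not installed"
        elif ok:
            status = "within suggested bound"
        else:
            status = "needs downgrade"
        print(f"{name}: {status}")
```

If either package is reported as needing a downgrade, reinstalling with the constraints quoted above (fschat<0.2.36 and vllm<0.2.7) is the step being suggested; whether that resolves the worker error in a given setup still has to be verified.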

xiaohaiqing added the question label Mar 13, 2024

jklj077 closed this as completed Mar 19, 2024
