Does DeepSpeed Inference support offloading params to CPU? #3760
Replies: 3 comments
-
I am getting exactly the same error whenever I try to run inference with vicuna-33b.
-
It seems DS Inference only supports tensor parallelism. I tried training mode with ZeRO offload instead, but it runs very slowly. The ds config I used was a ZeRO stage-3 setup with parameter offload, along the lines sketched below.
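A minimal sketch of such a config (the batch size and pin_memory values here are assumptions, not the original attachment):

```python
# Minimal ZeRO stage-3 config with parameter offload to CPU.
# Passed as a dict to deepspeed.initialize(..., config=ds_config);
# the same content can live in a ds_config.json file instead.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",     # keep partitioned weights in host memory
            "pin_memory": True,  # pinned buffers speed up host-to-GPU copies
        },
    },
    "train_micro_batch_size_per_gpu": 1,  # assumption; DeepSpeed requires a batch size in the config
}
```

The slowness is expected with this setup: offload_param keeps the weights in host memory and streams each layer to the GPU on demand, so throughput is bounded by PCIe bandwidth rather than GPU compute.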
-
I just want to run LLaMA-30B inference on an A10, whose parameters exceed the GPU's memory capacity.
It seems InferenceEngine does not process ZeRO-related arguments.
I then tried deepspeed.initialize with ZeRO offload, but it always triggers CUDA OOM.
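The load-time OOM can usually be avoided by partitioning and offloading the weights while the checkpoint loads, instead of materializing the full model on the GPU first. With a Hugging Face model this is done by creating transformers' HfDeepSpeedConfig before from_pretrained (a minimal sketch; the checkpoint name is a placeholder):

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM
from transformers.deepspeed import HfDeepSpeedConfig  # transformers.integrations in newer versions

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must be created BEFORE from_pretrained and kept alive: it makes
# transformers build the model under deepspeed.zero.Init, so each
# weight is partitioned/offloaded as it is loaded rather than the
# whole 30B model landing on one GPU (the source of the CUDA OOM).
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",  # placeholder checkpoint name
    torch_dtype=torch.float16,
)

engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()
```

Generation then goes through engine.module.generate with inputs placed on engine.device; each forward pass streams the offloaded layers back to the GPU.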