Replies: 2 comments
-
Hi! Any update on this error? I replicated it and got the same message: `ValueError: .half() is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been casted to the correct dtype.`
-
@ocesp98, @rubenCrayon, can you please try the updated ZeRO-Inference with 4-bit quantization?
-
Is it possible to use DeepSpeed inference with a 4/8-bit quantized model using bitsandbytes?
I use the bitsandbytes package like this:
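(The original snippet isn't preserved in this thread; below is a minimal sketch of a typical 4-bit bitsandbytes load through transformers, assuming that is roughly the setup used. The model name and quantization parameters are placeholders, not the poster's exact code.)

```python
# Illustrative sketch only; not the poster's exact snippet.
# Assumes transformers and bitsandbytes are installed; the model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "facebook/opt-1.3b"  # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # or load_in_8bit=True
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```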
However, it throws an error: `ValueError: .half() is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been casted to the correct dtype.`
The ultimate goal is to combine the quantization with DeepSpeed ZeRO-Infinity offload, in the hope of running a larger model that currently does not fit on my GPU.
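(For context, a minimal sketch of the ZeRO-Inference/ZeRO-Infinity offload setup being aimed at, without the bitsandbytes quantization step that triggers the error above. The model name, config values, and launch details are illustrative only, not a tested recipe.)

```python
# Illustrative sketch of ZeRO stage-3 inference with parameter offload.
# Typically launched with the DeepSpeed launcher, e.g.: deepspeed --num_gpus 1 script.py
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.integrations import HfDeepSpeedConfig

model_name = "facebook/opt-13b"  # placeholder

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},  # or "nvme" with "nvme_path"
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must be created before from_pretrained() so weights are partitioned/offloaded at load time.
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
engine.module.eval()

tokenizer = AutoTokenizer.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(device)
outputs = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```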