Add transformers integration section #1904
Conversation
Thanks for adding the transformers integration section!
Co-authored-by: Marc Sun <[email protected]>
quanto-introduction.md (outdated)

You can quantize the weights and/or activations in int8, float8, int4 or int2 by simply passing the corresponding argument to `QuantoConfig`. The activations can be either int8 or float8. Note that for float8 you need hardware compatible with float8 precision; otherwise quanto will silently upcast the weights and activations to torch.float32 or torch.float16 (depending on the original data type of the model) when the matmul is performed.
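As a rough illustration of how that argument gets passed, a minimal sketch with the transformers integration could look like the following (the model id is only an example, and the exact `QuantoConfig` options are assumed from the description above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, QuantoConfig

model_id = "facebook/opt-125m"  # example model; any causal LM should work

# Quantize weights to int8; leave activations unquantized (None)
quantization_config = QuantoConfig(weights="int8", activations=None)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```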
What you describe here also happens if activations are not quantized, because mixed matmuls are not supported by any hardware. Note also that PyTorch will raise an exception if you try to use float8 on an MPS device.
Makes sense! I added something along the lines you suggested.
quanto-introduction.md (outdated)

Quanto is device agnostic, meaning you can quantize your model regardless of whether you are on CPU, GPU or MPS (Apple Silicon), so you can run quantized models on any of these devices.

Quanto is also torch.compile friendly: you can quantize a model with quanto and call `torch.compile` on the model to compile it for faster generation.
This won't work if some dynamic quantization is involved, i.e.:
- QAT,
- quantized activations.

See: QTensor cannot be created from inside a dynamo graph (optimum-quanto#46)

So basically for frozen models it is OK (I think this is always the case in transformers); see the sketch below.
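A minimal sketch of that frozen-model flow, assuming quanto's `quantize`/`freeze` API (the toy model is illustrative):

```python
import torch
from quanto import quantize, freeze, qint8

# Toy model; any weight-only quantized, frozen model should behave the same
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 8),
)

# Weight-only quantization: activations=None means no dynamic
# quantization happens inside the graph at inference time
quantize(model, weights=qint8, activations=None)
freeze(model)  # materialize static quantized weights

compiled = torch.compile(model)
out = compiled(torch.randn(1, 64))
```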
I added a few explanations of the edge cases in b778f8d! Also note that one can change the activation type with the transformers integration, so users should keep activations set to None for torch.compile.
cc @dacorvo @SunMarc