QTIP Quantization Support? #2663
Labels
Investigating
Low Precision
Issue about lower bit quantization, including int8, int4, fp8
triaged
Issue has been triaged by maintainers
I recently came across QTIP, a new 2-4 bit quantization format that reportedly gives significantly better accuracy than the int4 formats currently in TensorRT-LLM, so I wanted to make sure you were aware of it. Perhaps it could be integrated?
https://arxiv.org/pdf/2406.11235
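For context, the appeal of a 2-4 bit format over int4 is largely storage arithmetic. A rough sketch (raw weight bytes only; real formats add per-group scales, codebooks, and other metadata on top):

```python
# Back-of-the-envelope weight storage for a 7B-parameter model at
# several bit widths. Illustrative arithmetic only; ignores the
# scale/codebook overhead that any real quantization format adds.
def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Raw bytes needed to store n_params weights at the given width."""
    return n_params * bits_per_weight // 8

n = 7_000_000_000
for bits in (16, 8, 4, 2):
    gib = weight_bytes(n, bits) / 2**30
    print(f"{bits:2d}-bit: {gib:.2f} GiB")
```

At 2 bits a 7B model's weights fit in roughly half the memory of an int4 build, which is why a format that holds accuracy at that width is attractive.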
cc @Tracin