Improve Quantization #42

Quentin-Anthony · 2024-08-11T12:33:18Z

The quantization support I've added through --low-prec-bytes-per-val is a bit barebones. It'd be nice to add enough flexibility to handle per-block quantization (e.g. some setups only quantize the linear layers to int4) and some of the newer formats that aren't a whole number of bytes per value (e.g. int4, fp6).

Relevant: #36
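
A rough sketch of the kind of accounting this would need, not the current implementation: track precision in bits rather than bytes, and let each block override a default format. Every name here (`FORMAT_BITS`, `weight_bytes`, the block labels) is hypothetical and chosen only for illustration, and the sketch ignores per-group scale/zero-point overhead that real int4 schemes carry.

```python
import math

# Bit widths per value. The sub-byte entries (fp6, int4) are the ones a
# bytes-per-value setting cannot express as a whole number.
FORMAT_BITS = {"fp32": 32, "fp16": 16, "bf16": 16, "int8": 8, "fp6": 6, "int4": 4}


def weight_bytes(param_counts, block_formats, default_format="fp16"):
    """Estimate weight memory in bytes with per-block formats.

    param_counts:  hypothetical mapping of block name -> number of parameters
    block_formats: mapping of block name -> format name; blocks that are
                   missing fall back to default_format
    """
    total_bits = 0
    for block, n_params in param_counts.items():
        fmt = block_formats.get(block, default_format)
        total_bits += n_params * FORMAT_BITS[fmt]
    # Sum in bits and round up once at the end, so sub-byte formats
    # are not rounded up per value.
    return math.ceil(total_bits / 8)


# Only the linear layers are quantized to int4; everything else stays fp16.
counts = {"linear": 1000, "embedding": 100, "layernorm": 10}
print(weight_bytes(counts, {"linear": "int4"}))
```

Summing in bits and converting once keeps formats like fp6 from being silently rounded to a full byte per value.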
