Releases: Xilinx/brevitas

Release v0.11.0

10 Oct 12:31

Breaking Changes

  • Remove ONNX QOp export (#917)
  • QuantTensor can no longer have empty metadata fields (e.g., scale or bit-width) (#819)
  • Bias quantization now requires an explicit bit-width (#839)
  • QuantLayers no longer expose quant_metadata directly; this is delegated to the proxies (#883)
  • QuantDropout has been removed (#861)
  • QuantMaxPool has been removed (#858)

Highlights

  • Support for OCP/FNUZ FP8 quantization (see the sketch after this list)

    • Compatibility with QAT/PTQ, including all currently implemented PTQ algorithms (GPTQ, LearnedRound, GPFQ, etc.)
    • Fully customizable minifloat configuration (i.e., mantissa/exponent bit-width, exponent bias, etc.)
    • Support for ONNX QDQ export
  • Support for OCP MX quantization

    • Compatibility with QAT/PTQ, including all currently implemented PTQ algorithms (GPTQ, LearnedRound, GPFQ, etc.)
    • Fully customizable minifloat configuration (i.e., mantissa/exponent bit-width, exponent bias, group size, etc.)
  • New QuantTensor variants:

    • FloatQuantTensor: supports OCP FP formats and general minifloat quantization
    • GroupwiseQuantTensor: supports OCP MX formats and general groupwise int/minifloat quantization
  • Support for channel splitting

  • Support for HQO (half-quadratic optimization) for zero point

  • Support for HQO for scale (prototype)

  • Improved SDXL entrypoint under brevitas_examples

  • Improved LLM entrypoint under brevitas_examples

    • Compatibility with accelerate
  • Prototype support for torch.compile:

    • See PR #1006 for an example of how to use it
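
The snippet below sketches how the new FP8 support can be driven from a quantized layer; an analogous path exists for MX via the groupwise quantizers. The import path and the keyword-override names are assumptions based on the experimental float quantizers referenced in this release, not verbatim documentation:

```python
# Minimal sketch: OCP FP8 (E4M3) quantization with Brevitas.
# The quantizer import path below is an assumption.
import torch
import brevitas.nn as qnn
from brevitas.quant.experimental.float_quant_ocp import (
    Fp8e4m3OCPActPerTensorFloat, Fp8e4m3OCPWeightPerTensorFloat)

# QuantLinear with FP8 weights and FP8 input activations.
layer = qnn.QuantLinear(
    128, 256,
    weight_quant=Fp8e4m3OCPWeightPerTensorFloat,
    input_quant=Fp8e4m3OCPActPerTensorFloat)

# The minifloat format can be customized through keyword overrides;
# the exact override names here are assumptions (1 sign + 5 exp + 2 mant = 8 bits).
custom_layer = qnn.QuantLinear(
    128, 256,
    weight_quant=Fp8e4m3OCPWeightPerTensorFloat,
    weight_exponent_bit_width=5,
    weight_mantissa_bit_width=2)

# out is a regular torch.Tensor unless return_quant_tensor=True is set.
out = layer(torch.randn(4, 128))
```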

What's Changed

For a more comprehensive list of changes and fixes, see the list below:

  • Enhance: Importing quantized models after bias correction by @costigt-dev in #868
  • Fix QCDQDecoupledWeightQuantProxyHandlerMixin return args by @costigt-dev in #870
  • Fix - Speech to text: Create an empty json file by @costigt-dev in #871
  • Feat (scaling/standalone): flag to retrieve full state dict by @Giuseppe5 in #874
  • Notebooks: makes notebooks deterministic and prints output of asserts by @fabianandresgrob in #847
  • Fix (proxy): revert value tracer change by @Giuseppe5 in #888
  • Fix (proxy): fix for attributes retrieval by @Giuseppe5 in #880
  • Feat (notebook): add example for dynamic quantization to ONNX export by @fabianandresgrob in #877
  • Fix (gpxq): handling empty tensors with GPxQ and adding unit tests by @i-colbert in #892
  • Fix (ptq): expose uint_sym_act flag and fix issue with minifloat sign by @fabianandresgrob in #898
  • Feat (minifloat): add support for user specified minifloat format by @fabianandresgrob in #821
  • Feat: Add QuantConv3d and QuantConv3dTranspose by @costigt-dev in #805
  • Add tutorial examples of per-channel quantization by @OscarSavolainenDR in #867
  • Fix (tests): revert pytest pin by @Giuseppe5 in #903
  • Remove: Remove original_cat workaround by @costigt-dev in #902
  • Infra: Update issue template by @nickfraser in #893
  • Pull Request Template by @capnramses in #885
  • Fix (core): add return in state_dict by @Giuseppe5 in #910
  • Fix (quant_tensor): fix typing and remove unused checks by @Giuseppe5 in #913
  • Fix (nn): removed unused caching in adaptive avgpool2d by @Giuseppe5 in #911
  • Fix (quant_tensor): remove unused checks by @Giuseppe5 in #918
  • Setup: pin ONNX to 1.15 due to ORT incompatibility by @Giuseppe5 in #924
  • Feat (examples): add support for Stable Diffusion XL by @Giuseppe5 in #909
  • Assert all ptq-common bit widths are positive integers by @OscarSavolainenDR in #931
  • Enhance: Quant Tensor Test by @costigt-dev in #894
  • Fix (examples/stable_diffusion): README formatting and clarification by @Giuseppe5 in #932
  • Fix (examples/ptq): fix for bitwidth check by @Giuseppe5 in #934
  • Feat: functionalize QuantTensor by @Giuseppe5 in #878
  • Feat (minifloat): cleanup minifloat impl by @Giuseppe5 in #922
  • Fix tests in dev by @Giuseppe5 in #939
  • Feat (proxy): scale computation delegated to bias proxy by @Giuseppe5 in #938
  • Fix (gpxq): adding input quant to process input by @i-colbert in #943
  • Fix (quant): propagate device and dtype in subinjector by @Giuseppe5 in #942
  • Fix (gpxq): correct variable name by @Giuseppe5 in #944
  • Fix (quant_tensor): fix AvgPool functional implementation by @Giuseppe5 in #945
  • Feat (quant_tensor): support for dim() and ndim by @Giuseppe5 in #947
  • Fix (graph/standardize): correct check for Mean to AvgPool by @Giuseppe5 in #948
  • Feat (graph/standardize): default keepdim value by @Giuseppe5 in #950
  • Fix bullet formatting in getting started guide by @timkpaine in #952
  • Fix (quant/float): correct scaling_impl and float_scaling_impl by @Giuseppe5 in #953
  • Fix/remove-numel - Remove numel is zero check from context manager exit method by @costigt-dev in #920
  • Feat (examples/ptq): support for dynamic act quant by @Giuseppe5 in #935
  • Feat (quant_tensor): support for FloatQuantTensor by @Giuseppe5 in #919
  • Fix (examples/llm): Add all rewriters to the list by @nickfraser in #956
  • Fix (core/quant/float): use eps to avoid log(0) by @Giuseppe5 in #957
  • Fix (test/actions): Excluded torch==1.9.1, platform=macos-latest tests by @nickfraser in #960
  • Adding FP8 weight export by @costigt-dev in #907
  • Fix (llm): fix device issue for eval when not using default device by @fabianandresgrob in #949
  • Fix (GPFQ): using random projection for speed up/less memory usage by @fabianandresgrob in #964
  • Fix (calibrate/minifloat): fix for act calibration by @Giuseppe5 in #966
  • Fix (quant/float): restore fix for log(0) by @Giuseppe5 in #968
  • Setup: pin numpy version by @Giuseppe5 in #974
  • Feat (minifloat): support for FNUZ variants by @Giuseppe5 in #973
  • Fix (core/float): add default for float_scaling_impl by @Giuseppe5 in #972
  • Feat (graph/equalize): upcast during equalization computation by @Giuseppe5 in #970
  • Generative improv by @Giuseppe5 in #965
  • Fix (requirements/setuptools): Set maximum requirement for setuptools by @nickfraser in #963
  • Fix: Typo fix on SDXL command line args by @nickfraser in #976
  • Fix (graph/bias_correction): Fix when layer parameters are offloaded to accelerate by @nickfraser in #962
  • Fix (ptq/bias_correction): remove unnecessary forward pass by @Giuseppe5 in #980
  • Fix (export/qonnx): Fixed symbolic kwargs order. by @nickfraser in #988
  • Various SDXL quantization fixes by @nickfraser in #977
  • Fix (brevitas_examples/sdxl): Various fixes by @Giuseppe5 in #991
  • Feat (proxy/parameter_quant): cache quant weights by @Giuseppe5 in #990
  • Docs: Added 0.10.3 release note to README. by @nickfraser in #993
  • Added some preliminary unit tests to the CNNs 'quantize_model' by @OscarSavolainenDR in #927
  • Feat (tests): extended minifloat unit tests by @alexredd99 in #979
  • Fix (proxy/runtime_quant): correct handling of mixed type quantization by @Giuseppe5 in #985
  • docs (readme): Fixed GH actions badges by @nickfraser in #996
  • Feat: Update LLM entry-point ...
Read more

Release v0.10.3

23 Jul 13:48

What's Changed

  • Backport: Fix (export/qonnx): Fixed symbolic kwargs order. (#988) by @nickfraser in #992
  • Pinned the numpy and ONNX versions and set a maximum setuptools version

Full Changelog: v0.10.2...v0.10.3

Release v0.10.2

19 Feb 16:37

What's Changed

Full Changelog: v0.10.1...v0.10.2

Release v0.10.1

15 Feb 11:50

Highlights

  • Support for A2Q+ (paper)
  • A2Q+ examples with CIFAR10 and super resolution
  • Support for concatenation equalization for weights and activations
  • Support for GPFQ with A2Q L1-norm bound
  • Option to explicitly export the Q node for weights in QCDQ export
  • Support for float16 and bfloat16 in QCDQ export (see the export sketch after this list)
  • Support for dynamic activation quantization in ONNX QDQ export
  • Support for channel splitting (paper)
  • (Beta) Better compatibility with Hugging Face accelerate and optimum
  • (Beta) Improved support and testing for minifloat quantization
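
For reference, a minimal QCDQ export sketch using the existing Brevitas export entry point; the toy model is purely illustrative:

```python
# Minimal sketch: ONNX QCDQ export of a small quantized model.
import torch
import brevitas.nn as qnn
from brevitas.export import export_onnx_qcdq

model = torch.nn.Sequential(
    qnn.QuantIdentity(return_quant_tensor=True),  # quantize the input
    qnn.QuantLinear(16, 8, weight_bit_width=8),
).eval()

# Emits QuantizeLinear / (Clip) / DequantizeLinear nodes in the exported graph.
export_onnx_qcdq(model, args=torch.randn(1, 16), export_path='quant_model.onnx')
```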

What's Changed

Full Changelog: v0.10.0...v0.10.1

A2Q+ CIFAR10 model release

12 Feb 18:17
c78f974
Pre-release

This release contains training code and pre-trained weights to demonstrate accumulator-aware quantization (A2Q) on an image classification task. Code is also provided to demonstrate Euclidean projection-based weight initialization (EP-init) as proposed in our paper "A2Q+: Improving Accumulator-Aware Weight Quantization".

Find the associated docs at https://github.com/Xilinx/brevitas/tree/a2q_cifar10_r1/src/brevitas_examples/imagenet_classification/a2q.

A2Q+ model release

30 Jan 19:00
17fb49e
Pre-release

A2Q+ Super Resolution Experiments with Brevitas

This release contains training code and pre-trained weights to demonstrate accumulator-aware quantization (A2Q+) as proposed in our paper "A2Q+: Improving Accumulator-Aware Weight Quantization" on a super resolution task.

Find the associated docs at https://github.com/Xilinx/brevitas/tree/super_res_r2/src/brevitas_examples/super_resolution.

Release v0.10.0

08 Dec 16:36

Highlights

  • Support for PyTorch up to version 2.1.
  • Support for the GPTQ PTQ algorithm (see the sketch after this list).
  • Support for the GPFQ PTQ algorithm.
  • Support for the SmoothQuant / activation equalization PTQ algorithm.
  • Support for MSE-based scale and zero-point for weights and activations.
  • Support for row-wise scaling at the input of QuantLinear.
  • Support for quantization of a slice of a weight tensor.
  • End-to-end support for learned rounding in ImageNet PTQ.
  • End-to-end example training scripts for A2Q (low-precision accumulation) on super resolution.
  • Experimental support for minifloats (eXmY quantization).
  • Experimental LLM PTQ flow with support for weight-only and weight+activation quantization, together with GPTQ, AWQ and SmoothQuant.
  • Experimental Stable Diffusion PTQ flow with support for weight-only quantization.
  • Deprecated the FINN ONNX export flow.
  • Updated the custom value_trace FX tracer to the latest FX.
  • New custom variant of the make_fx tracer with support for custom torch.library ops through the @Wrap annotation.
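
A minimal GPTQ sketch following the pattern used in the brevitas_examples PTQ flows; calib_loader is a user-provided calibration DataLoader, and minor API details should be treated as assumptions:

```python
# Minimal sketch: applying GPTQ to an already-quantized model.
import torch
from brevitas.graph.gptq import gptq_mode

@torch.no_grad()
def apply_gptq(model, calib_loader):
    model.eval()
    with gptq_mode(model, use_quant_activations=True) as gptq:
        gptq_model = gptq.model
        # GPTQ processes the network layer by layer over the calibration set.
        for _ in range(gptq.num_layers):
            for images, _ in calib_loader:
                gptq_model(images)
            gptq.update()
```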

What's Changed

Read more

A2Q model release

20 Sep 16:07
acf1f5d
Pre-release

Integer-Quantized Super Resolution Experiments with Brevitas

This release contains scripts demonstrating how to train integer-quantized super resolution models using Brevitas.
Code is also provided to demonstrate accumulator-aware quantization (A2Q) as proposed in our ICCV 2023 paper "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance".

Find the associated docs at https://github.com/Xilinx/brevitas/tree/super_res_r1/src/brevitas_examples/super_resolution.
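
For context, A2Q constrains the weight quantizer so that accumulation into a fixed-width accumulator cannot overflow. A minimal sketch of attaching an accumulator-aware weight quantizer to a layer in core Brevitas follows; the quantizer name and the accumulator_bit_width override are assumptions based on the A2Q support described here:

```python
# Minimal sketch: accumulator-aware quantization (A2Q) on a conv layer.
# Int8AccumulatorAwareWeightQuant and the override name are assumptions.
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat
from brevitas.quant.scaled_int import Int8AccumulatorAwareWeightQuant

conv = qnn.QuantConv2d(
    3, 16, kernel_size=3,
    input_quant=Int8ActPerTensorFloat,  # A2Q needs a known input bit-width
    weight_quant=Int8AccumulatorAwareWeightQuant,
    weight_accumulator_bit_width=16)    # target accumulator width (assumed kwarg)
```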

Release v0.9.1

28 Apr 16:57

What's Changed

Full Changelog: v0.9.0...v0.9.1

Release v0.9.0

21 Apr 17:50

Highlights

Overview of changes

Quantized layers

  • Initial support for QuantMultiheadAttention #568
  • Breaking change: rename Quant(Adaptive)AvgPool to Trunc(Adaptive)AvgPool by @volcacius in #562
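
A minimal sketch of the new QuantMultiheadAttention, assuming its constructor mirrors torch.nn.MultiheadAttention:

```python
# Minimal sketch: drop-in quantized multi-head attention (constructor assumed
# to mirror torch.nn.MultiheadAttention).
import torch
import brevitas.nn as qnn

mha = qnn.QuantMultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.randn(2, 10, 64)  # (batch, sequence, embedding)
out, attn_weights = mha(x, x, x)  # self-attention over x
```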

Further changes landed across graph quantization, quantizers, QuantTensor, PTQ, export, CI/linting, FX, and the examples; see the full changelog for details.

Full Changelog: v0.8.0...v0.9.0