Quantizing a model with dynamic quantization produces subgraphs like the following for quantized matmuls:
Y, Y_scale, Y_zero = DynamicQuantizeLinear(X)
Y = MatMulInteger(Y, W_quant, Y_zero, W_zero)
Y = Cast(Y, to: f32)
Z = Mul(Y_scale, W_scale)
Y = Mul(Y, Z)
Where the W_* values are constants.
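For reference, here is a minimal NumPy sketch of what DynamicQuantizeLinear computes for uint8 output, following the formula in the ONNX operator spec as I understand it (the spec lists the outputs in the order y, y_scale, y_zero_point). The function name `dynamic_quantize_linear` is hypothetical, and the guard for an all-zero input is my own assumption rather than something taken from the spec.

```python
import numpy as np

def dynamic_quantize_linear(x):
    """Sketch of ONNX DynamicQuantizeLinear (uint8): returns (y, y_scale, y_zero_point)."""
    x = np.asarray(x, dtype=np.float32)
    qmin, qmax = 0, 255
    # The range is widened to include 0 so that 0.0 is exactly representable.
    x_min = min(0.0, float(x.min()))
    x_max = max(0.0, float(x.max()))
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        # Assumption: avoid dividing by zero when the input is all zeros.
        scale = 1.0
    # np.round rounds half to even.
    zero_point = np.clip(np.round(qmin - x_min / scale), qmin, qmax).astype(np.uint8)
    # Quantize: divide by the scale, round, shift by the zero point, saturate to uint8.
    y = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return y, np.float32(scale), zero_point

# Example: quantize a small activation tensor.
x_demo = np.array([-1.0, 0.0, 2.0, 0.5], dtype=np.float32)
y, y_scale, y_zero = dynamic_quantize_linear(x_demo)
```

Dequantizing with `(y - y_zero) * y_scale` recovers each input to within about half a quantization step.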
The Cast in this subgraph could be fused into the MatMulInteger op, as could the final Mul that scales the float result by the scale vector Z.
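To illustrate that this fusion only changes how the work is scheduled, not the arithmetic, here is a NumPy sketch comparing the op-by-op subgraph with a hypothetical fused kernel that applies the Cast and the scale multiply to each int32 accumulator as it is produced. Assumptions: Y_scale and Y_zero are per-tensor scalars, W_scale and W_zero are per-column vectors of length N, and `matmul_integer`, `unfused` and `fused` are illustrative names, not ONNX Runtime APIs. NumPy cannot show the memory savings, only that both paths give identical float32 output.

```python
import numpy as np

def matmul_integer(a, b, a_zero, b_zero):
    """MatMulInteger semantics: subtract zero points, multiply with int32 accumulation."""
    a32 = a.astype(np.int32) - np.int32(a_zero)
    b32 = b.astype(np.int32) - np.asarray(b_zero, dtype=np.int32)  # per-column (N,)
    return a32 @ b32

def unfused(a, a_scale, a_zero, w, w_scale, w_zero):
    """The subgraph from the issue, one op at a time."""
    y = matmul_integer(a, w, a_zero, w_zero)  # MatMulInteger -> int32 tensor
    y = y.astype(np.float32)                   # Cast(to: f32) -> another full-size tensor
    z = np.float32(a_scale) * np.asarray(w_scale, dtype=np.float32)  # Mul(Y_scale, W_scale)
    return y * z                               # Mul(Y, Z) -> final float32 tensor

def fused(a, a_scale, a_zero, w, w_scale, w_zero):
    """Hypothetical fused kernel: Cast and Mul applied as an epilogue per accumulator."""
    m, n = a.shape[0], w.shape[1]
    z = np.float32(a_scale) * np.broadcast_to(np.asarray(w_scale, dtype=np.float32), (n,))
    a32 = a.astype(np.int32) - np.int32(a_zero)
    b32 = w.astype(np.int32) - np.asarray(w_zero, dtype=np.int32)
    out = np.empty((m, n), dtype=np.float32)  # the only output buffer
    for i in range(m):
        for j in range(n):
            acc = np.dot(a32[i], b32[:, j])     # int32 accumulator for one output element
            out[i, j] = np.float32(acc) * z[j]  # Cast + Mul folded into the matmul
    return out

# Example inputs standing in for the tensors in the subgraph.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(4, 8)).astype(np.uint8)     # Y from DynamicQuantizeLinear
w = rng.integers(0, 256, size=(8, 3)).astype(np.uint8)     # W_quant (constant)
a_scale, a_zero = np.float32(0.02), np.uint8(128)          # Y_scale, Y_zero
w_scale = np.array([0.01, 0.005, 0.02], dtype=np.float32)  # W_scale (constant)
w_zero = np.array([128, 120, 130], dtype=np.uint8)         # W_zero (constant)

out_unfused = unfused(a, a_scale, a_zero, w, w_scale, w_zero)
out_fused = fused(a, a_scale, a_zero, w, w_scale, w_zero)
```

In a real kernel the gain comes from never materializing the int32 result and the unscaled float32 copy, each of which is as large as the output in the unfused graph.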