Fix typo in the GroupNorm description (onnx#6358)
### Description

- Fix typo in the description of the GroupNorm op regarding the shape of
scale and bias.
- As discussed with @gramalingam and @AlexandreEichenberger, it's better
to mark the older opset 18 version as deprecated since it contained a
mistake.
- Also add a missing type cast in the numpy reference implementation of
QuantizeLinear for FP4.

Signed-off-by: Yuan Yao <[email protected]>
yuanyao-nv authored Sep 11, 2024
1 parent 001550a commit 444e894
Showing 6 changed files with 9 additions and 42 deletions.
40 changes: 3 additions & 37 deletions docs/Changelog.md
@@ -21477,7 +21477,7 @@ This version of the operator has been available since version 18 of the default
<dd>Constrain input and output types to all numeric tensor types.</dd>
</dl>

-### <a name="GroupNormalization-18"></a>**GroupNormalization-18**</a>
+### <a name="GroupNormalization-18"></a>**GroupNormalization-18** (deprecated)</a>

A GroupNormalization function. Carries out group normalization as described in
the paper https://arxiv.org/abs/1803.08494
@@ -21497,41 +21497,7 @@ This version of the operator has been available since version 18 of the default

#### Version

-This version of the operator has been available since version 18 of the default ONNX operator set.
-
-#### Attributes
-
-<dl>
-<dt><tt>epsilon</tt> : float (default is 1e-05)</dt>
-<dd>The epsilon value to use to avoid division by zero.</dd>
-<dt><tt>num_groups</tt> : int (required)</dt>
-<dd>The number of groups of channels. It should be a divisor of the number of channels `C`.</dd>
-</dl>
-
-#### Inputs
-
-<dl>
-<dt><tt>X</tt> (differentiable) : T</dt>
-<dd>Input data tensor. Dimensions for image cases are `(N x C x H x W)`, where `N` is the batch size, `C` is the number of channels, and `H` and `W` are the height and width of the data. Statistics are computed for every group of channels over `C`, `H`, and `W`. For non-image cases, the dimensions are in the form of `(N x C x D1 x D2 ... Dn)`.</dd>
-<dt><tt>scale</tt> (differentiable) : T</dt>
-<dd>Scale tensor of shape `(num_groups)`.</dd>
-<dt><tt>bias</tt> (differentiable) : T</dt>
-<dd>Bias tensor of shape `(num_groups)`.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>Y</tt> (differentiable) : T</dt>
-<dd>The output tensor of the same shape as `X`.</dd>
-</dl>
-
-#### Type Constraints
-
-<dl>
-<dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)</dt>
-<dd>Constrain input and output types to float tensors.</dd>
-</dl>
+This version of the operator has been deprecated since version 18 of the default ONNX operator set.

### <a name="LpPool-18"></a>**LpPool-18**</a>

@@ -24864,7 +24830,7 @@ This version of the operator has been available since version 21 of the default
y = scale * (x - mean) / sqrt(variance + epsilon) + bias,
```
where the mean and variance are computed per instance per group of channels, and
-`scale` and `bias` should be specified for each group of channels. The number of
+`scale` and `bias` should be specified for each channel. The number of
groups `num_groups` should be divisible by the number of channels so that there are
an equal number of channels per group.
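The formula reads naturally as a few lines of numpy. This is a minimal sketch for intuition only, not the ONNX reference implementation; the function name and the epsilon default are assumptions:

```python
import numpy as np

def group_norm(x, scale, bias, num_groups, epsilon=1e-5):
    # x: (N, C, ...); scale and bias have shape (C,) -- one value per channel,
    # which is the shape this commit's typo fix documents.
    n, c = x.shape[0], x.shape[1]
    assert c % num_groups == 0, "C must be divisible by num_groups"
    g = x.reshape(n, num_groups, -1)              # pool each group's channels
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)
    y = ((g - mean) / np.sqrt(var + epsilon)).reshape(x.shape)
    affine = (1, c) + (1,) * (x.ndim - 2)         # broadcast over spatial dims
    return y * scale.reshape(affine) + bias.reshape(affine)
```

With `scale = ones(C)` and `bias = zeros(C)`, each group of the output has approximately zero mean and unit variance.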

2 changes: 1 addition & 1 deletion docs/Operators.md
@@ -11736,7 +11736,7 @@ expect(
y = scale * (x - mean) / sqrt(variance + epsilon) + bias,
```
where the mean and variance are computed per instance per group of channels, and
-`scale` and `bias` should be specified for each group of channels. The number of
+`scale` and `bias` should be specified for each channel. The number of
groups `num_groups` should be divisible by the number of channels so that there are
an equal number of channels per group.

2 changes: 1 addition & 1 deletion onnx/defs/nn/defs.cc
@@ -2699,7 +2699,7 @@ This operator transforms input according to
y = scale * (x - mean) / sqrt(variance + epsilon) + bias,
```
where the mean and variance are computed per instance per group of channels, and
-`scale` and `bias` should be specified for each group of channels. The number of
+`scale` and `bias` should be specified for each channel. The number of
groups `num_groups` should be divisible by the number of channels so that there are
an equal number of channels per group.

1 change: 1 addition & 0 deletions onnx/defs/nn/old.cc
@@ -4020,6 +4020,7 @@ ONNX_OPERATOR_SET_SCHEMA(
GroupNormalization,
18,
OpSchema()
+        .Deprecate()
.SetDoc(GroupNormalization_ver18_doc)
.Attr("epsilon", "The epsilon value to use to avoid division by zero.", AttributeProto::FLOAT, 1e-5f)
.Attr(
2 changes: 1 addition & 1 deletion onnx/reference/ops/op_quantize_linear.py
@@ -209,7 +209,7 @@ def _run(  # noqa: PLR0911
if tensor_type == TensorProto.FLOAT4E2M1:
x += zero_point
f4 = subbyte.float32_to_float4e2m1_unpacked(x)
-            return (f4,)  # type: ignore[attr-defined]
+            return (f4.astype(float4e2m1),)  # type: ignore[attr-defined]

raise ValueError(
f"Unexpected type: output_dtype={tensor_type} is not a supported quantized type."
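The fix above casts the unpacked values to the `float4e2m1` custom dtype before returning. As a standalone illustration of what quantizing to FP4 E2M1 means, here is a hedged numpy sketch (not the onnx reference implementation; `quantize_fp4e2m1` is a hypothetical helper) that rounds each value to the nearest representable magnitude:

```python
import numpy as np

# The non-negative values representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
_FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def quantize_fp4e2m1(x):
    """Round each float32 value to the nearest FP4 E2M1-representable value."""
    x = np.asarray(x, dtype=np.float32)
    sign = np.where(x < 0, -1.0, 1.0).astype(np.float32)
    mag = np.clip(np.abs(x), 0.0, 6.0)            # saturate at the FP4 maximum
    idx = np.abs(mag[..., None] - _FP4_E2M1).argmin(axis=-1)
    return sign * _FP4_E2M1[idx]

q = quantize_fp4e2m1([0.3, 1.2, -2.4, 10.0])
# 0.3 rounds to 0.5, 1.2 to 1.0, -2.4 to -2.0, and 10.0 saturates to 6.0
```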
4 changes: 2 additions & 2 deletions onnx/test/version_converter/automatic_upgrade_test.py
@@ -1718,8 +1718,8 @@ def test_BitwiseXor(self) -> None:
def test_GroupNormalization(self) -> None:
self._test_op_upgrade(
"GroupNormalization",
-            18,
-            [[3, 4, 2, 2], [1], [1]],
+            21,
+            [[3, 4, 2, 2], [4], [4]],
[[3, 4, 2, 2]],
attrs={"epsilon": 1e-5, "num_groups": 2},
)
