Question about discrete sampling in RT-DETRv2 #515

Open
SebastianJanampa opened this issue Dec 14, 2024 · 5 comments

@SebastianJanampa

Hello,
I love your work. I have a question regarding discrete sampling.

You state in the paper:

we propose an optional discrete_sample operator to replace the grid_sample, thus removing the deployment constraints of RT-DETR. Specifically, we perform a rounding operation on the predicted sampling offsets, omitting the time-consuming bilinear interpolation.
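
For concreteness, the rounding operation described there might look like this minimal sketch (my own illustration, not the repo's actual code; I'm assuming normalized (x, y) sampling locations in [0, 1], and the function name and shapes are made up):

    import torch

    def discrete_sample(value: torch.Tensor, sampling_locations: torch.Tensor) -> torch.Tensor:
        # value: (N, C, H, W) feature map
        # sampling_locations: (N, P, 2) normalized (x, y) coordinates in [0, 1]
        N, C, H, W = value.shape
        # Scale to pixel coordinates and round -- this replaces bilinear interpolation.
        scale = torch.tensor([W, H], dtype=value.dtype, device=value.device)
        coord = (sampling_locations * scale + 0.5).to(torch.int64)
        x = coord[..., 0].clamp(0, W - 1)  # clamp along the width axis
        y = coord[..., 1].clamp(0, H - 1)  # clamp along the height axis
        batch = torch.arange(N, device=value.device).unsqueeze(1)  # (N, 1), broadcasts over P
        return value[batch, :, y, x]  # (N, P, C): one gathered value per point

The gather on the last line is plain integer indexing, which lowers to basic ops at export time.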

Could you please explain how much faster the model is when you use discrete sampling?

Also, what is the difference between your proposed discrete_sample operator and torch.grid_sample with mode='nearest'?
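
For reference, the built-in nearest-mode call would look roughly like this (same assumed shapes as the sketch above):

    import torch.nn.functional as F

    # grid_sample expects coordinates in [-1, 1] and a (N, H_out, W_out, 2) grid.
    grid = sampling_locations * 2.0 - 1.0          # map [0, 1] -> [-1, 1]
    out = F.grid_sample(value, grid.unsqueeze(1),  # grid: (N, 1, P, 2)
                        mode='nearest', align_corners=False)
    # out: (N, C, 1, P) -- numerically close to the rounded gather above, but it
    # still goes through the single grid_sample operator at export time.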

@SebastianJanampa (Author)

Also, why is there a deployment constraint when torch.grid_sample is used?

@lyuwenyu (Owner) commented Dec 15, 2024

When you deploy with TensorRT, grid_sample is only supported in versions 8.5 and above, so the discrete operator is necessary for older devices and for scenarios where the software cannot be upgraded.

Other proprietary inference engines for some NPUs may not support grid_sample either.

So RT-DETRv2 is necessary for the community.

As for the specific speed, it depends on the specific device and software. We only give the theoretical difference in computation.
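
(As a rough back-of-envelope comparison, not a number from the paper: bilinear sampling reads the four neighboring values and blends them with several multiply-adds per sampling point, while the rounded lookup reads a single value with no arithmetic, so the sampling step itself does roughly a quarter of the memory reads.)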

@SebastianJanampa (Author)

Thank you so much for your well-explained answer.

If you use discrete sampling for deployment, why is the model not trained using torch.grid_sample(..., mode='nearest')?

@nicklasb commented Jan 5, 2025

@lyuwenyu:
I have exactly this issue with grid_sampler not being supported (by ESP-DL).
However, when I try cross_attn_method: discrete, I get this error:

  File "C:\Users\Nickl\Projects\RT-DETR\rtdetrv2_pytorch\tools\..\src\zoo\rtdetr\rtdetrv2_decoder.py", line 156, in forward
    output = self.ms_deformable_attn_core(value, value_spatial_shapes, sampling_locations, attention_weights, self.num_points_list)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Nickl\Projects\RT-DETR\rtdetrv2_pytorch\tools\..\src\zoo\rtdetr\utils.py", line 124, in deformable_attention_core_func_v2
    sampling_value_l: torch.Tensor = value_l[s_idx, :, sampling_coord[..., 1], sampling_coord[..., 0]] # n l c
                                     ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index 68 is out of bounds for dimension 2 with size 60

Note that I had no issues with cross_attn_method: default.

Note also that I have modified the library for 1-channel grayscale images, but I don't think that change could have broken things in such a specific way, this late in the pipeline (in the transformer).

@nicklasb commented Jan 5, 2025

Adding separate clamping for width and height made it work for me (it didn't seem to affect performance, though I only ran one tuning epoch to check):

            sampling_coord[..., 0] = sampling_coord[..., 0].clamp(0, w - 1)  # Clamp width
            sampling_coord[..., 1] = sampling_coord[..., 1].clamp(0, h - 1)  # Clamp height
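
(This is consistent with the traceback above: on a non-square feature map the rounded sampling coordinates can land outside one axis, here an index of 68 on an axis of size 60, so each axis needs to be clamped to its own bound before indexing.)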
