Question about discrete sampling in RT-DETRv2 #515

Open
SebastianJanampa opened this issue Dec 14, 2024 · 5 comments

@SebastianJanampa

Hello,
I love your work. I have a question regarding discrete sampling.

You state in the paper:

we propose an optional discrete_sample operator to replace the grid_sample, thus removing the deployment constraints of RT-DETR. Specifically, we perform a rounding operation on the predicted sampling offsets, omitting the time-consuming bilinear interpolation.
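
For concreteness, the rounding operation described there might look like this minimal sketch (my own illustration, not the repo's actual code; I'm assuming normalized (x, y) sampling locations in [0, 1], and the function name and shapes are made up):

    import torch

    def discrete_sample(value: torch.Tensor, sampling_locations: torch.Tensor) -> torch.Tensor:
        # value: (N, C, H, W) feature map
        # sampling_locations: (N, P, 2) normalized (x, y) coordinates in [0, 1]
        N, C, H, W = value.shape
        # Scale to pixel coordinates and round -- this replaces bilinear interpolation.
        scale = torch.tensor([W, H], dtype=value.dtype, device=value.device)
        coord = (sampling_locations * scale + 0.5).to(torch.int64)
        x = coord[..., 0].clamp(0, W - 1)  # clamp along the width axis
        y = coord[..., 1].clamp(0, H - 1)  # clamp along the height axis
        batch = torch.arange(N, device=value.device).unsqueeze(1)  # (N, 1), broadcasts over P
        return value[batch, :, y, x]  # (N, P, C): one gathered value per point

The gather on the last line is plain integer indexing, which lowers to basic ops at export time.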

Could you please explain how much faster the model is when you use discrete sampling?

Also, what is the difference between your proposed discrete_sample operator and torch.grid_sample with mode='nearest'?
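
For reference, the built-in nearest-mode call would look roughly like this (same assumed shapes as the sketch above):

    import torch.nn.functional as F

    # grid_sample expects coordinates in [-1, 1] and a (N, H_out, W_out, 2) grid.
    grid = sampling_locations * 2.0 - 1.0          # map [0, 1] -> [-1, 1]
    out = F.grid_sample(value, grid.unsqueeze(1),  # grid: (N, 1, P, 2)
                        mode='nearest', align_corners=False)
    # out: (N, C, 1, P) -- numerically close to the rounded gather above, but it
    # still goes through the single grid_sample operator at export time.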

@SebastianJanampa (Author)

Also, why is there a deployment constraint when torch.grid_sample is used?

@lyuwenyu (Owner) commented Dec 15, 2024

When you deploy with TensorRT, grid_sample is only supported in versions 8.5 and above, so the discrete operator is necessary for older devices and for scenarios where the software cannot be upgraded.

Other proprietary inference engines for some NPUs may not support grid_sample either.

So RT-DETRv2 is necessary for the community.

As for the specific speed, it depends on the specific device and software. We only give the theoretical difference in computation.
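
(As a rough back-of-envelope comparison, not a number from the paper: bilinear sampling reads the four neighboring values and blends them with several multiply-adds per sampling point, while the rounded lookup reads a single value with no arithmetic, so the sampling step itself does roughly a quarter of the memory reads.)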

@SebastianJanampa (Author)

Thank you so much for your well-explained answer.

If you use discrete sampling for deployment, why is the model not trained using torch.grid_sample(..., mode='nearest')?

@nicklasb commented Jan 5, 2025

@lyuwenyu:
I have exactly this issue with grid_sampler not being supported (by ESP-DL).
However, when I try cross_attn_method: discrete, I get this error:

  File "C:\Users\Nickl\Projects\RT-DETR\rtdetrv2_pytorch\tools\..\src\zoo\rtdetr\rtdetrv2_decoder.py", line 156, in forward
    output = self.ms_deformable_attn_core(value, value_spatial_shapes, sampling_locations, attention_weights, self.num_points_list)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Nickl\Projects\RT-DETR\rtdetrv2_pytorch\tools\..\src\zoo\rtdetr\utils.py", line 124, in deformable_attention_core_func_v2
    sampling_value_l: torch.Tensor = value_l[s_idx, :, sampling_coord[..., 1], sampling_coord[..., 0]] # n l c
                                     ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index 68 is out of bounds for dimension 2 with size 60

Note that I had no issues with cross_attn_method: default.

Note also that I have modified the library for 1-channel grayscale images, but I don't think that change could have broken things in such a specific way, this late in the pipeline (in the transformer).

@nicklasb commented Jan 5, 2025

Adding separate clamping for width and height made it work for me (it didn't seem to affect performance, though I only ran one tuning epoch to check):

            sampling_coord[..., 0] = sampling_coord[..., 0].clamp(0, w - 1)  # Clamp width
            sampling_coord[..., 1] = sampling_coord[..., 1].clamp(0, h - 1)  # Clamp height
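
(This is consistent with the traceback above: on a non-square feature map the rounded sampling coordinates can land outside one axis, here an index of 68 on an axis of size 60, so each axis needs to be clamped to its own bound before indexing.)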
