Improve Bounding boxes classes performances #525
Thanks for your reply. I'm thinking of using DINOv2 as the backbone. Will it be an easy task to do? Thanks again.
Yes, you just need to register the new backbone; for details, see https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetrv2_pytorch/src/nn/backbone/hgnetv2.py#L272. Then replace the old one with your registered module name in the config.
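For context, RT-DETR makes backbones discoverable by registering each class under its name, so a YAML config can refer to it as a string (the linked `hgnetv2.py` follows this pattern). A minimal sketch of how such a registry works; `REGISTRY`, `register`, and `MyDINOv2Backbone` are illustrative names, not the repo's exact API:

```python
# Sketch of a module-registry pattern like the one RT-DETR uses to
# look up backbones by name from a YAML config. Names are illustrative.
REGISTRY = {}

def register(cls):
    """Class decorator: store the class under its own name."""
    REGISTRY[cls.__name__] = cls
    return cls

@register
class MyDINOv2Backbone:
    def __init__(self, name="dinov2_vits14"):
        # A real backbone would load weights here, e.g. via torch.hub.
        self.name = name

# The config loader can then resolve `backbone: MyDINOv2Backbone`:
backbone_cls = REGISTRY["MyDINOv2Backbone"]
model = backbone_cls()
print(model.name)  # dinov2_vits14
```

After registering, pointing `backbone:` at the registered name in the config is all that should be needed, as the maintainer describes.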
Great, thanks, I will try to do that.

**Implementation Issues with TimmModel and HGNetv2 Backbones**

**HGNetv2 Implementation**

Configuration:

```yaml
# rtdetrv2_r50vd.yml
RTDETR:
  backbone: HGNetv2
  encoder: HybridEncoder
  decoder: RTDETRTransformerv2
```

```yaml
# rtdetrv2_r18vd_120e_coco.yml
HGNetv2:
  name: L
```

**Error**

Error location:

**TimmModel Implementation**

Configuration:

```yaml
# rtdetrv2_r50vd.yml
RTDETR:
  backbone: TimmModel
  encoder: HybridEncoder
  decoder: RTDETRTransformerv2
```

```yaml
# rtdetrv2_r18vd_120e_coco.yml
TimmModel:
  name: resnet34
  return_layers: ['layer2', 'layer4']
```

**Error**

Error location:

The assertion error suggests a mismatch between the number of feature maps being returned and the expected number of input channels in the encoder. Would you like help resolving these issues, particularly with the TimmModel implementation?
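That reading of the assertion is plausible. A hedged sketch of the mismatch, assuming the encoder was left at a ResNet-50-style `in_channels: [512, 1024, 2048]` (as in the quoted `rtdetrv2_r50vd.yml`) while the backbone returns only two stages; the resnet34 channel widths below come from the standard torchvision/timm architecture:

```python
# Why the assertion fires: HybridEncoder expects one feature map per
# entry of its `in_channels` list, matching both count and width.
encoder_in_channels = [512, 1024, 2048]      # assumed r50vd-style config

resnet34_channels = {"layer2": 128, "layer3": 256, "layer4": 512}
return_layers = ["layer2", "layer4"]         # the config that failed

backbone_out = [resnet34_channels[l] for l in return_layers]

# Two mismatches: number of maps (2 vs 3) and channel widths.
count_ok = len(backbone_out) == len(encoder_in_channels)
channels_ok = backbone_out == encoder_in_channels
print(backbone_out, count_ok, channels_ok)  # [128, 512] False False
```

The fix is to make the two lists agree: either return three stages (e.g. `['layer2', 'layer3', 'layer4']`) and set the encoder's `in_channels` to `[128, 256, 512]`, or project the backbone outputs to the expected widths.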
And this line should adapt to the specific backbone.
**ViT and HybridEncoder Compatibility Analysis**

Thanks, it finally worked. I tried to use a Vision Transformer (ViT) architecture as the backbone with TimmModel, but it seems its output is not compatible with what HybridEncoder expects. Here is a summary of what I understood:

**HybridEncoder Expectations**

**ViT Last 3 Layers Output**

**Mismatch Issues**

1. Dimensional Structure
2. Channel Progression
3. Spatial Resolution
I'm trying to adapt ViT outputs, but I think the adaptation might not be optimal because:

1. ViT's strength lies in global attention
2. Forcing spatial structure might lose the global relationship information
3. The original feature hierarchy of ResNet is fundamentally different from ViT's feature representation

Can you please confirm that? And is there a way to make them compatible? Thanks a lot!
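To make the dimensional mismatch concrete: HybridEncoder consumes CNN-style maps of shape `[B, C, H, W]`, while a ViT emits a token sequence `[B, 1 + N, C]` at a single stride with a constant channel width. A hedged NumPy sketch of the usual first adaptation step (drop the CLS token, fold the patch tokens back onto the grid), using illustrative ViT-S-like dimensions:

```python
import numpy as np

B, C = 2, 384                  # batch size, embed dim (ViT-S-like)
H = W = 224 // 16              # 14x14 patch grid for 16-px patches
tokens = np.zeros((B, 1 + H * W, C))   # [B, 1 + N, C]; leading CLS token

patch_tokens = tokens[:, 1:, :]          # drop CLS -> [B, N, C]
fmap = patch_tokens.reshape(B, H, W, C)  # fold tokens back onto the grid
fmap = fmap.transpose(0, 3, 1, 2)        # -> [B, C, H, W]
print(fmap.shape)  # (2, 384, 14, 14)
```

Even after this reshape, the output is still single-stride with constant channels, unlike the multi-scale, channel-doubling {C3, C4, C5} pyramid a ResNet provides, which is why a simple reshape alone is unlikely to be optimal.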
Yes, I think you are right. One possible solution is to add an extra adaptation module. You can refer to this paper.
OK, thanks very much. I will check it.
RT-DETR v2 Training Issues with Custom Dataset
I'm currently training RT-DETR v2 (PyTorch implementation) on a custom dataset. While the model performs well at detecting bounding boxes and their coordinates, it's showing suboptimal performance in class identification.
Questions
1. Class Performance Emphasis
Is there a way to adjust the training process to put more emphasis on classification performance?
2. Separate Classification Model
I noticed there's a dedicated classification task in the codebase:
Would training a separate classification model improve the overall performance?
3. Performance Improvement
What are some recommended approaches to improve the model's class identification accuracy?
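On question 1, one common lever is the loss weighting in the criterion config: raising the classification term relative to the box-regression terms shifts training emphasis toward class identification. A hedged sketch of what that could look like in the quoted yml files; the exact section and key names (`RTDETRCriterionv2`, `loss_vfl`, etc.) may differ in your version of the repo, so check your local config before editing:

```yaml
# e.g. in the criterion section of rtdetrv2_r50vd.yml (names assumed)
RTDETRCriterionv2:
  weight_dict:
    loss_vfl: 2     # classification (varifocal) loss, raised from a default of 1
    loss_bbox: 5    # L1 box loss
    loss_giou: 2    # GIoU box loss
```

Since the boxes are already good, nudging this ratio (and verifying class balance in the custom dataset's annotations) is usually cheaper to try than training a separate classification model.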