
Why don't you keep image ratio? #139

Open
ArgoHA opened this issue Jan 9, 2025 · 4 comments

Comments

@ArgoHA

ArgoHA commented Jan 9, 2025

Is there a reason you train D-FINE without keeping the image aspect ratio? You just use a plain resize that squashes the image to a square, whereas detectors usually use letterboxing, something like:

    import albumentations as A
    import cv2

    letterbox = A.Compose([
        A.LongestMaxSize(max_size=max(self.target_h, self.target_w)),
        A.PadIfNeeded(
            min_height=self.target_h,
            min_width=self.target_w,
            border_mode=cv2.BORDER_CONSTANT,
            value=(114, 114, 114),
        ),
    ])

Is there a reason why you are not doing that and I should not add it to the training pipeline?
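For reference, the letterboxing described above can be sketched in plain NumPy. This is a hypothetical helper, not code from the D-FINE repo; a real pipeline would use cv2.resize with proper interpolation rather than the dependency-free nearest-neighbour indexing used here:

```python
import numpy as np

def letterbox(img, target_h=640, target_w=640, pad_value=114):
    """Resize so the longest side fits the target, then pad with grey.

    Returns the padded image plus (scale, pad_x, pad_y) so predictions
    can later be mapped back to the original image coordinates.
    """
    h, w = img.shape[:2]
    scale = min(target_h / h, target_w / w)
    new_h, new_w = round(h * scale), round(w * scale)

    # Nearest-neighbour resize via index arrays (keeps the sketch
    # NumPy-only; swap in cv2.resize for real use).
    ys = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    xs = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    resized = img[ys][:, xs]

    # Center the resized image on a grey canvas.
    pad_y = (target_h - new_h) // 2
    pad_x = (target_w - new_w) // 2
    out = np.full((target_h, target_w, img.shape[2]), pad_value, dtype=img.dtype)
    out[pad_y:pad_y + new_h, pad_x:pad_x + new_w] = resized
    return out, scale, pad_x, pad_y
```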

@ArgoHA
Author

ArgoHA commented Jan 10, 2025

I see that in the torch inference code you resize without keeping the ratio (same as during training):

    import torchvision.transforms as T

    transforms = T.Compose([
        T.Resize((640, 640)),
        T.ToTensor(),
    ])

But for ONNX inference you do "Resizes an image while maintaining aspect ratio and pads it". Is there a reason for the mismatch? I would assume you lose accuracy if you train on squeezed images and then keep the ratio during ONNX inference.

Overall, I really like your work and would like to contribute. What do you think about this aspect-ratio issue?

Here is what I would do: implement aspect-ratio preservation as a flag for both training and inference. During inference I would also crop the grey padding so no compute is wasted on the 114-valued pixels (this worked well for me with several YOLO models).
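If the scale and padding offsets from letterboxing are kept around, mapping predictions back to original-image coordinates is just a couple of array operations. A minimal sketch (hypothetical helper, not from the D-FINE codebase), assuming boxes in xyxy format:

```python
import numpy as np

def boxes_to_original(boxes, scale, pad_x, pad_y):
    """Map (N, 4) xyxy boxes from letterboxed coords back to the original image."""
    boxes = boxes.astype(float).copy()
    boxes[:, [0, 2]] -= pad_x   # undo horizontal padding
    boxes[:, [1, 3]] -= pad_y   # undo vertical padding
    boxes /= scale              # undo the resize
    return boxes
```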

@ArgoHA
Author

ArgoHA commented Jan 11, 2025

@Peterande
Here are things I would like to work on:

  1. An image-ratio flag for training, and letterbox with cropped padding for inference.
  2. Mosaic augmentation during training (I see the code for it exists, but it is not being used).
  3. Unified configs (they are too spread out; changing the image size should live in one place).
  4. More metrics (precision, recall, F1, TPs, FPs, FNs).
  5. wandb integration.
  6. Unified inference classes for Torch / TensorRT / OpenVINO (preprocessing, inference, postprocessing: take a raw image, return processed results).
  7. Ability to pick which metric the best model is saved on.
  8. A debug flag that saves a sample of 1) preprocessed images with annotations (to see exactly what the model is fed) and 2) images with model predictions during validation (to see what the model predicts while training).
  9. Maybe cleaner training output, e.g. tqdm with an ETA for the whole run.
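For item 4, once TP/FP/FN counts are matched per image, the summary metrics are a few lines. A minimal sketch (hypothetical helper name, with zero-division guards):

```python
def detection_prf(tp, fp, fn):
    """Precision, recall, and F1 from detection TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```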

Let me know if you guys are interested in any of these or have other ideas for contribution.

@HebeiFast
Collaborator

We are excited about these ideas! They all seem super valuable and will take our project to the next level. We're looking forward to your contributions with great anticipation.

@lz1004

lz1004 commented Jan 13, 2025

@ArgoHA I also noticed the aspect ratio issue. Let me know if I can help.
