
Why don't you keep image ratio? #139

Open
ArgoHA opened this issue Jan 9, 2025 · 4 comments

Comments

@ArgoHA

ArgoHA commented Jan 9, 2025

Is there a reason you train D-FINE without keeping the image aspect ratio? You just use a plain resize that squashes the image to a square, whereas detectors usually use letterboxing, something like:

    import albumentations as A
    import cv2

    letterbox = A.Compose([
        A.LongestMaxSize(max_size=max(self.target_h, self.target_w)),
        A.PadIfNeeded(
            min_height=self.target_h,
            min_width=self.target_w,
            border_mode=cv2.BORDER_CONSTANT,
            value=(114, 114, 114),
        ),
    ])

Is there a reason why you are not doing that and I should not add it to the training pipeline?
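For reference, the letterboxing described above can be sketched in plain NumPy. This is a hypothetical helper, not code from the D-FINE repo; a real pipeline would use cv2.resize with proper interpolation rather than the dependency-free nearest-neighbour indexing used here:

```python
import numpy as np

def letterbox(img, target_h=640, target_w=640, pad_value=114):
    """Resize so the longest side fits the target, then pad with grey.

    Returns the padded image plus (scale, pad_x, pad_y) so predictions
    can later be mapped back to the original image coordinates.
    """
    h, w = img.shape[:2]
    scale = min(target_h / h, target_w / w)
    new_h, new_w = round(h * scale), round(w * scale)

    # Nearest-neighbour resize via index arrays (keeps the sketch
    # NumPy-only; swap in cv2.resize for real use).
    ys = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    xs = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    resized = img[ys][:, xs]

    # Center the resized image on a grey canvas.
    pad_y = (target_h - new_h) // 2
    pad_x = (target_w - new_w) // 2
    out = np.full((target_h, target_w, img.shape[2]), pad_value, dtype=img.dtype)
    out[pad_y:pad_y + new_h, pad_x:pad_x + new_w] = resized
    return out, scale, pad_x, pad_y
```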

@ArgoHA
Author

ArgoHA commented Jan 10, 2025

I see that in the torch inference code you resize without keeping the ratio (same as during training):

    import torchvision.transforms as T

    transforms = T.Compose([
        T.Resize((640, 640)),
        T.ToTensor(),
    ])

But for ONNX inference you do "Resizes an image while maintaining aspect ratio and pads it". Is there a reason for the mismatch? I would assume you lose accuracy if you train on squeezed images and then keep the ratio during ONNX inference.

Overall, I really like your work and would like to contribute. What do you think about this aspect-ratio issue?

Here is what I would do: implement aspect-ratio preservation as a flag for both training and inference. During inference I would also crop the grey padding so no compute is wasted on the 114-valued pixels (this worked well for me with several YOLO models).
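If the scale and padding offsets from letterboxing are kept around, mapping predictions back to original-image coordinates is just a couple of array operations. A minimal sketch (hypothetical helper, not from the D-FINE codebase), assuming boxes in xyxy format:

```python
import numpy as np

def boxes_to_original(boxes, scale, pad_x, pad_y):
    """Map (N, 4) xyxy boxes from letterboxed coords back to the original image."""
    boxes = boxes.astype(float).copy()
    boxes[:, [0, 2]] -= pad_x   # undo horizontal padding
    boxes[:, [1, 3]] -= pad_y   # undo vertical padding
    boxes /= scale              # undo the resize
    return boxes
```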

@ArgoHA
Author

ArgoHA commented Jan 11, 2025

@Peterande
Here are things I would like to work on:

  1. An image-ratio flag for training, and letterbox with cropped padding for inference.
  2. Mosaic augmentation during training (I see the code for it exists, but it is not being used).
  3. Unified configs (they are too spread out; changing the image size should live in one place).
  4. More metrics (precision, recall, F1, TPs, FPs, FNs).
  5. wandb integration.
  6. Unified inference classes for Torch / TensorRT / OpenVINO (preprocessing, inference, postprocessing: take a raw image, return processed results).
  7. Ability to pick which metric the best model is saved on.
  8. A debug flag that saves a sample of 1) preprocessed images with annotations (to see exactly what the model is fed) and 2) images with model predictions during validation (to see what the model predicts while training).
  9. Maybe cleaner training output, e.g. tqdm with an ETA for the whole run.
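For item 4, once TP/FP/FN counts are matched per image, the summary metrics are a few lines. A minimal sketch (hypothetical helper name, with zero-division guards):

```python
def detection_prf(tp, fp, fn):
    """Precision, recall, and F1 from detection TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```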

Let me know if you guys are interested in any of these or have other ideas for contribution.

@HebeiFast
Collaborator

We are excited about these ideas! They all seem super valuable and will take our project to the next level. We're looking forward to your contributions with great anticipation.

@lz1004

lz1004 commented Jan 13, 2025

@ArgoHA I also noticed the aspect ratio issue. Let me know if I can help.
