Getting all NAN masks when training on custom data #92

Open
sankarip opened this issue Nov 27, 2024 · 1 comment

@sankarip

I'm fine-tuning the Objects365 checkpoint (dfine_s_obj365.pth) on my custom data using this command:

torchrun --master_port=7777 --nproc_per_node=1 train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_s_obj2custom.yml --use-amp --seed=0 -t dfine_s_obj365.pth

Training fails with this assertion error:

assert (boxes1[:, 2:] >= boxes1[:, :2]).all()

The cause is that the masks are all NaN. Any ideas on how to fix this? I've tried lowering the learning rate, but it had no effect.
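Why NaN trips this particular assertion: any comparison involving NaN evaluates to False, so a single NaN coordinate makes .all() fail even when no box is actually inverted. The pure-Python sketch below mirrors the check on (x1, y1, x2, y2) boxes; boxes_are_valid and find_non_finite are hypothetical helper names for illustration, not functions from this repository.

```python
import math


def boxes_are_valid(boxes):
    """Mirror of the check (boxes[:, 2:] >= boxes[:, :2]).all() for
    (x1, y1, x2, y2) boxes, written with plain lists for illustration."""
    return all(x2 >= x1 and y2 >= y1 for x1, y1, x2, y2 in boxes)


def find_non_finite(boxes):
    """Return the indices of boxes that contain a NaN or inf coordinate."""
    return [i for i, box in enumerate(boxes)
            if not all(math.isfinite(v) for v in box)]


good = [[0.1, 0.1, 0.5, 0.6]]
bad = [[0.1, 0.1, 0.5, 0.6], [math.nan] * 4]

print(boxes_are_valid(good))  # True
print(boxes_are_valid(bad))   # False, because nan >= nan is False
print(find_non_finite(bad))   # [1]
```

Checking for non-finite values before the assertion gives a clearer signal that the problem is numerical (NaN/inf) rather than genuinely malformed boxes.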

@DmitriiSavin

DmitriiSavin commented Nov 27, 2024

Hi!
I ran into a similar issue after a couple of training epochs.
Removing --use-amp from the command fixed it.
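One plausible explanation for why dropping --use-amp helps (an assumption, not confirmed in this thread): with automatic mixed precision, some computations run in float16, whose largest finite value is 65504. A value that overflows becomes inf, and arithmetic such as inf - inf or 0 * inf yields NaN, which can then spread to the predicted boxes. The sketch below uses the half-precision "e" format of Python's struct module to show the float16 limit; fits_in_float16 is a hypothetical helper.

```python
import math
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_float16(x):
    """Return True if x can be stored as a finite float16.

    struct's "e" format packs IEEE 754 half precision. CPython raises
    OverflowError for finite values that are too large, whereas float16
    arithmetic on a GPU would silently produce inf instead.
    """
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


print(fits_in_float16(FLOAT16_MAX))  # True
print(fits_in_float16(1e5))          # False: would overflow to inf in float16

# Once an intermediate becomes inf, ordinary arithmetic turns it into NaN,
# which then propagates to every value computed from it.
inf = float("inf")
print(math.isnan(inf - inf))         # True
print(math.isnan(0.0 * inf))         # True
```

If this is the cause, training in full float32 (omitting --use-amp) avoids the overflow at the cost of speed and memory.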

See also #43.
