
Training log and loss curve #4

Open
Jarvis73 opened this issue Aug 14, 2023 · 3 comments

@Jarvis73

Nice work!
I've been training MaskDiT on my own dataset recently, but I'm not sure whether the loss is decreasing normally, because the generated images are not satisfactory yet. Could you please share the training logs for ImageNet 256x256, or the loss curve?

Thanks very much!

@Jarvis73 changed the title from "Training loss curve" to "Training log and loss curve" on Aug 14, 2023
@devzhk
Contributor

devzhk commented Aug 14, 2023

Hi,

Thanks for your interest in our work. I don't know how much it will help, but here is the training loss curve. How different is your dataset from ImageNet 256x256?
[Attached image: training loss curve (mask training)]

@Jarvis73
Author

Hi, devzhk.

I am training MaskDiT on a self-collected text-to-image dataset (cross-attention is used to inject the text condition). My loss is similar to the curve you provided. However, after training for about 70 epochs, the loss starts to hit NaN values, and they become more and more frequent as training progresses. I am using PyTorch's mixed precision. Have you encountered this situation before?
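
For reference, this is roughly the kind of check I was thinking of adding to my AMP training loop to find the bad steps (just a sketch with placeholder names, not the actual training code):

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def training_step(model, batch, optimizer, step):
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(batch)
    # Log the first non-finite loss and skip the update instead of corrupting the weights.
    if not torch.isfinite(loss):
        print(f"non-finite loss at step {step}: {loss.item()}")
        optimizer.zero_grad(set_to_none=True)
        return
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
```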

@devzhk
Contributor

devzhk commented Aug 18, 2023

Hi,

Maybe try turning off mixed precision for debugging? If disabling mixed precision fixes it, then try disabling AMP for each individual module in turn to see which one is causing the problem.
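
For example, something along these lines (just a rough sketch with placeholder names, not code from this repo) keeps autocast on for the rest of the model but forces one suspect module to run in fp32:

```python
import torch
from torch import nn

# Hypothetical forward pass: AMP stays enabled globally, but one suspect module
# (e.g. the cross-attention block) is forced to run in float32 so you can tell
# whether it is the source of the NaNs.
def forward_with_partial_amp(backbone: nn.Module, suspect_module: nn.Module,
                             x: torch.Tensor) -> torch.Tensor:
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        h = backbone(x)                      # runs in fp16 under autocast
        with torch.autocast(device_type="cuda", enabled=False):
            h = suspect_module(h.float())    # this module runs in full fp32
    return h
```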
