Val loss improvement #1903
base: sd3
kohya-ss commented on Jan 27, 2025
- Switch the network and the optimizer between train and eval states.
- Use stable (fixed) timesteps for validation.
- Use stable (fixed) noise for validation.
- Support block swap.
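Here is a minimal sketch of what "stable timesteps" and "stable noise" can look like. It assumes a DDPM-style schedule of 1000 discrete timesteps and uses a dedicated `torch.Generator` instead of the global RNG. `make_validation_inputs` and its parameters are hypothetical names, not code from this PR.

```python
import torch


def make_validation_inputs(latent_shape, num_train_timesteps=1000, num_points=8, seed=42):
    """Hypothetical helper: build the same validation timesteps and noise on every call."""
    # A dedicated, seeded generator leaves the global training RNG untouched.
    gen = torch.Generator(device="cpu").manual_seed(seed)
    noise = torch.randn(latent_shape, generator=gen)
    # Evenly spaced integer timesteps covering the whole schedule.
    timesteps = torch.linspace(0, num_train_timesteps - 1, num_points).round().long()
    return timesteps, noise
```

If you call it once before training and reuse the result for every validation run, each run sees identical inputs, so changes in validation loss reflect the model rather than sampling noise.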
I love the approach of holding the rng_state aside, setting the validation state from the validation seed, and then restoring the rng_state afterwards. It's much more elegant than tracking the state separately, and it has no overhead.
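The save/reseed/restore pattern described above can be sketched as a context manager. This assumes PyTorch's global RNGs. `validation_rng` is a hypothetical name, and the PR's actual code may differ. `torch.manual_seed` also seeds every CUDA device, so only the restore step needs CUDA-specific handling.

```python
import contextlib

import torch


@contextlib.contextmanager
def validation_rng(seed: int):
    """Hypothetical helper: run a block under a fixed validation seed,
    then restore the training RNG state exactly as it was."""
    # Hold the current training RNG state aside.
    cpu_state = torch.get_rng_state()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None
    try:
        torch.manual_seed(seed)  # seeds the CPU and all CUDA generators
        yield
    finally:
        # Restore, so training continues as if validation never ran.
        torch.set_rng_state(cpu_state)
        if cuda_states is not None:
            torch.cuda.set_rng_state_all(cuda_states)
```

Every `with validation_rng(seed):` block then draws the same random numbers, and the training stream after the block is unaffected.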
I would also add that once this is in place, there won't be any need for a moving average to track the validation loss. Using consistent timesteps and noise makes it almost entirely stable, so displaying the mean validation loss of each validation run should be all that's needed. Since the validation set can change whenever the core dataset changes, I've also found it helpful to track the validation loss relative to the initial loss, which makes progress comparable across different training runs.
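A minimal sketch of that logging scheme. It takes a plain mean per validation run with no moving average, plus that mean relative to the first run. "Relative" is assumed here to mean a ratio (a difference would work too), and `ValidationTracker` is a hypothetical name.

```python
from statistics import fmean
from typing import List, Optional, Sequence, Tuple


class ValidationTracker:
    """Hypothetical helper: records the plain mean of each validation run
    (no moving average) and its ratio to the first run's mean."""

    def __init__(self) -> None:
        self.initial_mean: Optional[float] = None
        self.history: List[Tuple[float, float]] = []

    def record(self, losses: Sequence[float]) -> Tuple[float, float]:
        # With fixed timesteps and noise, the run mean is already stable.
        mean = fmean(losses)
        if self.initial_mean is None:
            self.initial_mean = mean  # the first run becomes the baseline
        # Ratio to the baseline: 1.0 at the start, below 1.0 means improvement.
        relative = mean / self.initial_mean
        self.history.append((mean, relative))
        return mean, relative
```

Logging `mean` and `relative` under separate tags keeps the absolute value available while making runs on different validation sets easier to compare.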
This looks great! What are you using to format the code? I've been formatting manually, but it might be easier to keep the formatting consistent if I use the same tool.
That makes sense. There is currently a problem viewing the logs in TensorBoard, but I would like to at least get the mean validation loss displayed correctly.
For formatting, I use black with the
It seems that the correction for timestep sampling works better (I previously used debiased 1/√SNR weighting, which is similar in intent). I also have some thoughts on the args.
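For reference, debiased estimation weights each sample's loss by 1/√SNR(t). Here is a minimal sketch. It assumes a DDPM-style schedule where SNR(t) = ᾱ_t / (1 − ᾱ_t), and it caps the SNR so the smallest timesteps don't get near-zero weight. `debiased_weights` and `max_snr` are hypothetical names, and a flow-matching model such as SD3 defines SNR differently.

```python
import torch


def debiased_weights(timesteps: torch.Tensor, alphas_cumprod: torch.Tensor, max_snr: float = 1000.0) -> torch.Tensor:
    """Hypothetical helper: per-sample 1/sqrt(SNR) loss weights for a DDPM-style schedule."""
    alpha_bar = alphas_cumprod[timesteps]
    snr = alpha_bar / (1.0 - alpha_bar)
    # Cap very high SNR (small t) so those weights do not collapse toward zero.
    snr = torch.clamp(snr, max=max_snr)
    return 1.0 / torch.sqrt(snr)
```

The weights would multiply the per-sample MSE before averaging, e.g. `(debiased_weights(t, alphas_cumprod) * per_sample_mse).mean()`.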
https://github.com/spacepxl/demystifying-sd-finetuning
This makes some sense. Although it would give a single setting value multiple meanings, it is worth considering.
@gesen2egee You would need a different fit equation for each new model, and it's not really relevant once validation is fully deterministic. I tried applying it to the training loss and it was extremely harmful. You can also visualize the raw training loss by plotting it like so:

[Image not preserved: raw training loss per timestep, colored by training step]

That was done by storing every loss and timestep value and coloring the points by training step. I'm not sure if there's a way to do that natively in TensorBoard/wandb; I did it with matplotlib and just logged it as an image.
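A minimal sketch of that kind of plot. It assumes matplotlib and flat lists of per-sample timesteps, losses, and the training step each value came from. The function names are hypothetical.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend for training machines
import matplotlib.pyplot as plt


def plot_loss_vs_timestep(timesteps, losses, steps):
    """Hypothetical helper: scatter raw loss against timestep, colored by training step."""
    fig, ax = plt.subplots(figsize=(6, 4))
    sc = ax.scatter(timesteps, losses, c=steps, cmap="viridis", s=4, alpha=0.6)
    fig.colorbar(sc, ax=ax, label="training step")
    ax.set_xlabel("timestep")
    ax.set_ylabel("loss")
    return fig


def figure_to_png(fig) -> bytes:
    """Render the figure to PNG bytes and free it, ready to log as an image."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()
```

Alternatively, `torch.utils.tensorboard.SummaryWriter.add_figure` accepts the matplotlib figure directly, which skips the PNG step.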
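In `get_timesteps`, maybe:

```python
import torch

if min_timestep < max_timestep:
    # Sample integer timesteps uniformly from [min_timestep, max_timestep)
    timesteps = torch.randint(min_timestep, max_timestep, (b_size,), device="cpu")
else:
    # Fixed timestep; use an integer dtype to match the randint branch
    timesteps = torch.full((b_size,), min_timestep, dtype=torch.long, device="cpu")
```

I know this isn't finished, but I tried it anyway.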