
Move scale_weight_norms inside sync_gradients #1908

Draft · wants to merge 2 commits into base: sd3
Conversation

@rockerBOO (Contributor) commented Jan 28, 2025

scale_weight_norms was being applied on every step rather than on every gradient step (i.e. each optimizer update when using gradient accumulation), so larger gradient_accumulation_steps values ran the scaling many extra times without any change to the weights in between.

I also moved the other sync_gradients block from outside the accumulation context to inside it and merged the two. Possibly not the correct approach, but it felt appropriate for these to happen inside the accumulation context, though they might have the same end result. This covers sampling images and saving per step; see the sketch below.
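As a minimal runnable sketch of the resulting control flow, using Hugging Face accelerate with a toy model. `apply_scale_weight_norms` here is a hypothetical stand-in for the network's max-norm regularization, not the actual sd-scripts code:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
model, optimizer = accelerator.prepare(model, optimizer)

def apply_scale_weight_norms(model, max_norm):
    # Hypothetical stand-in for the network's max-norm regularization:
    # rescale any weight tensor whose norm exceeds max_norm.
    with torch.no_grad():
        for p in model.parameters():
            norm = p.norm()
            if norm > max_norm:
                p.mul_(max_norm / norm)

global_step = 0
for step in range(16):
    batch = torch.randn(2, 8)
    with accelerator.accumulate(model):
        loss = model(batch).pow(2).mean()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

        # sync_gradients is True only on the micro-step where the optimizer
        # actually applies the accumulated gradients (every 4th step here),
        # so the scaling now runs once per weight update, not every step.
        if accelerator.sync_gradients:
            global_step += 1
            apply_scale_weight_norms(model, max_norm=2.5)
            # The per-step image sampling / checkpoint saving that used to
            # sit in a second sync_gradients check outside the accumulate
            # block would be merged in here as well.
```

With accelerate, optimizer.step() is already a no-op on non-sync micro-steps, so gating the scaling on sync_gradients matches when the weights can actually change.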

This PR includes some formatting that should be applied together with another PR for validation loss improvements, #1903. I will probably wait for that PR to go through before this one, to align the formatting changes from black.

This was a limited test with scale_weight_norms = 2.5. Assessments were done over 1 epoch, so a longer measurement might show different results.

Epoch times (1 epoch each), this PR vs the sd3 branch:

| gradient_accumulation_steps | This PR | sd3 branch |
| --- | --- | --- |
| 4 | 2:35 | 2:51 |
| 8 | 2:26 | 2:44 |

Different dataset:

| gradient_accumulation_steps | This PR | sd3 branch |
| --- | --- | --- |
| 64 | 5:08 | 5:52 |
| 114 (half batch) | 5:04 | 5:50 |
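As rough intuition for why larger accumulation values save more time, the number of scaling calls per epoch drops by a factor of gradient_accumulation_steps. The step count below is hypothetical, not taken from the runs above:

```python
# Hypothetical numbers to illustrate the call-count reduction; the real
# per-call cost depends on the network size.
micro_steps_per_epoch = 1024  # assumed batches per epoch
for ga in (4, 8, 64, 114):
    calls_before = micro_steps_per_epoch        # scaling ran every micro-step
    calls_after = micro_steps_per_epoch // ga   # now only when sync_gradients is True
    print(f"GA={ga}: {calls_before} -> {calls_after} scaling calls per epoch")
```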

