Move scale_weight_norms inside sync_gradients #1908
Draft
`scale_weight_norms` was being applied on every step instead of only on gradient-sync steps, so with gradient accumulation a larger `gradient_accumulation_steps` ran the scaling many more times without any changes to the weights in between. I also moved the other `sync_gradients` check, the one for sampling images and saving per step, from outside the accumulation block to inside it and merged the two checks. Possibly not the correct approach, but it felt appropriate for them to happen inside that accumulation block; the end result may be the same either way. A sketch of the pattern is below.
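For illustration only, here is a minimal sketch of the idea, not the actual sd-scripts code: a toy Accelerate loop where the max-norm scaling and the per-step sampling/saving hooks only run when `accelerator.sync_gradients` is true. The model, data, and the inline max-norm logic are placeholders (sd-scripts calls into the network's own regularization method instead).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)

# Toy model and data, standing in for the real network and dataset.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 8))
loader = DataLoader(dataset, batch_size=4)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

scale_weight_norms = 2.5  # max-norm threshold, matching the test below

for x, y in loader:
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(x), y)
        accelerator.backward(loss)
        optimizer.step()  # no-op on non-sync micro-steps under accumulate()
        optimizer.zero_grad()

        if accelerator.sync_gradients:
            # Weights only change on sync (optimizer) steps, so the
            # max-norm scaling runs here instead of on every micro-step.
            with torch.no_grad():
                for p in model.parameters():
                    norm = p.norm()
                    if norm > scale_weight_norms:
                        p.mul_(scale_weight_norms / norm)
            # Per-step image sampling and checkpoint saving are merged
            # into the same sync_gradients check (elided here).
```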
This PR shares some formatting changes with the validation loss improvements PR #1903. I will probably wait for that PR to go through before this one, to align the formatting changes from black.
A limited test with `scale_weight_norms = 2.5`. Timings were measured over 1 epoch, so a longer run might show different results.
Epoch times (mm:ss per epoch), this PR vs the sd3 branch:

| Dataset | GA steps | This PR | sd3 branch |
| --- | --- | --- | --- |
| Dataset 1 | 4 | 2:35 | 2:51 |
| Dataset 1 | 8 | 2:26 | 2:44 |
| Dataset 2 | 64 | 5:08 | 5:52 |
| Dataset 2 | 114 (half batch) | 5:04 | 5:50 |