In general, you want to initialize a neural net's outputs to the mean of the target distribution, so the model doesn't need to waste time learning the mean. See "init well" here. For standard diffusion training, the target distribution is zero-centered, so the network's outputs should be zero at init.
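To make the intuition concrete, here is a minimal pure-Python sketch (not the repository's code). It compares the initial MSE loss of a zero-initialized output projection against a randomly initialized one on zero-centered targets. The layer sizes, the Gaussian stand-ins for features and targets, and the helper names `linear` and `mse` are all assumptions made for illustration.

```python
import random

random.seed(0)

def linear(weights, bias, x):
    """Apply a dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Hypothetical sizes, chosen only for this illustration.
dim_in, dim_out, n_samples = 16, 8, 2000

# The same output projection, initialized two ways.
zero_w = [[0.0] * dim_in for _ in range(dim_out)]
rand_w = [[random.gauss(0.0, 1.0) for _ in range(dim_in)]
          for _ in range(dim_out)]
bias = [0.0] * dim_out

zero_loss = 0.0
rand_loss = 0.0
for _ in range(n_samples):
    # Stand-in for the features entering the output layer.
    features = [random.gauss(0.0, 1.0) for _ in range(dim_in)]
    # Zero-centered target, e.g. the noise in standard diffusion training.
    target = [random.gauss(0.0, 1.0) for _ in range(dim_out)]
    zero_loss += mse(linear(zero_w, bias, features), target)
    rand_loss += mse(linear(rand_w, bias, features), target)

zero_loss /= n_samples
rand_loss /= n_samples

print(f"initial loss, zero-init output layer:   {zero_loss:.3f}")
print(f"initial loss, random-init output layer: {rand_loss:.3f}")
```

With zero init the model starts out predicting the target mean, so its initial loss is roughly the target variance. The random init starts with extra error that training first has to remove. In PyTorch, the equivalent is typically to call `torch.nn.init.zeros_` on the output layer's weight (and bias, if present).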
Hi, why zero-initialize `patch_out` in the hourglass transformer? It makes the output zero at the beginning. What's the intuition behind it?