We find empirically that the last layer in Mochi is quite sensitive to quantization, so we leave it in full precision. We do not know whether this holds for DiTs in general.
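As a rough illustration of leaving the last block in full precision, here is a minimal sketch in plain Python. It is not the code from the SageAttention example: `assign_attention_backends`, `quantized_attention`, `full_precision_attention`, and the `skip_last` parameter are hypothetical names used only to show the selection logic, and the two attention functions are stand-ins that return a label instead of computing attention.

```python
from typing import Callable, List


def assign_attention_backends(
    num_blocks: int,
    quantized: Callable,
    full_precision: Callable,
    skip_last: int = 1,
) -> List[Callable]:
    """Pick an attention function for each transformer block.

    The final `skip_last` blocks keep the full-precision function;
    all earlier blocks get the quantized one.
    """
    if not 0 <= skip_last <= num_blocks:
        raise ValueError("skip_last must be between 0 and num_blocks")
    cutoff = num_blocks - skip_last
    return [quantized if i < cutoff else full_precision for i in range(num_blocks)]


# Stand-ins for real attention kernels (hypothetical, for illustration only).
def quantized_attention(q, k, v):
    return "quantized"


def full_precision_attention(q, k, v):
    return "full precision"


backends = assign_attention_backends(4, quantized_attention, full_precision_attention)
print([fn(None, None, None) for fn in backends])
# ['quantized', 'quantized', 'quantized', 'full precision']
```

Whether `skip_last=1` is the right choice for another model would need to be checked empirically, for example by comparing outputs with and without the last block quantized.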
Why was the last DiT block of Mochi skipped in the SageAttention example? Do you suggest skipping the last blocks for better accuracy?