Grad overflow on iteration and loss "nan" when using One-bit Adam #1472
-
When I use 1-bit Adam to train my model, my log fills with "Grad overflow on iteration" and "Overflow detected. Skipping step" messages. Eventually the loss becomes "nan" and the training process appears to stall.
-
Hi, thanks for trying 1-bit Adam. Unfortunately the information you provided is not sufficient to determine the potential root cause, but here are some suggestions/questions:
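For readers hitting the same issue, 1-bit Adam is enabled through the DeepSpeed JSON config. Below is a minimal sketch; the field names follow the DeepSpeed config schema, but the values are illustrative, not a recommended setting for this user's model. `freeze_step` sets how many warmup steps run as uncompressed full-precision Adam before 1-bit compression begins, which is often relevant when overflow appears early in training:

```json
{
  "optimizer": {
    "type": "OneBitAdam",
    "params": {
      "lr": 1e-4,
      "freeze_step": 1000,
      "cuda_aware": false
    }
  },
  "fp16": {
    "enabled": true,
    "initial_scale_power": 16
  }
}
```

A handful of skipped steps at the start of fp16 training is normal while dynamic loss scaling settles; persistent overflow followed by a "nan" loss usually points elsewhere (learning rate, data, or compression starting too early).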