We are trying to reimplement the layers proposed by the Hyperbolic Neural Networks paper. We use float64 instead of float32 for the entire model and inputs, so we avoid numerical instability from precision alone. However, if we do not clamp the inputs to the tanh functions to the range (-15, 15), the network does not seem to train at all. It would be great if you could explain why this clamping is needed and why the value 15 was chosen.
PS: I really liked the paper and thank you for making the code available.
I am not fully sure, but I believe the reason is that exp(15) is already very large and exp(-15) very small, so in this range small updates to the exponent can produce huge or tiny fluctuations, which leads to numerical instability (overflow/underflow). Moreover, the gradient flow can be very noisy (around exp(15)) or essentially zero (around exp(-15)), which means the learning signal is very poor. In any case, MAXINT32 ~= exp(21).
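A quick back-of-the-envelope check of the scales involved (plain NumPy, not part of the repo) makes the point concrete:

```python
import numpy as np

# exp(15) and exp(-15) already differ by ~13 orders of magnitude.
print(np.exp(15.0))    # ~3.3e6
print(np.exp(-15.0))   # ~3.1e-7

# The tanh gradient 1 - tanh(x)^2 at x = 15 is essentially gone:
x64 = np.float64(15.0)
print(1.0 - np.tanh(x64) ** 2)               # ~3.7e-13 in float64 -> almost no signal

x32 = np.float32(15.0)
print(np.float32(1.0) - np.tanh(x32) ** 2)   # exactly 0.0 in float32 -> no signal at all
```

So past |x| ~ 15 the tanh is fully saturated anyway, and the clamp mostly protects the exp terms inside the hyperbolic operations without discarding any useful gradient.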
Using float64 would help, but it would roughly double the training time ...
One solution to this problem is to always gyro-translate your embeddings so that they have zero mean (see the sketch below), or to increase the dimensionality (see https://arxiv.org/abs/1804.03329).
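A minimal sketch of the gyro-translation idea, assuming the Poincaré ball with curvature c = 1; `mobius_add` and `center_embeddings` are hypothetical names, and using the Euclidean mean as the reference point is only a crude stand-in for a proper hyperbolic centroid:

```python
import numpy as np

def mobius_add(x, y):
    # Möbius addition on the Poincaré ball (c = 1), broadcasting over rows.
    xy = np.sum(x * y, axis=-1, keepdims=True)
    xx = np.sum(x * x, axis=-1, keepdims=True)
    yy = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 + 2 * xy + yy) * x + (1 - xx) * y
    den = 1 + 2 * xy + xx * yy
    return num / den

def center_embeddings(emb):
    # Gyro-translate every point by the Möbius inverse of a reference "mean",
    # so the cloud of embeddings sits near the origin where tanh/artanh are
    # numerically well behaved.
    m = emb.mean(axis=0, keepdims=True)
    return mobius_add(-m, emb)
```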
hyperbolic_nn/util.py, line 26 in 45be2f6
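That is the clamp the question refers to; the pattern is presumably along these lines (a sketch, not a verbatim copy of util.py, with `MAX_TANH_ARG` as a made-up name):

```python
import tensorflow as tf

# Clip the argument before tanh so the exp terms inside tanh and its
# gradient stay in a well-behaved range.
MAX_TANH_ARG = 15.0

def tf_tanh(x):
    return tf.tanh(tf.clip_by_value(x, -MAX_TANH_ARG, MAX_TANH_ARG))
```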