We are trying to reimplement the layers proposed by the Hyperbolic Neural Networks paper. We use float64 instead of float32 for the entire model and inputs, so we avoid numerical instability from precision alone. However, if we do not clamp the inputs to the tanh functions to the range (-15, 15), the network does not seem to train at all. It would be great if you could explain why this clamping is needed and why the value 15 was chosen.
PS: I really liked the paper and thank you for making the code available.
I am not fully sure, but I believe the reason is that exp(15) is already very large and exp(-15) very small, so in this range small updates to the exponent can produce huge or tiny fluctuations, which leads to numerical instability (overflow/underflow). Moreover, the gradient flow can be very noisy (around exp(15)) or essentially zero (around exp(-15)), which means the learning signal is very poor. In any case, MAXINT32 ~= exp(21).
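A quick back-of-the-envelope check of the scales involved (plain NumPy, not part of the repo) makes the point concrete:

```python
import numpy as np

# exp(15) and exp(-15) already differ by ~13 orders of magnitude.
print(np.exp(15.0))    # ~3.3e6
print(np.exp(-15.0))   # ~3.1e-7

# The tanh gradient 1 - tanh(x)^2 at x = 15 is essentially gone:
x64 = np.float64(15.0)
print(1.0 - np.tanh(x64) ** 2)               # ~3.7e-13 in float64 -> almost no signal

x32 = np.float32(15.0)
print(np.float32(1.0) - np.tanh(x32) ** 2)   # exactly 0.0 in float32 -> no signal at all
```

So past |x| ~ 15 the tanh is fully saturated anyway, and the clamp mostly protects the exp terms inside the hyperbolic operations without discarding any useful gradient.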
Using float64 would help, but it would roughly double the training time ...
One solution to this problem is to always gyro-translate your embeddings so that they have zero mean (see the sketch below), or to increase the dimensionality (see https://arxiv.org/abs/1804.03329).
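A minimal sketch of the gyro-translation idea, assuming the Poincaré ball with curvature c = 1; `mobius_add` and `center_embeddings` are hypothetical names, and using the Euclidean mean as the reference point is only a crude stand-in for a proper hyperbolic centroid:

```python
import numpy as np

def mobius_add(x, y):
    # Möbius addition on the Poincaré ball (c = 1), broadcasting over rows.
    xy = np.sum(x * y, axis=-1, keepdims=True)
    xx = np.sum(x * x, axis=-1, keepdims=True)
    yy = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 + 2 * xy + yy) * x + (1 - xx) * y
    den = 1 + 2 * xy + xx * yy
    return num / den

def center_embeddings(emb):
    # Gyro-translate every point by the Möbius inverse of a reference "mean",
    # so the cloud of embeddings sits near the origin where tanh/artanh are
    # numerically well behaved.
    m = emb.mean(axis=0, keepdims=True)
    return mobius_add(-m, emb)
```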
hyperbolic_nn/util.py, line 26 in 45be2f6
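That is the clamp the question refers to; the pattern is presumably along these lines (a sketch, not a verbatim copy of util.py, with `MAX_TANH_ARG` as a made-up name):

```python
import tensorflow as tf

# Clip the argument before tanh so the exp terms inside tanh and its
# gradient stay in a well-behaved range.
MAX_TANH_ARG = 15.0

def tf_tanh(x):
    return tf.tanh(tf.clip_by_value(x, -MAX_TANH_ARG, MAX_TANH_ARG))
```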