Hyperbolic MLR function #2
Comments
Thanks for your nice words.
Thank you.
Good question. First, our hyperbolic MLR goes from hyperbolic space to Euclidean space, so if you want to use it and then go back to hyperbolic space, you would need an additional exp_0. But I agree, you could do it both ways. We chose exp_0(f(log_0(x))) because, in this way, we recover scalar Möbius multiplication when the matrix is a scalar times the identity, and we get the additional properties described in the paper (e.g. associativity, preservation of orthogonality, etc.). You would probably use these properties together with the MLR feed-forward layer. However, I agree it is an interesting research direction to understand which of these layers is more powerful. Also see https://arxiv.org/pdf/1901.06033.pdf, which uses our MLR as the decoder layer in a VAE (Section 3.2).
Great. Thanks for your clear and quick responses.
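As a side note, the scalar-recovery property mentioned above can be checked numerically. The following is a minimal NumPy sketch (my own, not the repository's code; function names are illustrative) of the Möbius matrix-vector product exp_0(M log_0(x)) and the closed-form Möbius scalar multiplication on the Poincaré ball:

```python
import numpy as np

def exp0(v, c):
    """Exponential map at the origin of the Poincare ball with curvature c."""
    sqc = np.sqrt(c)
    nv = np.linalg.norm(v)
    return np.tanh(sqc * nv) * v / (sqc * nv)

def log0(y, c):
    """Logarithmic map at the origin."""
    sqc = np.sqrt(c)
    ny = np.linalg.norm(y)
    return np.arctanh(sqc * ny) * y / (sqc * ny)

def mobius_matvec(M, x, c):
    """Mobius matrix-vector multiplication: exp_0(M @ log_0(x))."""
    return exp0(M @ log0(x, c), c)

def mobius_scalar(r, x, c):
    """Closed-form Mobius scalar multiplication r (x)_c x."""
    sqc = np.sqrt(c)
    nx = np.linalg.norm(x)
    return np.tanh(r * np.arctanh(sqc * nx)) * x / (sqc * nx)

# With M = r * I, the matrix version recovers scalar multiplication:
c, r = 1.0, 0.7
x = np.array([0.1, -0.2, 0.3])   # a point inside the unit ball
a = mobius_matvec(r * np.eye(3), x, c)
b = mobius_scalar(r, x, c)
assert np.allclose(a, b)
```

The check works because for M = r I, M log_0(x) simply rescales log_0(x) by r, and tanh is odd, so the two expressions coincide.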
Hi! First of all, thank you for this research, it is very fascinating.
I have a question about your implementation of the hyperbolic MLR function.
In your implementation, you define it as:
2./ np.sqrt(c) * |A_mlr| * arcsinh(np.sqrt(c) * pxdota * lambda_px)
First, my question regards the l2 normalization of A_mlr in creating pxdota, why do you do this?
Second, seeing how
lambda_px = 2. / (1 - c * |minus_p_plus_x|^2)
, I find it difficult to see how your implementation of the hyperbolic MLR is equivalent to the definition of P(y=k | x) in your paper:
lambda_p * |A_mlr| / np.sqrt(c) * arcsinh(2 * np.sqrt(c) * pxdota / ((1 - c * |minus_p_plus_x|^2) * |A_mlr|))
=
2. / (np.sqrt(c) * (1 - c * |p|^2)) * |A_mlr| * arcsinh(np.sqrt(c) * pxdota * lambda_px / |A_mlr|)
It seems to me the 1/(1 - c * |p|^2) factor before the asinh term and the 1/|A_mlr| factor inside the asinh term are missing, but I can't figure out where they went! Does it have to do with the fact that the variable A_mlr first needs to be scaled by (lambda_0 / lambda_p) so that it can be optimized as a Euclidean parameter?
I am currently writing a paper that makes extensive use of the definitions in your paper, and I would like to keep my implementation as close as possible to yours.
EDIT: I just realized that the l2 normalization is the implicit 1/|A_mlr|. This just leaves the 1/(1 - c * |p|^2) that is missing.
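The algebra above can also be verified numerically. The sketch below (my own, assuming the standard Möbius addition formula and the two expressions exactly as quoted, with pxdota taken as the inner product against the l2-normalized A_mlr) checks that the two asinh arguments coincide and that the paper's expression and the implementation differ precisely by the 1/(1 - c * |p|^2) factor:

```python
import numpy as np

def mobius_add(u, v, c):
    """Standard Mobius addition on the Poincare ball with curvature c."""
    uv = u @ v
    nu2, nv2 = u @ u, v @ v
    num = (1 + 2 * c * uv + c * nv2) * u + (1 - c * nu2) * v
    den = 1 + 2 * c * uv + c ** 2 * nu2 * nv2
    return num / den

c = 1.0
rng = np.random.default_rng(0)
p = rng.normal(size=3) * 0.1   # hyperbolic offset, inside the ball
x = rng.normal(size=3) * 0.1   # input point, inside the ball
a = rng.normal(size=3)         # Euclidean normal vector A_mlr

px = mobius_add(-p, x, c)      # minus_p_plus_x
na = np.linalg.norm(a)
dot = px @ a
lam_p = 2.0 / (1 - c * p @ p)
lam_px = 2.0 / (1 - c * px @ px)

# Paper expression for the logit of P(y=k | x):
paper = lam_p * na / np.sqrt(c) * np.arcsinh(
    2 * np.sqrt(c) * dot / ((1 - c * px @ px) * na))

# Implementation expression, with pxdota = <px, a/|a|>:
impl = 2.0 / np.sqrt(c) * na * np.arcsinh(
    np.sqrt(c) * (dot / na) * lam_px)

# The two differ exactly by the factor 1/(1 - c * |p|^2):
assert np.isclose(paper, impl / (1 - c * p @ p))
```

So, as observed in the question, the l2 normalization supplies the 1/|A_mlr| inside the asinh, and the residual discrepancy is exactly the lam_p / 2 = 1/(1 - c * |p|^2) prefactor.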