Hyperbolic MLR function #2

Open
dutchJSCOOP opened this issue Jan 10, 2020 · 4 comments

@dutchJSCOOP

dutchJSCOOP commented Jan 10, 2020

Hi! First of all, thank you for this research, it is very fascinating.

I have a question about your implementation of the hyperbolic MLR function.
In your implementation, you define it as:
2./ np.sqrt(c) * |A_mlr| * arcsinh(np.sqrt(c) * pxdota * lambda_px)

First, my question concerns the L2 normalization of A_mlr when computing pxdota: why do you do this?

Second, given that lambda_px = 2. / (1 - c * |minus_p_plus_x|^2), I find it difficult to see how your implementation of the hyperbolic MLR is equivalent to the definition of
P(y=k | x) in your paper:
lambda_p * |A_mlr| / np.sqrt(c) * arcsinh(2 * np.sqrt(c) * pxdota / ((1 - c * |minus_p_plus_x|^2) * |A_mlr|))

= 2. / (np.sqrt(c) * (1 - c * |p|^2)) * |A_mlr| * arcsinh(np.sqrt(c) * pxdota * lambda_px / |A_mlr|)

It seems to me the 1/(1-c * |p|^2) before the asinh term, and the 1/|A_mlr| term in the asinh term are missing, but I can't figure out where they went!

Does it have to do with the fact that the variable A_mlr first needs to be scaled by (lambda_0 / lambda_p) so that it can be optimized as a Euclidean parameter?

I am currently writing a paper that makes extensive use of the definitions in your paper, and I would like to keep my implementation as close as possible to yours.

EDIT: I just realized that the l2_normalization is the implicit 1/|A_mlr|. This just leaves the 1/(1-c*|p|^2) that is missing.

@octavian-ganea
Contributor

octavian-ganea commented Jan 10, 2020

Thanks for your nice words.

  1. pxdota absorbs the a_k normalization inside the arcsinh of the MLR formula (Eq. 23 in https://papers.nips.cc/paper/7780-hyperbolic-neural-networks.pdf). So pxdota in our code is actually <-pk + x, ak/||ak||>, where + denotes Möbius addition.
  2. You are right, A_mlr is actually a'_k as denoted in our paper, which is a Euclidean parameter. So lambda_pk is absorbed inside ||a'_k||.
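
The two points above can be checked numerically. The following is a minimal NumPy sketch, not the repository's code: the function names (`mobius_add`, `conformal_factor`, `logit_paper`, `logit_code`) and the test values are assumptions made for illustration. It evaluates the exponent of Eq. 23 with a_k = (lambda_0 / lambda_pk) * a'_k and compares it with the expression quoted from the implementation, where A_mlr plays the role of a'_k.

```python
import numpy as np

def mobius_add(x, y, c):
    """Mobius addition of x and y in the Poincare ball of curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def conformal_factor(v, c):
    """lambda_v = 2 / (1 - c * ||v||^2)."""
    return 2.0 / (1.0 - c * np.dot(v, v))

def logit_paper(x, p, a, c):
    """Exponent of Eq. 23, with a living in the tangent space at p."""
    mpx = mobius_add(-p, x, c)
    a_norm = np.linalg.norm(a)
    inner = 2 * np.sqrt(c) * np.dot(mpx, a) / ((1 - c * np.dot(mpx, mpx)) * a_norm)
    return conformal_factor(p, c) * a_norm / np.sqrt(c) * np.arcsinh(inner)

def logit_code(x, p, a_prime, c):
    """Expression quoted in this issue, with a_prime the Euclidean parameter (A_mlr)."""
    mpx = mobius_add(-p, x, c)
    a_norm = np.linalg.norm(a_prime)
    pxdota = np.dot(mpx, a_prime / a_norm)   # L2 normalization absorbs 1 / ||a||
    lambda_px = conformal_factor(mpx, c)
    return 2.0 / np.sqrt(c) * a_norm * np.arcsinh(np.sqrt(c) * pxdota * lambda_px)

# Hypothetical test values: x and p must lie inside the ball of radius 1 / sqrt(c).
rng = np.random.default_rng(0)
c = 1.0
x = rng.standard_normal(5)
x = 0.5 * x / np.linalg.norm(x)
p = rng.standard_normal(5)
p = 0.4 * p / np.linalg.norm(p)
a_prime = rng.standard_normal(5)

# a = (lambda_0 / lambda_p) * a_prime, with lambda_0 = 2.
a = conformal_factor(np.zeros(5), c) / conformal_factor(p, c) * a_prime

print(logit_paper(x, p, a, c), logit_code(x, p, a_prime, c))
```

Under these assumptions the two values agree, because lambda_p * ||a|| = 2 * ||a'|| and a / ||a|| = a' / ||a'||, which is where the 1/(1 - c*|p|^2) factor goes.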

@dutchJSCOOP
Author

Thank you.
I have a more theoretical question as well. You derive the hyperbolic MLR from the Euclidean one: p(y=k|x) = exp(Ax - b) = exp(f(x)). Here, f(x) = Ax - b can be seen as a fully connected (feed-forward) layer with bias in the standard Euclidean case, without the non-linearity.
I expected that the hyperbolic feed-forward layer would then simply be the hyperbolic MLR layer without the exponential (and normalization) and with the hyperbolic non-linearity.
You define a feed-forward layer as exp_0(f(log_0(x))), thus performing the matrix multiplication in the tangent space at 0 and then mapping back to the hyperbolic space.
Are these two notions equivalent? Why can you not just formulate the MLR as p(y=k|x) = exp(mobius_add(mobius_mult(x,A),b))?
This is a bit outside the scope of a GitHub issue, so if you prefer I can send you an email.

@octavian-ganea
Contributor

octavian-ganea commented Jan 12, 2020

Good question. First, our hyperbolic MLR goes from hyperbolic space to Euclidean space, so if you want to use it and go back to hyperbolic space, you would need an additional exp_0. But I agree, you could do it both ways. We chose exp_0(f(log_0(x))) because, in this way, we recover the scalar-vector Möbius multiplication when the matrix is a scalar times the identity, and we have the additional properties we described in the paper (e.g. associativity, orthogonal preservation, etc.). You would probably lose these properties with the MLR feed-forward layer. However, I agree it is an interesting research direction to understand which of these layers is more powerful. Also, check https://arxiv.org/pdf/1901.06033.pdf, which uses our MLR in Section 3.2 as the decoder layer in a VAE.
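
The property mentioned above, that exp_0(f(log_0(x))) reduces to Möbius scalar multiplication when the matrix is a scalar times the identity, can be illustrated with a small NumPy sketch. The helper names (`exp0`, `log0`, `mobius_matvec`, `mobius_scalar_mul`) and the sample values are assumptions for illustration, not the repository's API, and the bias term is left out.

```python
import numpy as np

def exp0(v, c):
    """Exponential map at the origin of the Poincare ball of curvature -c."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def log0(y, c):
    """Logarithmic map at the origin (inverse of exp0)."""
    norm = np.linalg.norm(y)
    if norm == 0:
        return y
    return np.arctanh(np.sqrt(c) * norm) * y / (np.sqrt(c) * norm)

def mobius_matvec(M, x, c):
    """Linear map applied in the tangent space at 0: exp_0(M log_0(x))."""
    return exp0(M @ log0(x, c), c)

def mobius_scalar_mul(r, x, c):
    """Mobius scalar multiplication of x by r."""
    norm = np.linalg.norm(x)
    if norm == 0:
        return x
    return np.tanh(r * np.arctanh(np.sqrt(c) * norm)) * x / (np.sqrt(c) * norm)

# Hypothetical sample point inside the unit ball (c = 1) and a scalar r.
c = 1.0
x = np.array([0.3, -0.2, 0.1])
r = 0.7

via_matrix = mobius_matvec(r * np.eye(3), x, c)
via_scalar = mobius_scalar_mul(r, x, c)
print(via_matrix, via_scalar)
```

With M = r * I, both routes give the same point in the ball. The sketch does not evaluate an MLR-based layer, so it says nothing about which properties that alternative would keep.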

@dutchJSCOOP
Author

Great. Thanks for your clear and quick responses.
