Explain the monotonically decreasing part #108
First of all, I advise you to read the paper: https://arxiv.org/abs/1801.07698 Note: to make things easier to explain (and because it is also better for metric learning), we normalize all features before the final layer by their L2 norm, and the weights of the final layer are L2-normalized as well. Then multiplying features and weights gives just cos(θ), the cosine of the angle between them. In this cosine plot we are interested in the interval 0–180°. Restricted to that interval, it looks exactly like the curve in the ArcFace paper. What does … This is just one way of explaining it; there are many more. This is my explanation, which comes from studying this topic for a while. It is not perfect; it would need a blog post to explain all the ideas behind it.
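The normalization step can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the thread: the helper names and the example vectors are hypothetical. Once both the feature and the class weight are L2-normalized, their dot product is exactly cos(θ), the angle can be read back with acos, and on 0–180° the cosine is strictly decreasing, so a smaller angle always gives a larger logit.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def cosine_logit(feature, weight):
    """Final-layer logit when both the feature and the class weight are L2-normalized."""
    f, w = l2_normalize(feature), l2_normalize(weight)
    return sum(a * b for a, b in zip(f, w))

# Illustrative vectors 60 degrees apart, with different, non-unit lengths.
feature = [3.0, 0.0]
weight = [5.0 * math.cos(math.radians(60)), 5.0 * math.sin(math.radians(60))]

logit = cosine_logit(feature, weight)
theta_deg = math.degrees(math.acos(logit))
print(round(logit, 6), round(theta_deg, 6))  # 0.5 60.0

# On 0..180 degrees cos is strictly decreasing: a smaller angle gives a larger logit.
curve = [math.cos(math.radians(d)) for d in range(181)]
assert all(a > b for a, b in zip(curve, curve[1:]))
```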
Great explanation. In a nutshell, on the increasing part of the curve the gradient has the opposite sign. This means that an increasing curve will push features away from the class center! I'm sure you don't want such a property...
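A minimal sketch of that argument, assuming an ArcFace-style target logit ψ(θ) = cos(θ + m) with an illustrative margin m = 0.5 and a two-class softmax cross-entropy whose competing logit is fixed at 0 (all of these are hypothetical choices, not code from the thread). By the chain rule, dL/dθ = (dL/dψ)·ψ'(θ), and dL/dψ is always negative, so the direction of a gradient-descent step on θ is set by the sign of ψ'(θ). For θ < π − m the curve decreases and the step shrinks θ (toward the class center); past π − m, θ + m exceeds π, cos(θ + m) starts increasing, and the same step grows θ instead.

```python
import math

M = 0.5  # illustrative additive angular margin in radians (ArcFace-style)

def psi(theta):
    """Target logit with an additive angular margin: cos(theta + M)."""
    return math.cos(theta + M)

def loss(theta, other_logit=0.0):
    """Two-class softmax cross-entropy for the target class; the single
    competing logit is a fixed, illustrative value."""
    t = psi(theta)
    return -t + math.log(math.exp(t) + math.exp(other_logit))

def grad(theta, h=1e-6):
    """Central finite-difference estimate of dL/dtheta."""
    return (loss(theta + h) - loss(theta - h)) / (2 * h)

def sgd_step(theta, lr=0.1):
    """One plain gradient-descent step on the angle."""
    return theta - lr * grad(theta)

boundary = math.pi - M  # psi decreases on [0, pi - M] and increases after it
for theta in (1.0, 3.0):
    region = "decreasing" if theta < boundary else "increasing"
    new_theta = sgd_step(theta)
    direction = "toward" if new_theta < theta else "away from"
    print(f"theta={theta}: psi {region}, step moves {direction} the class center")
```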
@melgor Thank you for your thorough response. It clarified a lot of things. I just have one question: in the first plot, it seems that the author is only drawing cos(θ) in the range 0 to 180°. But the target logit is ||W|| ||x|| cos(θ), so the logit depends not only on the cosine but also on the product of ||W|| and ||x||.
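With the L2 normalization described earlier in the thread, ||W|| and ||x|| are both forced to 1, so their product drops out and the logit depends on θ alone. A minimal sketch, assuming plain Python lists and illustrative vectors (the helper names are hypothetical), compares the raw logit ||W|| ||x|| cos(θ) with the normalized one as the feature length grows at a fixed 45° angle. The optional `s` parameter stands in for the fixed scale that ArcFace multiplies the normalized cosine by; it is a hyperparameter, not a per-sample norm.

```python
import math

def norm(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

def raw_logit(x, w):
    """Unnormalized logit: the dot product ||W|| ||x|| cos(theta)."""
    return sum(a * b for a, b in zip(x, w))

def normalized_logit(x, w, s=1.0):
    """Logit after L2-normalizing x and W, optionally rescaled by a fixed s."""
    return s * raw_logit(x, w) / (norm(x) * norm(w))

angle = math.radians(45)
w = [1.0, 0.0]
for length in (1.0, 2.0, 10.0):
    x = [length * math.cos(angle), length * math.sin(angle)]
    # The raw logit grows with ||x||; the normalized one stays at cos(45 deg).
    print(length, round(raw_logit(x, w), 4), round(normalized_logit(x, w), 4))
```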
@amirhfarzaneh
Can anybody please explain why ψ should be monotonically decreasing for every interval?
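One well-known construction that meets this requirement is the piecewise ψ from the SphereFace (A-Softmax) paper, sketched here as an illustration rather than as this repository's code: ψ(θ) = (−1)^k cos(mθ) − 2k for θ in [kπ/m, (k+1)π/m], k = 0, …, m − 1. Plain cos(mθ) oscillates on [0, π] for integer m > 1, so it would have increasing stretches, which, per the gradient-sign comment above, push features away from the class center. Flipping the sign of every other period and shifting each period down by 2k keeps ψ continuous and monotonically decreasing on every interval. The default m = 4 and the grid check are illustrative choices.

```python
import math

def psi(theta, m=4):
    """SphereFace-style piecewise target function:
    psi(theta) = (-1)**k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / math.pi), m - 1)  # interval index; theta = pi maps to the last one
    return (-1) ** k * math.cos(m * theta) - 2 * k

def is_decreasing(f, n=1000):
    """Check that f is strictly decreasing on a uniform grid over [0, pi]."""
    vals = [f(math.pi * i / n) for i in range(n + 1)]
    return all(a > b for a, b in zip(vals, vals[1:]))

# Plain cos(4*theta) goes back up after pi/4; the piecewise psi never does.
print(is_decreasing(lambda t: math.cos(4 * t)), is_decreasing(psi))  # False True
```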