
Confusing notation for training and test data as well as for target variable #41

Open
napsternxg opened this issue Apr 3, 2019 · 1 comment


@napsternxg

I have a minor comment about the mathematical notation in the post.

Throughout the post you use X for the test points and Y for the training points. For example, at the start of the section Posterior Distribution you write:

First, we form the joint distribution P_{X,Y} between the test points X and the training points Y.

This was a bit confusing to me at first, because in the general ML literature y is usually the target variable. Here, however, Y denotes the training points, and I couldn't see how to go from the observed independent variables of the test data to the (unobserved) target variable for the test data. I find the notation in one of the notebooks easier to read: there, the distribution is written over the vector of function values, P_{f_a, f_b}, which is easier to understand. It also makes it easier to see how to use the model to predict the unknown function values at new, observed inputs x.
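For reference, the step in question is the standard Gaussian conditioning identity. Assuming, as the post appears to, that X stands for the unobserved function values at the test inputs and Y for the observed function values at the training inputs (with the Σ blocks built by evaluating the kernel on the corresponding inputs), it reads:

```latex
P_{X,Y} = \mathcal{N}\left(
  \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix},
  \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix}
\right)
\quad\Longrightarrow\quad
X \mid Y \sim \mathcal{N}\left(
  \mu_X + \Sigma_{XY}\Sigma_{YY}^{-1}(Y - \mu_Y),\;
  \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}
\right)
```

Substituting f_a for X and f_b for Y gives the same formula in the notebook notation; the conditional mean is then the prediction at the test inputs.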

The code in the post also follows the prediction view, with y referring to the target variable.
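To make the prediction view concrete, here is a minimal NumPy sketch of GP regression using the usual ML names: `x_train`/`y_train` for the observed inputs and targets, and `x_test` for new inputs. It assumes a zero prior mean and a squared-exponential kernel with fixed hyperparameters; the helpers `rbf_kernel` and `gp_predict` are hypothetical names, not taken from the post's code. The comments mark which matrix plays the role of Σ_XX, Σ_XY and Σ_YY in the post's conditioning step, with the post's X corresponding to f(x_test) and Y to `y_train`.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, x_test, noise=1e-8):
    """Posterior mean and covariance of f(x_test) given y_train = f(x_train).

    Assumes a zero prior mean; `noise` is a small jitter (or observation
    noise variance) added to the diagonal of the training covariance.
    """
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))  # ~ Sigma_YY
    k_st = rbf_kernel(x_test, x_train)                                  # Sigma_XY
    k_ss = rbf_kernel(x_test, x_test)                                   # Sigma_XX

    # Solve linear systems instead of forming an explicit inverse.
    alpha = np.linalg.solve(k_tt, y_train)
    mean = k_st @ alpha                                      # conditional mean
    cov = k_ss - k_st @ np.linalg.solve(k_tt, k_st.T)        # conditional covariance
    return mean, cov


# Usage: predict a sine curve from three observed points.
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 7)
mean, cov = gp_predict(x_train, y_train, x_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise predictive std
```

With a near-zero `noise`, the posterior mean passes through the training targets and the predictive variance collapses there; a larger `noise` turns the same code into regression with noisy observations.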

I have tried to understand GPs before, and every time it has been hard because of the confusing notation and the generalization to an infinite-dimensional Gaussian, which is then used by conditioning on the observed data. I think the distill.pub article could really help resolve this notation issue when explaining GPs.
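The "infinite-dimensional Gaussian" only ever enters a computation through its finite marginals: for any finite set of inputs, the function values are jointly multivariate normal, with covariance given by the kernel evaluated on those inputs. A minimal NumPy sketch of that idea draws sample functions from a zero-mean GP prior on a finite grid; the squared-exponential kernel, the jitter value and the helper names `rbf_kernel` and `sample_prior` are assumptions for illustration, not taken from the post.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)


def sample_prior(x, n_samples=3, jitter=1e-8, seed=0):
    """Draw function values f(x) from a zero-mean GP prior at finite inputs x.

    Only the finite marginal over the requested inputs is formed: a
    multivariate normal with covariance K(x, x) (plus a small jitter so the
    Cholesky factorization is numerically stable).
    """
    k = rbf_kernel(x, x) + jitter * np.eye(len(x))
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((len(x), n_samples))
    return chol @ z  # each column is one sampled function evaluated at x


x = np.linspace(-3.0, 3.0, 25)
samples = sample_prior(x)
```

Conditioning on observed data then operates on this same finite joint distribution, extended with the training inputs.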

Finally, this image has been the most descriptive explanation of GPs for me (source):

[image]

@grtlr
Contributor

grtlr commented Apr 10, 2019

Thank you very much for your comment! Thinking about it, I have to agree that this can be confusing. On the other hand, this is a fairly standard notation for Gaussian processes. If I'm not mistaken, the (very nice) figure you provided actually uses the same notation. I will leave this issue open for now, as it might help others, and maybe we can even adjust the notation in the future to make things clearer.
