ADA-SVR (3/4) PR example of models #100
base: main
Conversation
Codecov Report
Attention: Patch coverage is

Coverage Diff:
##             main     #100      +/-   ##
==========================================
- Coverage   81.70%   81.66%   -0.04%
==========================================
  Files          43       43
  Lines        2312     2318       +6
==========================================
+ Hits         1889     1893       +4
- Misses        423      425       +2

☔ View full report in Codecov by Sentry.
Thanks for opening this. I would try to minimize the communication about this method, as it is not rigorous.
examples/_utils/plot_dataset.py (outdated)
    vmax=1.0,
):
    """
    Plot for the confidence in the hipothse that the variables are important.
Suggested change:
-    Plot for the confidence in the hipothse that the variables are important.
+    Plot the variable importance p-values
examples/_utils/plot_dataset.py (outdated)
plt.subplots_adjust(top=1.0, bottom=0.2)


def plot_pvalue_H0(
Maybe better: plot_p_values
examples/_utils/plot_dataset.py (outdated)
    - vmax: Maximum value of the colorbar (float)

    Returns:
    - a figure with 3 subplots
The function actually returns None.
examples/methods/ada_svr.py (outdated)
#
# **Advantages**:
#
# - The method is fast because it uses the central limit theorem to estimate
It is actually fast because it uses linear regression to get the uncertainty estimate.
The CLT is simply there to strengthen the Gaussian assumption; it is here for the sake of theory only.
We can say it's fast because it uses a broad estimate of the weight distribution.
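For context, here is a minimal sketch of the kind of test being discussed, assuming beta_hat holds the estimated coefficients and scale their estimated standard deviations (the names mirror the snippet quoted later in this thread; the actual library API may differ):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical coefficient estimates and their estimated standard deviations,
# as an ADA-SVR-like procedure would produce them.
beta_hat = np.array([2.1, -0.3, 0.05, -1.8])
scale = np.array([0.5, 0.4, 0.45, 0.6])

# Under the Gaussian assumption (motivated by the CLT), each coefficient is
# tested against a zero-centred normal; a two-sided p-value comes from the
# survival function of the standard normal distribution.
pvalue = 2 * norm.sf(np.abs(beta_hat) / scale)
print(pvalue.round(4))
```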
Fix typo errors. Co-authored-by: bthirion <[email protected]>
… into PR_example_ADA_SVR
examples/methods/ada_svr.py (outdated)
# - The method assumes that the distribution of the coefficients of SVR is normal, centred around zero.
# - The method is not valid for small sample sizes.
# - The method has all the disadvantages of linear models: only for linear
#   relationships, not good predicting performance, unintuitive.
"unintuitive": following Molnar's explanation, the interpretation of the weights can be counter-intuitive when the dimension is higher than 4 (see the last paragraph of the linear model chapter).
So you mean that the model is "non-sparse", making it harder to figure out.
No, my understanding of Molnar's argument is that when the dimension is higher than 4, it is quite impossible to visualize the hyperplane defined by the weights (a small numerical illustration follows the quote below).
From Molnar's book : "The interpretation of a weight can be unintuitive because it depends on all other features. A feature with high positive correlation with the outcome y and another feature might get a negative weight in the linear model, because, given the other correlated feature, it is negatively correlated with y in the high-dimensional space. Completely correlated features make it even impossible to find a unique solution for the linear equation. An example: You have a model to predict the value of a house and have features like number of rooms and size of the house. House size and number of rooms are highly correlated: the bigger a house is, the more rooms it has. If you take both features into a linear model, it might happen, that the size of the house is the better predictor and gets a large positive weight. The number of rooms might end up getting a negative weight, because, given that a house has the same size, increasing the number of rooms could make it less valuable or the linear equation becomes less stable, when the correlation is too strong."
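To make Molnar's house example concrete, here is a small self-contained sketch (all numbers are invented for illustration) in which a feature that is strongly positively correlated with the outcome still receives a negative linear-model weight:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two highly correlated features: house size and number of rooms.
rng = np.random.default_rng(0)
size = rng.normal(100.0, 20.0, 1000)
rooms = size / 20.0 + rng.normal(0.0, 0.3, 1000)

# Price: given the size, extra rooms slightly *reduce* the value.
price = 3.0 * size - 1.0 * rooms + rng.normal(0.0, 1.0, 1000)

# The number of rooms is strongly positively correlated with the price...
print(np.corrcoef(rooms, price)[0, 1])   # close to +1

# ...yet its weight in the linear model is negative, as in Molnar's example.
model = LinearRegression().fit(np.column_stack([size, rooms]), price)
print(model.coef_)                        # approximately [3.0, -1.0]
```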
Sure, but there are two ways to consider the problem:
- If your data is high-dimensional (dimension > 4), you cannot reliably interpret the weights, which is true or false depending on the correlation between the features. But then, interpretability is not related to the model, only to the data...
- If your model is sparse, i.e. it can explain y with very few features, it becomes interpretable again. A Lasso regression or a regression tree is "sparse" in that sense. However, this is not a good idea, because the sparse solution reached by the model may not be significantly better than other sparse solutions.
Overall, I think that this is a bad argument. In any case, it is not related to ada-svr.
I used it only as an example for creating a template for all the other methods.
pvalue, pvalue_corrected, one_minus_pvalue, one_minus_pvalue_correlation = (
    ada_svr_pvalue(beta_hat, scale)
)
plot_pvalue_H0(
I don't think it's a good idea to have plotting functions as black boxes. See examples from sklearn and nilearn, where this pattern is avoided.
Scikit-learn and nilearn have two different policies for plotting.
Nilearn has a submodule "plotting" where most of the plotting functions are implemented. In general, the only plots that do not use these functions are either simple ones, such as bar plots or line plots, or too specific ones, such as plots for a particular dataset.
Scikit-learn does not have a specific submodule for plotting results. As a consequence, most of the plotting is not in a black box; the few exceptions are mainly for complex plots.
I have difficulty striking a trade-off between the readability of the notebook and the use of a black-box function.
The main difference that I identify is that, instead of creating a full figure, the function creates a plot on a specific axis.
In my opinion:
- Having predefined functions makes it harder for users to tweak the plotting script to adapt the figure to their constraints (publication template, presentation...).
- The plotting functionalities are currently not too heavy; it wouldn't reduce the readability of the notebook to include them (see the sketch below).
- There is redundancy in plotting coefficients, p-values, and corrected p-values, unless you want to highlight what one of these quantities adds on top of the others.
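To make the suggestion concrete, here is a minimal sketch of what inlining the plotting could look like; the p-value arrays are invented stand-ins for the outputs of ada_svr_pvalue, and the Bonferroni correction is only there to keep the sketch self-contained:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical p-values, standing in for the outputs of ada_svr_pvalue.
pvalue = np.array([0.001, 0.20, 0.03, 0.75])
pvalue_corrected = np.minimum(pvalue * pvalue.size, 1.0)  # Bonferroni, for the sketch

# Plot -log10 p-values directly in the notebook instead of calling a helper.
fig, ax = plt.subplots(figsize=(6, 3))
x = np.arange(pvalue.size)
ax.bar(x - 0.2, -np.log10(pvalue), width=0.4, label="p-value")
ax.bar(x + 0.2, -np.log10(pvalue_corrected), width=0.4, label="corrected")
ax.axhline(-np.log10(0.05), color="red", linestyle="--", label="alpha = 0.05")
ax.set_xlabel("variable index")
ax.set_ylabel("-log10(p)")
ax.legend()
plt.show()
```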
Do you propose to only plot corrected p-values?
As a general question: is using a different template for this method than for LOCO / PI / CPI desirable?
A key difference is that it does not use data splits (train/test), which makes .fit and .score unnecessary, but also makes the syntax inconsistent across methods.
I would suggest at least maximizing the consistency between the methods that do not require data splitting (ada-svr, knockoffs ...), for instance along the lines sketched below.
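As an illustration of what that consistency could mean, here is a hypothetical sketch; none of these function names come from the library, they only stand in for methods that take the full (X, y) without a train/test split:

```python
import numpy as np

# Hypothetical stand-ins for split-free methods; the real procedures would
# return importance scores or p-values computed from the full data.
def importance_ada_svr(X, y):
    return np.abs(np.linalg.lstsq(X, y, rcond=None)[0])

def importance_knockoffs(X, y):
    return np.abs(np.corrcoef(X.T, y)[-1, :-1])

# A shared calling convention lets every example iterate over the methods
# uniformly, without method-specific .fit / .score boilerplate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.0]) + rng.normal(size=100)
for method in (importance_ada_svr, importance_knockoffs):
    print(method.__name__, method(X, y).round(2))
```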
plt.subplots_adjust(top=1.0, hspace=0.3)


def plot_pvalue_H1(
If plotting functions are kept, I think this one should be merged with plot_pvalue_H0. Also, one_minus_pvalue should not be a required argument.
What do you mean by "one_minus_pvalue should not be a required argument"?
plot_pvalue_H1 takes as arguments (pvalue, pvalue_corrected, one_minus_pvalue, one_minus_pvalue_corrected). I think one_minus_pvalue could simply be computed inside the function to simplify function calls.
I was keeping it because it was the output of the existing function in stat_tools.
Looking at it in more detail, I think we can remove one_minus_pvalue and one_minus_pvalue_corrected from these functions.
Sorry, why should we do that? These values were here for numerical reasons...
We don't need to keep them in memory if we provide a function which deals with this numerical error.
From the code point of view, the computation of 1 - pvalue is always the same, so it can be seen as a duplicated line of code. I prefer to propose a function rather than the precomputed value. (A short illustration of the numerical issue follows.)
We can discuss it in issue #107.
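For reference, here is the floating-point issue at stake: in double precision, a p-value extremely close to 1 rounds to exactly 1.0, so computing 1 - pvalue after the fact destroys the tail mass, whereas computing the complement directly from the survival function preserves it:

```python
from scipy.stats import norm

z = 10.0
pvalue = norm.cdf(z)            # rounds to exactly 1.0 in float64
one_minus_pvalue = norm.sf(z)   # the tail mass, computed directly

print(pvalue == 1.0)            # True: the information is already gone
print(1.0 - pvalue)             # 0.0: recomputing the complement loses it
print(one_minus_pvalue)         # ~7.6e-24: preserved by the direct computation
```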
See issue #104 for the discussion around this topic.
Note that Ada-SVR is different from Knockoffs and CPI: it is a variable importance estimator tied to a given model (essentially a linear regression). In some sense, it is comparable to the desparsified Lasso. Again, let me reiterate that Ada-SVR is not something we want to put forward. I think that we should not base our API discussions on it.
Fix typo. Co-authored-by: Joseph Paillard <[email protected]>
I want to propose this PR in order to have a discussion around documenting the models through an example.
The main file of this pull request is examples/methods/ada_svr.py.
It is composed of multiple sections that I would like to generalise to all the models in the library.
Please also consider the organisation of the files as well as the plots.
Would you prefer to open an issue to talk about this, or can the discussion on this topic stay in this PR?