
ADA-SVR (3/4) PR example of models #100

Open: wants to merge 31 commits into main

Conversation

@lionelkusch (Collaborator) commented Dec 26, 2024

The example is composed of multiple sections:

  • Import section
  • Generate toy dataset
  • Usage methods
  • Plot the results
  • Principle of the methods
  • Assumptions, Advantages and Disadvantages

I would like to generalise this to all the models in the library.

You should consider improving the organisation of the files as well as the plots.

Would you prefer to open an issue to discuss this, or can the discussion on this topic stay in this PR?
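For reference, a hedged sketch of the layout described above. The names `ada_svr` and `ada_svr_pvalue` appear in this PR's diff, but their exact signatures are assumptions here, so the method-specific calls are left as comments; only the toy-data part uses a known scikit-learn helper.

```python
# Import section
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression  # stand-in toy-data generator

# Generate toy dataset
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

# Usage of the method (names from this PR's diff; signatures assumed)
# from hidimstat.ada_svr import ada_svr, ada_svr_pvalue
# beta_hat, scale = ada_svr(X, y)
# pvalue, pvalue_corrected, one_minus_pvalue, one_minus_pvalue_corrected = (
#     ada_svr_pvalue(beta_hat, scale)
# )

# Plot the results (illustrative)
# plt.bar(np.arange(len(pvalue_corrected)), -np.log10(pvalue_corrected))
# plt.xlabel("variable index"); plt.ylabel("-log10(p-value)")
# plt.show()
```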

codecov bot commented Dec 26, 2024

Codecov Report

Attention: Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.

Project coverage is 81.66%. Comparing base (b811f2e) to head (6e2d4c9).
Report is 1 commit behind head on main.

Files with missing lines            Patch %   Lines
src/hidimstat/ada_svr.py            92.85%    1 Missing ⚠️
src/hidimstat/permutation_test.py   66.66%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #100      +/-   ##
==========================================
- Coverage   81.70%   81.66%   -0.04%     
==========================================
  Files          43       43              
  Lines        2312     2318       +6     
==========================================
+ Hits         1889     1893       +4     
- Misses        423      425       +2     


@bthirion (Contributor) left a comment:

Thx for opening this. I would try to minimize the communication about this method, as it is not rigorous.

examples/_utils/plot_dataset.py: two outdated review threads (resolved)
vmax=1.0,
):
"""
Plot for the confidence in the hipothse that the variables are important.
Contributor:

Suggested change
Plot for the confidence in the hipothse that the variables are important.
Plot the variable importance p-values

plt.subplots_adjust(top=1.0, bottom=0.2)


def plot_pvalue_H0(
Contributor:

Maybe better: plot_p_values

- vmax: Maximum value of the colorbar (float)

Returns:
- a figure with 3 subplots
Contributor:

None (i.e. the function returns None, not a figure).

#
# **Advantages**:
#
# - The method is fast because it uses the central limit theorem to estimate
Contributor:

It is actually fast because it uses linear regression to get the uncertainty estimate.

Contributor:

The CLT is simply there to strengthen some Gaussian assumption. It is here for the sake of theory only.

Collaborator Author:

We can say it's fast because it makes a broad estimate of the weight distribution.
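For concreteness, here is an illustrative sketch of this kind of "broad estimate" (not hidimstat's exact implementation): once the weight distribution is approximated as a centred Gaussian with a per-coefficient scale, p-values come from a closed-form survival function, with no resampling, which is what makes the method fast.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pvalues(beta_hat, scale):
    """Illustrative only: two-sided p-values under the centred-Gaussian
    approximation of the coefficient distribution discussed in this thread;
    hidimstat's actual ada_svr_pvalue may differ in its details."""
    z = np.abs(np.asarray(beta_hat)) / np.asarray(scale)
    return 2 * norm.sf(z)  # survival function: numerically stable for large z
```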

examples/methods/ada_svr.py: three outdated review threads (resolved)
hidimstat/ada_svr.py: outdated review thread (resolved)
# - The method assumes that the distribution of the coefficients of SVR is normal centred around zeros.
# - The method is not valid for small sample sizes.
# - The method has all the disadvantages of linear models: only for linear
# relationships, not good predicting performance, unintuitive.
Collaborator Author:

unintuitive: following Molnar's explanation, the interpretation of the weights can be counter-intuitive when the dimension is higher than 4 (see the last paragraph of the linear model chapter).

Contributor:

So you mean that the model is "non-sparse", making it harder to figure out.

Collaborator Author:

No, my understanding of Molnar's argument is that when the dimension is higher than 4, it's practically impossible to picture the hyperplane defined by the weights.

From Molnar's book : "The interpretation of a weight can be unintuitive because it depends on all other features. A feature with high positive correlation with the outcome y and another feature might get a negative weight in the linear model, because, given the other correlated feature, it is negatively correlated with y in the high-dimensional space. Completely correlated features make it even impossible to find a unique solution for the linear equation. An example: You have a model to predict the value of a house and have features like number of rooms and size of the house. House size and number of rooms are highly correlated: the bigger a house is, the more rooms it has. If you take both features into a linear model, it might happen, that the size of the house is the better predictor and gets a large positive weight. The number of rooms might end up getting a negative weight, because, given that a house has the same size, increasing the number of rooms could make it less valuable or the linear equation becomes less stable, when the correlation is too strong."
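A small synthetic illustration of the quoted point (hypothetical data, ordinary least squares standing in for the SVR weights): a feature can be positively correlated with y marginally and still receive a negative weight once a strongly correlated feature is in the model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
size = rng.normal(size=500)                  # "size of the house"
rooms = size + 0.1 * rng.normal(size=500)    # "number of rooms", highly correlated
y = 3.0 * size - 1.0 * rooms + 0.5 * rng.normal(size=500)

print(np.corrcoef(rooms, y)[0, 1])           # marginally positive correlation
print(LinearRegression().fit(np.column_stack([size, rooms]), y).coef_)
# -> rooms gets a negative weight despite its positive correlation with y
```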

Contributor:

Sure, but there are 2 ways to consider the problem:

  • if your data is high-dimensional (dimension > 4) you cannot reliably interpret the weights (which is true or false, depending on the correlation between features). But then, interpretability is not related to the model, only to the data...
  • if your model is sparse, i.e. can explain y with very few features, it becomes interpretable again. A Lasso regression or a regression tree are "sparse" in that sense. However, this is not a good idea, because the sparse solution reached by the model may not be significantly better than other sparse solutions.

Overall, I think that this is a bad argument. In any case, it is not related to ada-svr.

@lionelkusch (Collaborator Author) commented:
> Thx for opening this. I would try to minimize the communication about this method, as it is not rigorous.

I used it only as an example for creating a template for all the other methods.

hidimstat/ada_svr.py: outdated review thread (resolved)
pvalue, pvalue_corrected, one_minus_pvalue, one_minus_pvalue_corrected = (
    ada_svr_pvalue(beta_hat, scale)
)
plot_pvalue_H0(
Contributor:

I don't think it's a good idea to have plotting functions as black boxes. See examples from sklearn and nilearn, where this pattern is avoided.

Collaborator Author:

Scikit-learn and nilearn have two different policies for plotting.

Nilearn has a submodule "plotting" where most of the functions for plotting are implemented. In general, the only plots that don't use these functions are either simple, such as bar plots or line plots, or too specific, such as plots for a particular dataset.

Scikit-learn doesn't have a specific submodule for plotting results. As a consequence, most of the plotting is not in a black box. There are a few exceptions, mainly for complex plotting.

I have difficulty finding a trade-off between the readability of the notebook and using a black-box function.
The main difference that I identify is that, instead of creating a full figure, the function creates a plot on a specific axis.
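To illustrate the axis-based pattern just mentioned (hypothetical function, following the matplotlib convention used by sklearn and nilearn helpers): the function draws on a caller-supplied Axes instead of building a whole figure, so users keep control of the layout.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_p_values(pvalues, ax=None):
    """Hypothetical axis-based plotter: draws on the Axes provided by the
    caller (or the current Axes), leaving figure creation, styling, and
    saving in the user's hands."""
    if ax is None:
        ax = plt.gca()
    ax.bar(np.arange(len(pvalues)), -np.log10(pvalues))
    ax.set_xlabel("variable index")
    ax.set_ylabel("-log10(p-value)")
    return ax

# The caller assembles the figure, e.g. to match a publication template:
# fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
# plot_p_values(pvalue, ax=ax1)
# plot_p_values(pvalue_corrected, ax=ax2)
```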

Collaborator:

In my opinion:

  • Having predefined functions makes it harder for users to tweak the plotting script to adapt the figure to their constraints (publication template, presentation, ...).
  • The plotting functionalities are currently not too heavy; it wouldn't reduce the readability of the notebook to include them.
  • There is redundancy in plotting coefficients, p-values, and corrected p-values, unless you want to highlight what one of these quantities adds on top of the others.

Collaborator Author:

Do you propose to only plot corrected p-values?

@lionelkusch mentioned this pull request Dec 27, 2024
@jpaillard (Collaborator) left a comment:

As a general question: is using a different template for this method than LOCO / PI / CPI desirable?
A key difference is that it does not use data splits (train/test), which makes .fit and .score unnecessary, but also makes the syntax inconsistent across methods.

I would suggest at least maximizing the consistency between methods that do not require data splitting (ada-svr, knockoffs, ...).

doc_conf/references.bib: review thread (resolved)
hidimstat/visualisation/plot_dataset.py: outdated review thread (resolved)
plt.subplots_adjust(top=1.0, hspace=0.3)


def plot_pvalue_H1(
Collaborator:

If plotting functions are kept, I think this one should be merged with plot_pvalue_H0.

Also, one_minus_pvalue should not be a required argument.

Collaborator Author:

What do you mean by "one_minus_pvalue should not be a required argument"?

Collaborator:

plot_pvalue_H1 takes as arguments (pvalue, pvalue_corrected, one_minus_pvalue, one_minus_pvalue_corrected).
one_minus_pvalue could simply be computed inside the function to simplify function calls.

Collaborator Author:

I was keeping it because it was the output of the existing function in stat_tools.
By looking more into the details, I think we can remove one_minus_pvalue and one_minus_pvalue_corrected from these functions.

Contributor:

Sorry, why should we do that? This was here for numerical reasons...

Collaborator Author:

We don't need to keep them in memory if we provide a function which deals with this numerical error.

From the code point of view, the computation of 1 - pvalue is always the same, and it can be seen as a duplicated line of code. I prefer to propose a function rather than the value.

We can discuss it in issue #107.
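A hedged sketch of the kind of helper being proposed (hypothetical name and one-sided convention, for illustration): evaluating the complementary tail directly avoids the cancellation error of literally computing 1.0 - pvalue when pvalue is close to 1, which is the numerical reason raised above.

```python
import numpy as np
from scipy.stats import norm

def one_minus_pvalue(beta_hat, scale):
    """Hypothetical helper: returns 1 - pvalue without cancellation error.
    For an illustrative one-sided p-value norm.sf(beta_hat / scale), the
    complement is norm.cdf(beta_hat / scale), evaluated directly rather
    than as 1.0 - pvalue (which loses precision when pvalue is near 1)."""
    return norm.cdf(np.asarray(beta_hat) / np.asarray(scale))
```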


@lionelkusch (Collaborator Author) commented:
> As a general question: is using a different template for this method than LOCO / PI / CPI desirable? A key difference is that it does not use data splits (train/test), which makes .fit and .score unnecessary, but also makes the syntax inconsistent across methods.
>
> I would suggest at least maximizing the consistency between methods that do not require data splitting (ada-svr, knockoffs, ...).

See issue #104 for the discussion around this topic.

@bthirion (Contributor) commented:
Note that Ada-SVR is different from Knockoffs and CPI: it is a variable importance estimator tied to a given model (essentially a linear regression). In some sense, it is comparable to the desparsified Lasso.
By contrast, KO and CPI are plug-in approaches that can work with many different estimators.

Again, let me reiterate that Ada-SVR is not something we want to put forward. I think that we should not base our API discussions on it.
