
Desparsified lasso (1/4): add comments and docstring of the functions #127

Status: Open
Wants to merge 47 commits into base: main

Commits (47)
99fa174
Remove the method parameter because only lasso is supported
lionelkusch Jan 2, 2025
3eebe97
Comment reid procedure
lionelkusch Jan 2, 2025
ffbb6e9
update comment
lionelkusch Jan 2, 2025
f92ea4d
Comments desparsified and reorganise the structure
lionelkusch Jan 2, 2025
d465c36
Improve comment and let's 2 question?
lionelkusch Jan 3, 2025
eeae34b
Comment deparsified lasso grouping
lionelkusch Jan 3, 2025
d23e3d6
format
lionelkusch Jan 3, 2025
4ad785c
comment group_red
lionelkusch Jan 3, 2025
4787c27
Comment emperical snr
lionelkusch Jan 3, 2025
97ff04f
Comment the side function of desparsified lasso
lionelkusch Jan 3, 2025
4cdaf90
Fix bugs
lionelkusch Jan 3, 2025
931e52f
Put back stationnary noise
lionelkusch Jan 3, 2025
d3d91e8
Fix the changement of signature
lionelkusch Jan 3, 2025
92c7f57
Modify documentation to include new function
lionelkusch Jan 3, 2025
8366e70
Add a question
lionelkusch Jan 3, 2025
e81e3db
Improve commit
lionelkusch Jan 13, 2025
9052326
remove unecesary parameter
lionelkusch Jan 13, 2025
7e4435a
Merge branch 'main' into PR_desparsified_lasso
lionelkusch Jan 13, 2025
8068e89
Fix citation
lionelkusch Jan 13, 2025
b4055a3
Add reference to citation
lionelkusch Jan 13, 2025
109dcd2
Improve docstring with copilot
lionelkusch Jan 14, 2025
0ca0ec1
Add test
lionelkusch Jan 14, 2025
977cd30
Improve docstring
lionelkusch Jan 14, 2025
f25cc8c
Format files
lionelkusch Jan 14, 2025
f304a5e
Format files
lionelkusch Jan 14, 2025
aa44640
Remove memory parameters
lionelkusch Jan 14, 2025
ab0ebd5
Update the function
lionelkusch Jan 15, 2025
9081811
Group the function in one
lionelkusch Jan 15, 2025
e2dd939
Update the tests
lionelkusch Jan 15, 2025
aea49d6
Fix usage of the function
lionelkusch Jan 15, 2025
4b376c7
Remove one option for confidence interval
lionelkusch Jan 15, 2025
6c9989a
Formating
lionelkusch Jan 15, 2025
0511cc6
Formating files
lionelkusch Jan 15, 2025
942a508
update reference
lionelkusch Jan 15, 2025
6001f9b
Fix bugs
lionelkusch Jan 15, 2025
ef5ec38
formating
lionelkusch Jan 15, 2025
c5a070f
Merge branch 'main' into PR_desparsified_lasso
lionelkusch Jan 17, 2025
9a099b9
Improve coverage
lionelkusch Jan 17, 2025
e9ba724
Improve coverage
lionelkusch Jan 17, 2025
44985a1
Apply suggestions from code review
lionelkusch Jan 20, 2025
367d0df
Update the doctsring of a file
lionelkusch Jan 20, 2025
0d09b9b
Apply suggestions from code review
lionelkusch Jan 20, 2025
7c6ada3
Replace lambda by alpha
lionelkusch Jan 20, 2025
9dbebc4
Merge branch 'PR_desparsified_lasso' of https://github.com/lionelkusc…
lionelkusch Jan 20, 2025
91c8b67
Replace distrib by distribution
lionelkusch Jan 20, 2025
1b381a5
format
lionelkusch Jan 20, 2025
f9f6f0f
Add dimension of array in docstring
lionelkusch Jan 20, 2025
4 changes: 3 additions & 1 deletion doc_conf/api.rst
@@ -20,8 +20,10 @@ Functions
clustered_inference
data_simulation
desparsified_lasso
desparsified_lasso_pvalue
desparsified_group_lasso_pvalue
ensemble_clustered_inference
group_reid
reid
hd_inference
knockoff_aggregation
model_x_knockoff
101 changes: 100 additions & 1 deletion doc_conf/references.bib
@@ -177,4 +177,103 @@ @article{liuFastPowerfulConditional2021
archiveprefix = {arxiv},
keywords = {Statistics - Methodology},
file = {/home/ahmad/Zotero/storage/8HRQZX3H/Liu et al. - 2021 - Fast and Powerful Conditional Randomization Testin.pdf;/home/ahmad/Zotero/storage/YFNDKN2B/2006.html}
}
}

@article{zhang2014confidence,
title={Confidence intervals for low dimensional parameters in high dimensional linear models},
author={Zhang, Cun-Hui and Zhang, Stephanie S},
journal={Journal of the Royal Statistical Society Series B: Statistical Methodology},
volume={76},
number={1},
pages={217--242},
year={2014},
publisher={Oxford University Press}
}

@article{van2014asymptotically,
title={On asymptotically optimal confidence regions and tests for high-dimensional models},
author={van de Geer, Sara and B{\"u}hlmann, Peter and Ritov, Ya'acov and Dezeure, Ruben},
journal={The Annals of Statistics},
pages={1166--1202},
year={2014},
publisher={JSTOR}
}

@article{javanmard2014confidence,
title={Confidence intervals and hypothesis testing for high-dimensional regression},
author={Javanmard, Adel and Montanari, Andrea},
journal={The Journal of Machine Learning Research},
volume={15},
number={1},
pages={2869--2909},
year={2014},
publisher={JMLR.org}
}

@article{bellec2022biasing,
title={De-biasing the lasso with degrees-of-freedom adjustment},
author={Bellec, Pierre C and Zhang, Cun-Hui},
journal={Bernoulli},
volume={28},
number={2},
pages={713--743},
year={2022},
publisher={Bernoulli Society for Mathematical Statistics and Probability}
}

@article{celentano2023lasso,
title={The lasso with general gaussian designs with applications to hypothesis testing},
author={Celentano, Michael and Montanari, Andrea and Wei, Yuting},
journal={The Annals of Statistics},
volume={51},
number={5},
pages={2194--2220},
year={2023},
publisher={Institute of Mathematical Statistics}
}

@phdthesis{chevalier2020statistical,
title={Statistical control of sparse models in high dimension},
author={Chevalier, J{\'e}r{\^o}me-Alexis},
year={2020},
school={Universit{\'e} Paris-Saclay}
}

@article{fan2012variance,
title={Variance estimation using refitted cross-validation in ultrahigh dimensional regression},
author={Fan, Jianqing and Guo, Shaojun and Hao, Ning},
journal={Journal of the Royal Statistical Society Series B: Statistical Methodology},
volume={74},
number={1},
pages={37--65},
year={2012},
publisher={Oxford University Press}
}

@article{reid2016study,
title={A study of error variance estimation in lasso regression},
author={Reid, Stephen and Tibshirani, Robert and Friedman, Jerome},
journal={Statistica Sinica},
pages={35--67},
year={2016},
publisher={JSTOR}
}

@article{chevalier2020statistical,
title={Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task lasso},
author={Chevalier, J{\'e}r{\^o}me-Alexis and Salmon, Joseph and Gramfort, Alexandre and Thirion, Bertrand},
journal={Advances in Neural Information Processing Systems},
volume={33},
pages={1759--1770},
year={2020}
}

@article{eshel2003yule,
title={The yule walker equations for the AR coefficients},
author={Eshel, Gidon},
journal={Internet resource},
volume={2},
pages={68--73},
year={2003}
}

13 changes: 9 additions & 4 deletions examples/plot_2D_simulation_example.py
@@ -61,10 +61,13 @@
from sklearn.feature_extraction import image

from hidimstat.clustered_inference import clustered_inference
from hidimstat.desparsified_lasso import desparsified_lasso
from hidimstat.desparsified_lasso import (
desparsified_lasso,
desparsified_lasso_pvalue,
)
from hidimstat.ensemble_clustered_inference import ensemble_clustered_inference
from hidimstat.scenario import multivariate_simulation
from hidimstat.stat_tools import pval_from_cb, zscore_from_pval
from hidimstat.stat_tools import zscore_from_pval

#############################################################################
# Specific plotting functions
@@ -237,8 +240,10 @@ def plot(maps, titles):
# and referred to as Desparsified Lasso.

# compute desparsified lasso
beta_hat, cb_min, cb_max = desparsified_lasso(X_init, y, n_jobs=n_jobs)
pval, pval_corr, one_minus_pval, one_minus_pval_corr = pval_from_cb(cb_min, cb_max)
beta_hat, sigma_hat, omega_diag = desparsified_lasso(X_init, y, n_jobs=n_jobs)
pval, pval_corr, one_minus_pval, one_minus_pval_corr, cb_min, cb_max = (
desparsified_lasso_pvalue(X_init.shape[0], beta_hat, sigma_hat, omega_diag)
)

# compute estimated support (first method)
zscore = zscore_from_pval(pval, one_minus_pval)
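The example above reflects the new two-step API: `desparsified_lasso` returns the estimate (`beta_hat`), a noise-level estimate (`sigma_hat`), and the diagonal of the precision-matrix surrogate (`omega_diag`); `desparsified_lasso_pvalue` then turns these into p-values and confidence bounds. As a rough illustration of how such p-values are conventionally derived (assuming the standard asymptotic normality of the debiased estimator, with beta_hat[j] approximately Gaussian around beta_j with variance sigma_hat^2 * omega_diag[j] / n), here is a dependency-free sketch; the function name and exact formula are illustrative, not hidimstat's implementation:

```python
import math

def sketch_pvalues(n_samples, beta_hat, sigma_hat, omega_diag):
    """Two-sided p-values for H0: beta_j = 0, from debiased-lasso outputs.

    Assumes beta_hat[j] is approximately Gaussian with standard error
    sigma_hat * sqrt(omega_diag[j] / n_samples), the usual asymptotic
    result; hidimstat's actual formula may differ in detail.
    """
    pvals = []
    for b, w in zip(beta_hat, omega_diag):
        se = sigma_hat * math.sqrt(w / n_samples)       # standard error of beta_hat[j]
        z = b / se                                      # z-score under H0
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # 2 * (1 - Phi(|z|))
    return pvals
```

With n = 100, sigma_hat = 1 and omega_jj = 1, a coefficient of 0 gives p = 1, while a coefficient of 1 gives a z-score of 10 and a vanishing p-value.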
13 changes: 9 additions & 4 deletions src/hidimstat/__init__.py
@@ -1,12 +1,16 @@
from .adaptive_permutation_threshold import ada_svr
from .clustered_inference import clustered_inference, hd_inference
from .desparsified_lasso import desparsified_group_lasso, desparsified_lasso
from .desparsified_lasso import (
desparsified_lasso,
desparsified_lasso_pvalue,
desparsified_group_lasso_pvalue,
)
from .Dnn_learner_single import DnnLearnerSingle
from .ensemble_clustered_inference import ensemble_clustered_inference
from .knockoff_aggregation import knockoff_aggregation
from .knockoffs import model_x_knockoff
from .multi_sample_split import aggregate_quantiles
from .noise_std import group_reid, reid
from .noise_std import reid
from .permutation_test import permutation_test_cv
from .scenario import multivariate_1D_simulation
from .standardized_svr import standardized_svr
@@ -26,10 +30,11 @@
"clustered_inference",
"dcrt_zero",
"desparsified_lasso",
"desparsified_group_lasso",
"desparsified_lasso_pvalue",
"desparsified_group_lasso_pvalue",
"DnnLearnerSingle",
"ensemble_clustered_inference",
"group_reid",
"reid",
"hd_inference",
"knockoff_aggregation",
"model_x_knockoff",
80 changes: 34 additions & 46 deletions src/hidimstat/clustered_inference.py
@@ -1,10 +1,12 @@
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample
from sklearn.utils.validation import check_memory

from .desparsified_lasso import desparsified_group_lasso, desparsified_lasso
from .stat_tools import pval_from_cb
from .desparsified_lasso import (
desparsified_lasso,
desparsified_lasso_pvalue,
desparsified_group_lasso_pvalue,
)


def _subsampling(n_samples, train_size, groups=None, seed=0):
@@ -45,7 +47,7 @@ def _ward_clustering(X_init, ward, train_index):
return X_reduced, ward


def hd_inference(X, y, method, n_jobs=1, memory=None, verbose=0, **kwargs):
def hd_inference(X, y, method, n_jobs=1, verbose=0, **kwargs):
"""Wrap-up high-dimensional inference procedures

Parameters
@@ -65,11 +67,6 @@ def hd_inference(X, y, method, n_jobs=1, memory=None, verbose=0, **kwargs):
n_jobs : int or None, optional (default=1)
Number of CPUs to use during parallel steps such as inference.

memory : str or joblib.Memory object, optional (default=None)
Used to cache the output of the computation of the clustering
and the inference. By default, no caching is done. If a string is
given, it is the path to the caching directory.

Review comment (Contributor):
Why do you remove the memory argument?

Reply (Collaborator, author):
This argument optimises the computation by memoising the results of a function call with the same arguments. I don't think the basic user requires it, and I haven't looked in detail at whether it is very efficient. It is interesting when the function is run multiple times on the same data, but I don't think it is important to keep it for the moment.

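For context on what the removed parameter did: a joblib.Memory object caches a function's return value keyed on its arguments, so repeated calls with identical inputs skip the computation. A minimal in-memory sketch of the same memoisation idea, using only the standard library's functools.lru_cache (joblib additionally persists results to disk):

```python
from functools import lru_cache

call_count = 0  # track how many times the body actually runs

@lru_cache(maxsize=None)  # memoise on the argument tuple
def expensive_step(n_clusters, seed):
    """Stand-in for a costly step such as ward clustering."""
    global call_count
    call_count += 1
    return tuple(i % n_clusters for i in range(10))

labels_a = expensive_step(3, 0)
labels_b = expensive_step(3, 0)  # cache hit: the body does not run again
```

The names here are illustrative; the trade-off discussed in the review is whether this caching layer is worth the extra API surface when the function is rarely called twice on identical data.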
verbose: int, optional (default=1)
The verbosity level. If `verbose > 0`, we print a message before
running the clustered inference.
@@ -96,34 +93,28 @@ def hd_inference(X, y, method, n_jobs=1, memory=None, verbose=0, **kwargs):
one_minus_pval_corr : ndarray, shape (n_features,)
One minus the p-value corrected for multiple testing.
"""

if method == "desparsified-lasso":

beta_hat, cb_min, cb_max = desparsified_lasso(
X,
y,
confidence=0.95,
n_jobs=n_jobs,
memory=memory,
verbose=verbose,
**kwargs,
)
pval, pval_corr, one_minus_pval, one_minus_pval_corr = pval_from_cb(
cb_min, cb_max, confidence=0.95
)

elif method == "desparsified-group-lasso":

beta_hat, pval, pval_corr, one_minus_pval, one_minus_pval_corr = (
desparsified_group_lasso(
X, y, n_jobs=n_jobs, memory=memory, verbose=verbose, **kwargs
if method not in ["desparsified-lasso", "desparsified-group-lasso"]:
raise ValueError("Unknown method")
group = method == "desparsified-group-lasso"
print("hd_inference", group, kwargs)
beta_hat, theta_hat, omega_diag = desparsified_lasso(
X, y, group=group, n_jobs=n_jobs, verbose=verbose, **kwargs
)
if not group:
pval, pval_corr, one_minus_pval, one_minus_pval_corr, cb_min, cb_max = (
desparsified_lasso_pvalue(
X.shape[0],
beta_hat,
theta_hat,
omega_diag,
confidence=0.95,
**kwargs,
)
)

else:

raise ValueError("Unknow method")

pval, pval_corr, one_minus_pval, one_minus_pval_corr = (
desparsified_group_lasso_pvalue(beta_hat, theta_hat, omega_diag, **kwargs)
)
return beta_hat, pval, pval_corr, one_minus_pval, one_minus_pval_corr


@@ -178,7 +169,6 @@ def clustered_inference(
method="desparsified-lasso",
seed=0,
n_jobs=1,
memory=None,
Review comment (Contributor):
Same comment: why get rid of memory?

Reply (Collaborator, author):
See above.
verbose=1,
**kwargs,
):
@@ -220,11 +210,6 @@
n_jobs : int or None, optional (default=1)
Number of CPUs to use during parallel steps such as inference.

memory : str or joblib.Memory object, optional (default=None)
Used to cache the output of the computation of the clustering
and the inference. By default, no caching is done. If a string is
given, it is the path to the caching directory.

verbose: int, optional (default=1)
The verbosity level. If `verbose > 0`, we print a message before
running the clustered inference.
@@ -257,9 +242,6 @@
Spatially relaxed inference on high-dimensional linear models.
arXiv preprint arXiv:2106.02590.
"""

memory = check_memory(memory)

n_samples, n_features = X_init.shape

if verbose > 0:
@@ -273,20 +255,26 @@
train_index = _subsampling(n_samples, train_size, groups=groups, seed=seed)

# Clustering
X, ward = memory.cache(_ward_clustering)(X_init, ward, train_index)
X, ward = _ward_clustering(X_init, ward, train_index)

# Preprocessing
X = StandardScaler().fit_transform(X)
y = y - np.mean(y)

# Inference: computing reduced parameter vector and stats
print("Clustered inference", kwargs)
beta_hat_, pval_, pval_corr_, one_minus_pval_, one_minus_pval_corr_ = hd_inference(
X, y, method, n_jobs=n_jobs, memory=memory, **kwargs
X, y, method, n_jobs=n_jobs, **kwargs
)

# De-grouping
beta_hat, pval, pval_corr, one_minus_pval, one_minus_pval_corr = _degrouping(
ward, beta_hat_, pval_, pval_corr_, one_minus_pval_, one_minus_pval_corr_
ward,
beta_hat_,
pval_,
pval_corr_,
one_minus_pval_,
one_minus_pval_corr_,
)

return beta_hat, pval, pval_corr, one_minus_pval, one_minus_pval_corr
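The refactored clustered_inference keeps the same pipeline: subsample the rows, cluster the features, standardise X and centre y, run inference on the cluster representatives, then de-group the statistics back to the original features. The two bookkeeping steps at either end can be sketched as follows (illustrative analogues of the private helpers _subsampling and _degrouping, not the package's actual code):

```python
import random

def subsample_indices(n_samples, train_size, seed=0):
    """Pick a random subset of rows used to fit the clustering."""
    rng = random.Random(seed)
    n_train = int(train_size * n_samples)
    return sorted(rng.sample(range(n_samples), n_train))

def degroup(cluster_labels, cluster_values):
    """Broadcast one statistic per cluster back to every original feature."""
    return [cluster_values[c] for c in cluster_labels]
```

Fitting the clustering on a subsample (rather than all rows) is what makes the later ensemble version meaningful: different seeds give different clusterings whose results can be aggregated.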