
Desparsified lasso (1/4): add comments and docstrings of the functions #127

Open

wants to merge 47 commits into main
Conversation

lionelkusch (Collaborator)

I improved the comments and the docstrings.
Some modifications are related to issues #112 and #60.
I left some points open to check or to validate:

  1. Do we want other functions for estimating the standard deviation of the noise? (L116 of desparsified_lasso.py)
  2. The lambda_max function could be used at lines L135 and L340 of desparsified_lasso.py. Do we use this function or not?
  3. The calculation of confint_radius inverts omega_diag twice. I don't see the reason for it. (L173 of desparsified_lasso.py)
  4. The function group_reid has an option for a "null model"; do we keep it or not? (L212 of noise_std.py)
  5. Why is the name of the method in group_reid different from the name of the "function"? (L231 of noise_std.py)
  6. Should I remove the comments at L273 of noise_std.py?


codecov bot commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.99%. Comparing base (19e6cf7) to head (f9f6f0f).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #127      +/-   ##
==========================================
+ Coverage   81.70%   81.99%   +0.29%     
==========================================
  Files          43       43              
  Lines        2312     2322      +10     
==========================================
+ Hits         1889     1904      +15     
+ Misses        423      418       -5     


@jpaillard jpaillard (Collaborator) left a comment

Would it be possible / would it make sense to have both desparsified_lasso and desparsified_group_lasso in a single function?

For CPI / LOCO / PI, I used a group argument with default None, corresponding to the non-grouped case.
When group is not None, it should be a dict whose keys are group ids and whose values are the ids of the variables belonging to each group.
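A minimal sketch of the proposed group argument (the helper name is hypothetical, not the actual hidimstat signature): None falls back to the non-grouped case where each variable is its own singleton group, otherwise a dict maps group ids to variable indices.

```python
def normalize_groups(n_features, groups=None):
    """Resolve the proposed `groups` argument (hypothetical helper,
    not hidimstat code): None means the non-grouped case, where each
    variable forms its own singleton group; otherwise `groups` is a
    dict mapping group ids to the indices of variables in the group."""
    if groups is None:
        return {j: [j] for j in range(n_features)}
    return groups

# non-grouped default: three singleton groups
print(normalize_groups(3))  # {0: [0], 1: [1], 2: [2]}
# explicit grouping: two groups over three variables
print(normalize_groups(3, {"g1": [0, 2], "g2": [1]}))
```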

doc_conf/references.bib (outdated; resolved)
Comment on lines 250 to 251
fit_Y : bool, optional (default=True)
Whether to fit Y in noise estimation.
Collaborator

This is not so clear to me. Could you provide some more details?

Collaborator Author

This is linked to my question 4:

The function group_reid has an option for a "null model"; do we keep it or not? (L212 of noise_std.py)

In the Reid function, there is an option to not refit y. I don't understand the usage of this parameter either.

@lionelkusch (Collaborator Author)

Would it be possible / would it make sense to have both desparsified_lasso and desparsified_group_lasso in a single function?

For CPI / LOCO / PI, I used a group argument with default None, corresponding to the non-grouped case. When group is not None, it should be a dict whose keys are group ids and whose values are the ids of the variables belonging to each group.

The group has a different meaning in this case: it is related to time.
However, I grouped the functions to avoid duplicated code.
Can you tell me if the code is still readable?

@bthirion bthirion (Contributor) left a comment

Thanks for improving this. Please find some comments enclosed.

@@ -65,11 +67,6 @@ def hd_inference(X, y, method, n_jobs=1, memory=None, verbose=0, **kwargs):
n_jobs : int or None, optional (default=1)
Number of CPUs to use during parallel steps such as inference.

memory : str or joblib.Memory object, optional (default=None)
Contributor

Why do you remove the memory argument?

Collaborator Author

This argument optimises computation by memorising the results of a function call with the same arguments. I don't think the basic user requires it, and I haven't taken the time to check in detail whether it is very efficient.
I think it is interesting when the function is run multiple times on the same data, but I don't think it is important to keep for the moment, because that should rarely be the case.
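For context, the memory argument wraps joblib's on-disk memoization. A minimal sketch of what it provides (the cached function and cache location are illustrative, not hidimstat code):

```python
import tempfile
from joblib import Memory

# cache results on disk so a repeated call with the same arguments
# is loaded from the cache instead of being recomputed
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

executions = []

@memory.cache
def expensive_step(n):
    executions.append(n)  # records actual (non-cached) executions
    return n * n

expensive_step(4)  # computed and written to the cache
expensive_step(4)  # served from the cache; the body does not run again
```

The trade-off discussed above: this only pays off when the same computation is repeated on identical data, at the cost of an extra argument in the public API.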

src/hidimstat/clustered_inference.py (outdated; resolved)
@@ -178,7 +169,6 @@ def clustered_inference(
method="desparsified-lasso",
seed=0,
n_jobs=1,
memory=None,
Contributor

Same comment: why get rid of memory?

Collaborator Author

see above

@@ -113,11 +112,6 @@ def ensemble_clustered_inference(
Number of CPUs used to compute several clustered inference
algorithms at the same time.

memory : str, optional (default=None)
Contributor

Again

Collaborator Author

see above

error_ratio = cov_hat / cov

assert_almost_equal(np.max(error_ratio), 1.0, decimal=0)
assert_almost_equal(np.log(np.min(error_ratio)), 0.0, decimal=1)


def test_reid_exception():
"test the exceptions of reid"
Contributor

Can you add a one-line docstring?

Collaborator Author

done

src/hidimstat/desparsified_lasso.py (outdated threads; resolved)
Lower bounds of confidence intervals
cb_max : array-like
Upper bounds of confidence intervals
If confidence_interval_only=False:
Contributor

Can we avoid this pattern of having different outputs? One solution is to always return the full version. Another possibility is to use a dictionary. But I think that systematically returning the full version is better.

Collaborator Author

I added a boolean for choosing between the different outputs, which lets the user choose what they want.
If you want to always have the full output, I can remove the boolean associated with these different outputs.
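The "always return the full version" option could look like this sketch (function name and placeholder values are hypothetical, not the actual hidimstat return signature): a caller that only needs part of the result unpacks selectively, instead of the function switching its return type on a boolean flag.

```python
def desparsified_inference(beta_hat):
    """Always return the full result tuple (hypothetical sketch):
    estimates, confidence bounds, and p-values, with no flag that
    changes the set of outputs."""
    cb_min = [b - 0.5 for b in beta_hat]  # placeholder lower bounds
    cb_max = [b + 0.5 for b in beta_hat]  # placeholder upper bounds
    pval = [0.05 for _ in beta_hat]       # placeholder p-values
    return beta_hat, cb_min, cb_max, pval

# a caller interested only in confidence intervals ignores the rest
_, cb_min, cb_max, _ = desparsified_inference([1.0, 2.0])
```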

src/hidimstat/desparsified_lasso.py (outdated threads; resolved)