Model template #104

Open
lionelkusch opened this issue Dec 27, 2024 · 7 comments
Labels
management of project question regarding the policy of the project method implementation Question regarding methods implementations

Comments

@lionelkusch
Collaborator

Based on PRs #58, #73, #100, #101 and #102, I propose the following requirements for each model:
acn: acronym
name: full name of the model

acn.py (python file)

from typing import Any

import numpy as np
import ...

__all__ = ["acn", "acn_..."]

def acn(
    X: np.ndarray[Any, np.dtype[np.float64]],
    y: np.ndarray[Any, np.dtype[np.float64]],
    karg_1: type = ...,
    karg_2: type = ...,
):
    """
    name

    short description :footcite:t:`.....`

    Parameters
    ----------
    X : ndarray, shape (n_samples, n_features)
        Data.

    y : ndarray, shape (n_samples,)
        Target.

    karg_1 : ....

    karg_2 : ...

    Returns
    -------
    .... : array, shape (n_features,)

    References
    ----------
    .. footbibliography::

    """
    assert ....  # check the inputs

    # function implementation

    assert ....  # check the outputs

    return ...

# additional functions (example: compute p-values)
def acn_...(
    arg_1: type,
    ...,
    karg_1: type = ...,
    ...,
):
    """
    short description

    Parameters
    ----------
    arg_1 : type
        .......

    karg_1 : ....

    Returns
    -------
    .... : ....

    """
    assert ....

    # function implementation

    assert ....

    return ...

# private functions
def _prf_1(
    arg_1: type,
    ...,
    karg_1: type = ...,
    ...,
):
    """
    short description

    Parameters
    ----------
    arg_1 : type
        .......

    karg_1 : ....

    Returns
    -------
    .... : ....

    """
    assert ....

    # function implementation

    assert ....

    return ...

test/test_acn.py (test for the methods)

import ...

# unit tests
def test_acn_...():
    """
    Description of the test (try to limit each test to a single call of the tested function)
    """
    .....
    assert ...

# optional tests (functional, ....)
def test_acn_...():
    """
    Description of the test
    """
    .....
    assert ...
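A sketch of how the test file could look for the same hypothetical `mc` method, written for pytest discovery (functions named `test_*`) but also runnable directly. `mc` is redefined here only so the snippet is self-contained; in the proposed layout it would be imported from the module under test.

```python
import numpy as np


def mc(X, y):
    """Hypothetical method under test: correlation of each feature with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


# unit tests
def test_mc_shape():
    """
    mc returns one value per feature.
    """
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 3))
    y = rng.standard_normal(50)
    assert mc(X, y).shape == (3,)


def test_mc_detects_signal():
    """
    The only informative feature has the largest absolute correlation.
    """
    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 4))
    y = X[:, 2] + 0.1 * rng.standard_normal(300)
    corr = mc(X, y)
    assert np.argmax(np.abs(corr)) == 2


if __name__ == "__main__":
    test_mc_shape()
    test_mc_detects_signal()
```

Each test makes a single call to `mc`, so a failure points directly at one behaviour.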

example/inference_models/acn.py (example of the methods)

"""
acn: name
==================================================================
short description
"""

#############################################################################
# Imports needed for this script
# ------------------------------
import ...

#############################################################################
# Dataset
# -------

...
plot_dataset()

#############################################################################
# Usage of the method
# -------------------
# See the API for more details about the optional parameters:
# :py:func:`hidimstat.acn`

output = acn(X,y)

#############################################################################
# Description of the output
#

#############################################################################
# Plot the results
# ----------------
# 

....

#############################################################################
# Interpretation of the results
# .....

############################################################################
#
# Principle of the method
# -----------------------
# .....

#############################################################################
# Assumptions, Advantages and Disadvantages
# -----------------------------------------
# 
# **Assumptions**:
# ...
# 
# **Advantages**:
#....
# 
# **Disadvantages**:
# ....
#

#############################################################################
# References
# ----------
# .. footbibliography::
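A minimal sketch of an example script following this layout, again for the hypothetical `mc` method on a toy linear dataset. The plotting, interpretation and reference sections are omitted to keep it dependency-free, and `mc` is defined inline only as a stand-in; in the real gallery it would be imported from hidimstat.

```python
"""
mc: marginal correlation (hypothetical example)
==================================================
Toy illustration of the example layout.
"""

#############################################################################
# Imports needed for this script
# ------------------------------
import numpy as np


# Hypothetical stand-in for the method; in the gallery it would be imported
# from hidimstat instead of being defined here.
def mc(X, y):
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


#############################################################################
# Dataset
# -------
# Linear model in which only the first two features are informative.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
X = rng.standard_normal((n_samples, n_features))
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta + 0.5 * rng.standard_normal(n_samples)

#############################################################################
# Usage of the method
# -------------------
output = mc(X, y)

#############################################################################
# Description of the output
# -------------------------
# ``output`` has shape (n_features,): one correlation per feature. The two
# informative features should stand out from the four null ones.
for j, value in enumerate(output):
    print(f"feature {j}: {value:+.2f}")
```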
@lionelkusch lionelkusch added method implementation Question regarding methods implementations management of project question regarding the policy of the project labels Dec 27, 2024
@lionelkusch
Collaborator Author

This template only works for functions that don't require a fitting step.
When fitting is required, I propose using a class, as for LOCO or CPI, where all the functions are implemented as methods of the class.
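To illustrate the class-based layout for methods that need a fit step, here is a minimal sketch of a leave-one-covariate-out style class using ordinary least squares. The name `LOCOSketch`, the `importance` method and the restriction to least squares without intercept are assumptions for illustration; this is not the library's LOCO implementation. The point is that `fit` stores an internal state (the full and reduced models) that later calls reuse.

```python
import numpy as np


class LOCOSketch:
    """
    Hypothetical class layout for a method that needs a fit step.

    ``fit`` stores the internal state (a full least-squares model and one
    reduced model per left-out feature); ``importance`` reuses it on new data.
    """

    def fit(self, X, y):
        self.coef_full_ = self._lstsq(X, y)
        self.coefs_reduced_ = [
            self._lstsq(np.delete(X, j, axis=1), y) for j in range(X.shape[1])
        ]
        return self

    def importance(self, X, y):
        """Increase in mean squared error when each feature is left out."""
        if not hasattr(self, "coef_full_"):
            raise RuntimeError("call fit before importance")
        loss_full = np.mean((y - X @ self.coef_full_) ** 2)
        return np.array(
            [
                np.mean((y - np.delete(X, j, axis=1) @ coef) ** 2) - loss_full
                for j, coef in enumerate(self.coefs_reduced_)
            ]
        )

    @staticmethod
    def _lstsq(X, y):
        # ordinary least squares without intercept
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef
```

Usage: `LOCOSketch().fit(X_train, y_train).importance(X_test, y_test)`. Fitting once and scoring on held-out data is what a plain function cannot do without refitting every time it is called.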

@lionelkusch
Collaborator Author

@bthirion @jpaillard @AngelReyero.
What do you think?

@AngelReyero
Collaborator

Therefore, is it only for Permutation Feature Importance? Which other methods do not need refitting? Is the fit function the only difference from the other methods?

@lionelkusch
Collaborator Author

The functions knockoff, clustered_inference, dcrt_zero and desparsified_lasso don't need fitting.
From my analysis of the library, only two methods currently need a fit function: CPI and LOCO.

For me, a fit function means that the algorithm needs to keep an internal state in order to apply other functions. This is not the case for most algorithms, either because they are fast or because their output contains all the necessary information.

@bthirion
Contributor

I agree with the general outline.
One aspect is that we won't create one example for each function of the library. Examples are here to guide users toward the main information. They should not be exhaustive.
Please also remember that structure is here to help by providing guidelines and a common understanding. What matters first is functionality, clarity of the material and making maintenance easy.

@AngelReyero
Collaborator

In the case of knockoffs, for instance, wouldn't we also need a similar internal state to keep the information estimated to generate the knockoffs? For example, the generating method from https://arxiv.org/abs/2407.06892 requires fitting multiple regressors. Similarly, other generating methods are based on covariance estimation, and the covariance estimate must be kept in order to generate knockoffs.

@lionelkusch
Collaborator Author

Yes, I realised that some methods, such as knockoffs or permutation_test, require access to the estimator. This can be generalised to most of the methods.

If we stay with functions, which of these signatures do you prefer?

from typing import Any

import numpy as np
from sklearn.base import BaseEstimator

# option 1
def acn(
    X: np.ndarray[Any, np.dtype[np.float64]],
    y: np.ndarray[Any, np.dtype[np.float64]],
    estimator: BaseEstimator,
    karg_1: type = ...,
    karg_2: type = ...,
): ...

# option 2
def acn(
    X: np.ndarray[Any, np.dtype[np.float64]],
    estimator: BaseEstimator,
    karg_1: type = ...,
    karg_2: type = ...,
): ...

In the second case, 'y' is computed at the beginning of the function.
For the estimator, do you want to require that it is already fitted?

However, if we pass the estimators as parameters, I would prefer a class implementation to keep track of the estimators, especially if we need to fit them.
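For the first signature with the requirement that the estimator is already fitted, a minimal sketch could look like this, using permutation feature importance as the method. `pfi`, `LeastSquares` and `_check_is_fitted` are hypothetical names; the fitted check only mimics the scikit-learn convention that fitted attributes end with an underscore (scikit-learn itself provides `sklearn.utils.validation.check_is_fitted`).

```python
import numpy as np


class LeastSquares:
    """Hypothetical toy estimator following the scikit-learn fit/predict API."""

    def fit(self, X, y):
        self.coef_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, X):
        return X @ self.coef_


def _check_is_fitted(estimator):
    # mimics the scikit-learn convention: fitted attributes end with "_"
    fitted = [k for k in vars(estimator) if k.endswith("_") and not k.startswith("_")]
    if not fitted:
        raise ValueError("the estimator must be fitted before calling pfi")


def pfi(X, y, estimator, n_permutations=10, random_state=None):
    """
    Permutation feature importance with an already fitted estimator.

    Returns the mean increase in MSE when each column of X is permuted.
    """
    _check_is_fitted(estimator)
    rng = np.random.default_rng(random_state)
    loss = np.mean((y - estimator.predict(X)) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_permutations):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            importance[j] += np.mean((y - estimator.predict(X_perm)) ** 2) - loss
    return importance / n_permutations
```

Refitting a copy of the estimator inside the function would remove the fitted requirement, but the fitted model would then be lost after the call, which is the argument for a class.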


3 participants