The combinations $(a, f)$ and $(c, f)$ are structural zeros (i.e., it's impossible to have non-zero values in these cells). Now, assume I want to fit the model n ~ C(F1):C(F2) on that data as follows:
y, X = Formula('n ~ C(F1):C(F2)').get_model_matrix(df, ensure_full_rank=False)
Then the corresponding variables C(F1)[T.a]:C(F2)[T.f] and C(F1)[T.c]:C(F2)[T.f] are columns of X. Is there a way to exclude these parameters directly in the formula? Or is there another concept in formulaic for dealing with this type of constraint?
Apologies for the delay in my response. Life has been pretty hectic of late.
At present, there is no way to handle this in Formulaic (short of deleting these columns after the model matrix is created). Is there precedent for supporting this kind of transformation in other formula implementations? (This isn't a requisite for including it in Formulaic, but it does help to think through how others have solved this issue).
If we were to add support for this, I think the easiest approach would be to generate the matrix as is, and then remove any columns that are identically zero. This does mean that some unnecessary work is done, which is a little inelegant... but I'm not sure it makes sense to pass around richer metadata than this. Of course, that means it could just as easily be done outside formulaic too.
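The "generate the matrix as is, then remove identically zero columns" idea can be sketched outside Formulaic with plain pandas. This is only an illustration: `drop_all_zero_columns` is a hypothetical helper, not part of Formulaic, and the small matrix below is a hand-written stand-in for a model matrix. Its values, and the level names `b` and `e`, are assumptions made up for the example.

```python
import pandas as pd


def drop_all_zero_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Return X without the columns whose entries are all zero."""
    keep = (X != 0).any(axis=0)  # True for columns with at least one non-zero entry
    return X.loc[:, keep]


# Hand-written stand-in for a model matrix; the values are illustrative only.
X = pd.DataFrame(
    {
        "Intercept": [1, 1, 1, 1],
        "C(F1)[T.a]:C(F2)[T.e]": [1, 0, 0, 0],
        "C(F1)[T.a]:C(F2)[T.f]": [0, 0, 0, 0],  # structural zero
        "C(F1)[T.b]:C(F2)[T.e]": [0, 1, 0, 0],
        "C(F1)[T.b]:C(F2)[T.f]": [0, 0, 1, 0],
        "C(F1)[T.c]:C(F2)[T.e]": [0, 0, 0, 1],
        "C(F1)[T.c]:C(F2)[T.f]": [0, 0, 0, 0],  # structural zero
    }
)

X_reduced = drop_all_zero_columns(X)
print(list(X_reduced.columns))
```

Note that this cannot tell a structural zero apart from a combination that merely happens to be unobserved in the sample, so the dropped columns are worth reviewing by hand.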
In an ideal world, what would you like to see done?
> In an ideal world, what would you like to see done?
When creating a model matrix, an extra argument could be supported such as formulaic.model_matrix(formula, df, drop_structural_zeros=True) where all structural zeros have been dropped.
A more manual approach could be to equip the resulting ModelSpec with methods to drop columns. That way the spec can be modified by hand, in an iterative process that is left up to the researcher. A researcher could build the full model, check which columns contain only 0s (e.g. cols_to_drop = model_mat.columns[((model_mat != 0).sum() == 0)].to_list()), drop those columns from the model spec (model_mat.model_spec.drop(cols_to_drop), returning a new spec), and then run get_model_matrix on the updated spec.
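Until something like a drop method on the spec exists, the same two-step workflow can be approximated on the materialised matrices. The sketch below uses only pandas; `find_all_zero_columns` and `apply_column_drop` are hypothetical helper names, and the two small matrices, with the level names `b` and `e`, are made-up stand-ins for model matrices built from the same formula.

```python
import pandas as pd


def find_all_zero_columns(model_mat: pd.DataFrame) -> list:
    """List the columns of model_mat that contain only zeros."""
    return [col for col in model_mat.columns if (model_mat[col] == 0).all()]


def apply_column_drop(model_mat: pd.DataFrame, cols_to_drop: list) -> pd.DataFrame:
    """Drop a previously recorded list of columns from a matrix."""
    return model_mat.drop(columns=cols_to_drop, errors="ignore")


# Step 1: inspect the full matrix and record which columns to drop.
full_mat = pd.DataFrame(
    {
        "Intercept": [1, 1, 1],
        "C(F1)[T.a]:C(F2)[T.f]": [0, 0, 0],  # structural zero
        "C(F1)[T.b]:C(F2)[T.e]": [1, 0, 1],
        "C(F1)[T.b]:C(F2)[T.f]": [0, 1, 0],
    }
)
cols_to_drop = find_all_zero_columns(full_mat)
reduced_mat = apply_column_drop(full_mat, cols_to_drop)

# Step 2: reuse the recorded list on a matrix built later from new data.
new_mat = pd.DataFrame(
    {
        "Intercept": [1, 1],
        "C(F1)[T.a]:C(F2)[T.f]": [0, 0],
        "C(F1)[T.b]:C(F2)[T.e]": [0, 1],
        "C(F1)[T.b]:C(F2)[T.f]": [1, 0],
    }
)
new_reduced = apply_column_drop(new_mat, cols_to_drop)
```

Keeping `cols_to_drop` as an explicit list leaves the decision with the researcher, who can review or edit it before applying it.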
What's the preferred way to model structural zeros in a Formula? Assume the following toy example: I have a $3\times 2$ contingency table that looks like this
given as a pandas dataframe as follows: