More flexible model and contrast definition #362
Comments
Thanks for the issue! Feedback:
But:
I'd suggest a couple of POC PRs, and going from there. |
I agree, we should settle on 1, max. 2 options. My point was mainly to provide an overview of what would be possible.
What kind of validation steps do you envisage here? Wouldn't such a model fail anyway because the design matrix is not full-rank? From your response I deduce the following follow-up tasks
|
It's just that those failures are hard for users to interpret. We should start building up a set of validation checks to make debugging easier. For example, specifying a model with variables that don't exist in the sample sheet should produce an error telling the user that. There's already a validation step that e.g. checks the contrasts are compatible with the sample sheet; we could maybe add model checks there as well, to avoid adding a new step: https://github.com/pinin4fjords/shinyngs/blob/develop/exec/validate_fom_components.R |
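As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not code from the pipeline or from shinyngs: the function names `formula_variables` and `validate_model` and the example column names are hypothetical. It pulls variable names out of an R-style formula with a simple regular expression and reports any that are missing from the sample sheet columns. The parsing is deliberately naive and would, for instance, also treat a function name such as `log` in `log(x)` as a variable.

```python
import re


def formula_variables(formula):
    """Return the variable names used on the right-hand side of an R-style formula."""
    # Only the part after the tilde describes the model terms.
    rhs = formula.split("~", 1)[-1]
    # Names start with a letter or underscore, so the bare '1' in
    # '(1 | patient_id)' is not picked up as a variable.
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", rhs))


def validate_model(formula, sample_sheet_columns):
    """Raise a readable error if the formula uses columns the sample sheet lacks."""
    missing = sorted(formula_variables(formula) - set(sample_sheet_columns))
    if missing:
        raise ValueError(
            f"Model formula '{formula}' refers to variables not found "
            f"in the sample sheet: {', '.join(missing)}"
        )


# Hypothetical sample sheet columns, for illustration only.
columns = ["sample", "treatment", "timepoint", "patient_id"]
validate_model("~ treatment * timepoint + (1 | patient_id)", columns)  # passes silently
```

A check like this would run before the model is fitted, so the user sees the missing column name instead of a rank-deficiency error from the modelling step.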
Hello @grst! Thank you for proposing this! It looks good to me.
This is because the method that we are working on (i.e. propd) can work with multiple conditions (even though this is currently not allowed in the module, so that it remains coherent with the current contrast implementation). |
Hi @suzannejin, I'll post a separate issue to discuss the contrast specification in more detail and we can certainly take it up there. Potentially we also need to accommodate different ways of specifying contrasts for different methods. About your specific example
what would that mean? Compare both B and C separately to the baseline A? Or all against all? |
@grst Sure! The flexibility to accommodate different ways would be nice. The specific case I mentioned above is more of an all-against-all comparison. I only wanted to mention it as a potential option to consider; it is not so important to have it for the moment, as it asks a slightly different question: is a gene changing across the different conditions? |
so basically the ANOVA case? |
yes exactly! |
Suggested way forward:
Then work on additional contrast types (#377). |
Those use cases would be a very powerful addition. However, I would like to highlight that the handling and interpretation of log2 fold changes need careful examination. Typically, the significance estimate (p-value) is used for analysis in these cases. To ensure interpretable log2 values, it is often necessary to perform an additional pairwise test. This consideration applies to use cases modeled with both full and reduced models, including the example mentioned above. |
@nschcolnicov @alanmmobbs93 to move this forward over the Christmas break:
|
Description of feature
Currently, models and contrasts to compare are defined in the contrast.csv file as follows:
This limits contrasts to pairwise comparisons of a single variable, and the model definition is implicit, based on the variable and blocking variables. More complex comparisons such as interaction terms or mixed effects are not possible or require workarounds like constructing artificial variables combining multiple variables.
I propose to make the model definition more flexible by supporting model formulas such as ~ treatment + response (variable + covariate) or ~ treatment * response (interaction model). This would also extend to mixed effects models, should they be added in the future, e.g. ~ treatment * timepoint + (1 | patient_id)
How to define contrasts
There are different ways to define contrasts beyond simple comparisons. We need to discuss which ones to support and how.
1. As (variable, baseline, target) tuples -> I'd definitely keep this one way or another, as it covers a lot of cases and is very intuitive to define. However, we still need alternatives for more complex cases.
2. By coefficient name, e.g. treatmenta:responseresponder
3. By contrast formula, e.g. responseresponder - responsenon_responder, or cond(response="responder") - cond(response="non_responder")
4. By contrast vector, e.g. (0, -1, 1, 0, 0, 0, 0)
The challenge for 2-3 is to define this in a way that's method-agnostic. We'd likely need a step that converts a coefficient name or contrast formula into a contrast vector. One such way is limma's makeContrasts function. Another way is the cond() method implemented in glmGamPoi.
If anyone's aware of other alternatives, I'm happy to consider them.
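To make the conversion step more concrete, here is a minimal Python sketch of turning a coefficient name or a simple contrast formula into a contrast vector. It is not how limma's makeContrasts or glmGamPoi's cond() work internally. The names `contrast_vector` and `tuple_expression` and the coefficient list are hypothetical, and the tuple helper assumes a design without intercept whose columns are named by concatenating variable and level, as R's model.matrix does.

```python
import re

# One term: optional sign, optional numeric weight followed by '*', then a
# coefficient name. Names may contain '.', '_' and ':' (interaction terms).
TERM = re.compile(r"\s*([+-])?\s*(?:(\d+(?:\.\d+)?)\s*\*\s*)?([A-Za-z_][\w.:]*)")


def contrast_vector(expression, coefficients):
    """Convert e.g. 'b - a' into a vector of weights aligned with `coefficients`."""
    index = {name: i for i, name in enumerate(coefficients)}
    vector = [0.0] * len(coefficients)
    expression = expression.strip()
    if not expression:
        raise ValueError("Empty contrast expression")
    pos = 0
    while pos < len(expression):
        match = TERM.match(expression, pos)
        if match is None:
            raise ValueError(f"Cannot parse contrast near: {expression[pos:]!r}")
        sign, weight, name = match.groups()
        if pos > 0 and sign is None:
            raise ValueError(f"Missing '+' or '-' before {name!r}")
        if name not in index:
            raise ValueError(f"Unknown coefficient {name!r}; available: {coefficients}")
        value = float(weight) if weight else 1.0
        vector[index[name]] += -value if sign == "-" else value
        pos = match.end()
    return vector


def tuple_expression(variable, baseline, target):
    """Express a (variable, baseline, target) tuple as a contrast formula.

    Assumption: no-intercept design with R-style '<variable><level>' column names.
    """
    return f"{variable}{target} - {variable}{baseline}"


# Hypothetical coefficient names of a no-intercept design.
coefs = ["responsenon_responder", "responseresponder", "treatmentb"]
by_formula = contrast_vector("responseresponder - responsenon_responder", coefs)
by_tuple = contrast_vector(tuple_expression("response", "non_responder", "responder"), coefs)
```

With a step like this, options 1 to 3 would all reduce to option 4, so each downstream method would only need to accept a contrast vector.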
Configuration format
In principle it would still be possible to use a CSV file, e.g.
Alternatively, I could see benefits from switching to JSON/YAML with an appropriate JSON schema for validation. IMO multiple contrasts per model and different ways to specify contrasts could be better represented with a hierarchical configuration format than CSV. Having ;-separated fields within a CSV column, as is currently implemented for blocking, is a bit of a red flag as it's hard to read and hard to validate.
LMK what you think
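For discussion, the hierarchical format suggested above could look roughly like the following sketch. All key names (`models`, `contrasts`, `type`, `expression`, ...) are hypothetical and nothing here is an agreed schema. JSON is used only because Python can parse it without extra dependencies; the same structure could be written as YAML. The small check stands in for what a proper JSON schema would do.

```python
import json

# Hypothetical configuration: one model with several contrasts of different types.
CONFIG = """
{
  "models": [
    {
      "id": "treatment_response",
      "formula": "~ treatment * response",
      "contrasts": [
        {"id": "b_vs_a", "type": "tuple",
         "variable": "treatment", "baseline": "a", "target": "b"},
        {"id": "interaction", "type": "coefficient",
         "coefficient": "treatmentb:responseresponder"},
        {"id": "custom", "type": "vector", "vector": [0, -1, 1, 0]}
      ]
    }
  ]
}
"""

# Fields each (hypothetical) contrast type must provide.
REQUIRED_FIELDS = {
    "tuple": {"variable", "baseline", "target"},
    "coefficient": {"coefficient"},
    "formula": {"expression"},
    "vector": {"vector"},
}


def check_config(config):
    """Return a list of human-readable problems; an empty list means the config looks fine."""
    problems = []
    for model in config.get("models", []):
        model_id = model.get("id", "<unnamed>")
        if "formula" not in model:
            problems.append(f"model {model_id}: missing 'formula'")
        for contrast in model.get("contrasts", []):
            contrast_id = contrast.get("id", "<unnamed>")
            kind = contrast.get("type")
            if kind not in REQUIRED_FIELDS:
                problems.append(f"contrast {contrast_id}: unknown type {kind!r}")
                continue
            missing = sorted(REQUIRED_FIELDS[kind] - set(contrast))
            if missing:
                problems.append(f"contrast {contrast_id}: missing {', '.join(missing)}")
    return problems


problems = check_config(json.loads(CONFIG))
```

Compared with ;-separated fields in a CSV column, each contrast here carries its own typed fields, which is what makes per-field validation straightforward.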
CC @apeltzer @tschwarzl @nschcolnicov @atrigila @alanmmobbs93
FYI @suzannejin