How to define complex contrasts? #386
Comments
I think the YML structure should be defined first, and I'd propose that it be explicit, so that we can later gather all the components in the way each tool needs them:
For this approach, every script should start by parsing each formula (from a YML file) and iterating over the contrast IDs to construct what it needs. The variable names can be extracted directly from the formula. The downside of this approach is that the user needs to know how to write the coefficients and needs at least some knowledge of formulas.
YML formats
I think that there should be two main variables here:
Linear models
models:
  - formula: "~ genotype + treatment + genotype:treatment"
    mixed: false
    contrasts:
      - id: genotype_a_vs_b
        type: simple
        comparison:
          reference: 'genotypeA.treatmentTreated'
          target: 'genotypeB.treatmentTreated'
We should be able to compile the contrast functions:
## FOR LIMMA ---------------------------------------
# Dynamically construct the contrast expression, e.g. "target - reference"
contrast_expression <- paste(target, "-", reference)
# Create the contrast matrix; `contrasts` takes a character vector of expressions
contrasts <- makeContrasts(
  contrasts = contrast_expression,
  levels = design
)
# Label the contrast column with the specified contrast_id
colnames(contrasts) <- contrast_id
## FOR DESeq2 --------------------------------------
# Construct the contrast as a list: numerator (target) first, denominator (reference) second.
# Both names must be present in resultsNames(dds).
target <- "genotypeB.treatmentTreated"
reference <- "genotypeA.treatmentTreated"
contrast <- list(c(target), c(reference))
res <- results(dds, contrast = contrast)
Other examples for the same formula; I'm open to adding more complex scenarios to think about:
models:
  - formula: "~ 0 + treatment"
    mixed: false
    contrasts:
      ## Internally pastes treatmentA and treatmentB, which is how the user works now
      - id: treatment_b_vs_a
        type: simple
        comparison:
          variable: treatment
          reference: A
          target: B
      ## Writing the coefficients explicitly; equivalent to the previous one
      - id: treatment_b_vs_a_explicit # Should be the same as 'treatment_b_vs_a'
        type: simple
        reference: treatmentA
        target: treatmentB
      ## Writing the coefficients; the reference is the mean of two conditions
      - id: treatment_c_vs_ab
        type: simple
        comparison:
          reference: '(treatmentA + treatmentB) / 2'
          target: 'treatmentC'
      ## All against all
      - id: treatment_all_vs_all ## Assuming that "treatment" has multiple levels
        type: pairwise ## Runs all-against-all, taking A as the first reference (alphabetically). The code should construct the contrasts programmatically
        comparison:
          variable: 'treatment'
Linear mixed models
  - formula: "~ treatment + (1 | response )"
    mixed: true # -> this should trigger `DREAM_DIFFERENTIAL` only. It can also be detected automatically from the formula in R
    contrasts:
      - id: treatment_a_vs_b
        type: simple
        comparison:
          variable: 'treatment'
          reference: 'A'
          target: 'B'
      - id: treatment_c_vs_ab
        type: simple
        comparison:
          variable: 'treatment'
          reference: 'B, A'
          target: 'C'
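The parsing step described in this comment could look roughly like the sketch below. It is only an illustration in base R: the nested list stands in for what a YAML reader would return for one `models` entry, the `factor_levels` list is assumed to come from the sample sheet, and `build_expressions()` is a hypothetical helper name. It only covers the `variable`/`reference`/`target` form of `simple` contrasts and the `pairwise` type.

```r
# Hypothetical in-memory copy of one `models` entry from the proposed YML file.
model <- list(
  formula = "~ 0 + treatment",
  contrasts = list(
    list(id = "treatment_b_vs_a", type = "simple",
         comparison = list(variable = "treatment", reference = "A", target = "B")),
    list(id = "treatment_all_vs_all", type = "pairwise",
         comparison = list(variable = "treatment"))
  )
)
# Assumption: the levels of each variable are known from the sample sheet.
factor_levels <- list(treatment = c("A", "B", "C"))

# Variable names can be read off the formula.
variables <- all.vars(as.formula(model$formula))
# A "|" in the formula is taken as a sign of a random-effect term (mixed model).
is_mixed <- grepl("|", model$formula, fixed = TRUE)

# Turn one contrast entry into named "target - reference" coefficient expressions.
build_expressions <- function(contrast, factor_levels) {
  cmp <- contrast$comparison
  if (contrast$type == "simple") {
    expr_string <- paste0(cmp$variable, cmp$target, " - ", cmp$variable, cmp$reference)
    return(setNames(expr_string, contrast$id))
  }
  # pairwise: every pair of levels, with the alphabetically earlier level as reference
  pairs <- combn(sort(factor_levels[[cmp$variable]]), 2)
  expr_strings <- paste0(cmp$variable, pairs[2, ], " - ", cmp$variable, pairs[1, ])
  setNames(expr_strings, paste0(cmp$variable, "_", pairs[2, ], "_vs_", pairs[1, ]))
}

contrast_expressions <- unlist(
  lapply(model$contrasts, build_expressions, factor_levels = factor_levels)
)
print(contrast_expressions)
```

The resulting named character vector holds one expression per contrast ID, which each tool-specific script could then translate into its own contrast format.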
Description of feature
Follow-up to #377. The goal is to find a consensus on how we want users to specify more complex contrasts based on linear model coefficients.
Background
Typically linear (and other) models rely on a design matrix for fitting the model and on contrast vectors for performing comparisons.
A convenient way of constructing such a design matrix is to use a Wilkinson formula. For instance, the formula
~ treatment + response
with the following data frame ... would result in the following design matrix ...
A comparison can be specified using a contrast vector that has the same columns as the design matrix. For instance, to compare `response` against `progression`, we'd need the following contrast vector ...
For comparing `response` against `stable_disease` ...
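As a minimal sketch of the idea, the base R snippet below builds a design matrix with `model.matrix()` and writes two contrast vectors by hand. The sample table, the `drugA`/`drugB` treatment levels, and the `stable`/`progression`/`response` level names are made up for illustration and are not the data from this issue; `stable` is assumed to be the baseline level.

```r
# Hypothetical sample sheet (illustrative values only).
samples <- data.frame(
  treatment = factor(rep(c("drugA", "drugB"), each = 3)),
  response = factor(
    rep(c("stable", "progression", "response"), times = 2),
    levels = c("stable", "progression", "response")  # first level = baseline
  )
)

# Expand the Wilkinson formula into a design matrix.
design <- model.matrix(~ treatment + response, data = samples)
print(colnames(design))

# A contrast vector has one entry per design-matrix column.
# response vs. progression: difference of the two non-baseline coefficients.
response_vs_progression <- c(0, 0, -1, 1)
# response vs. stable: stable is the baseline, so a single coefficient is enough.
response_vs_stable <- c(0, 0, 0, 1)
names(response_vs_progression) <- colnames(design)
names(response_vs_stable) <- colnames(design)
print(rbind(response_vs_progression, response_vs_stable))
```

Note that the baseline level has no column of its own, which is why the second vector contains a single non-zero entry.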
A handy R library for exploring design matrices is the R package/Shiny app ExploreModelMatrix.
Below I showcase two options to define complex contrasts, namely limma's `makeContrasts` and glmGamPoi's `cond`. While both originate from a specific method, they are method-agnostic, as they generate a numeric contrast vector.
Limma's makeContrasts function
Pro
Con
`responsestable` doesn't have a column in the design matrix because it's the baseline level. This only becomes more complicated with more complex designs, e.g. interaction terms.
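For illustration, here is a hedged sketch of how `makeContrasts` could be used on such a design. It assumes the Bioconductor package limma is installed; the sample table and the level names (`drugA`, `drugB`, `stable`, `progression`, `response`) are invented for the example, with `stable` as the assumed baseline.

```r
library(limma)  # Bioconductor package, assumed to be installed

# Hypothetical sample sheet (illustrative values only).
samples <- data.frame(
  treatment = factor(rep(c("drugA", "drugB"), each = 3)),
  response = factor(
    rep(c("stable", "progression", "response"), times = 2),
    levels = c("stable", "progression", "response")  # first level = baseline
  )
)
design <- model.matrix(~ treatment + response, data = samples)
# makeContrasts() needs syntactically valid coefficient names.
colnames(design)[1] <- "Intercept"

# Contrasts are written as expressions over the design-matrix column names.
contrast_matrix <- makeContrasts(
  response_vs_progression = responseresponse - responseprogression,
  # The baseline level has no column, so this contrast is a single coefficient.
  response_vs_stable = responseresponse,
  levels = design
)
print(contrast_matrix)
```

The user has to know the generated column names and which level is the baseline in order to write these expressions.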
glmGamPoi cond()
glmGamPoi introduces a neat helper function to define contrasts. Using `cond()`, one can simply specify a certain set of column/value combinations to retrieve a vector that represents this data in the design matrix. These vectors can be combined into arbitrary contrasts using standard arithmetic operations.
Pro
Con
(Disclaimer: I'm a big fan of this approach and implemented this in a standalone Python library. The original idea is from glmGamPoi though and it's the only R version I know)
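To make the mechanism concrete, the base R sketch below mimics the behaviour described above. `cond_vector()` is a hypothetical helper written for this example and is not glmGamPoi's implementation; it assumes every variable in the formula is a factor and sets unspecified variables to their baseline level. The sample table and level names are invented.

```r
# Hypothetical sample sheet (illustrative values only).
samples <- data.frame(
  treatment = factor(rep(c("drugA", "drugB"), each = 3)),
  response = factor(
    rep(c("stable", "progression", "response"), times = 2),
    levels = c("stable", "progression", "response")  # first level = baseline
  )
)
design_formula <- ~ treatment + response

# Hypothetical helper: return the design-matrix row for the given
# variable/value combinations (all variables are assumed to be factors).
cond_vector <- function(formula, data, ...) {
  values <- list(...)
  vars <- all.vars(formula)
  stopifnot(all(names(values) %in% vars))
  row <- lapply(vars, function(v) {
    lv <- levels(data[[v]])
    # Unspecified variables fall back to their baseline (first) level.
    value <- if (is.null(values[[v]])) lv[1] else values[[v]]
    stopifnot(value %in% lv)
    factor(value, levels = lv)
  })
  names(row) <- vars
  model.matrix(formula, data = as.data.frame(row))[1, ]
}

# Contrasts are plain arithmetic on these vectors.
contrast <- cond_vector(design_formula, samples, response = "response") -
  cond_vector(design_formula, samples, response = "progression")
print(contrast)
```

Because the helper goes through the same formula as the model, the user never has to spell out coefficient names or know which level is the baseline.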
Others
There's also contrast and `contrasts()` from `{rms}`, which are a bit similar to `cond()`, but they apply the contrast directly to the model and are not implemented for limma or DESeq2.
Suggested implementation
Based on one of the solutions above, create a design matrix and contrast vectors early on. Then pass the design and contrast to the respective methods.
CC @tschwarzl @apeltzer @atrigila @nschcolnicov @alanmmobbs93